From travis at continuum.io Tue May 1 00:14:18 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 30 Apr 2012 23:14:18 -0500 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> Message-ID: > > The same is true of SciPy. I think if SciPy also migrates to use Github issues, then together with IPython we can really be a voice that helps Github. I will propose to NumFOCUS that the Foundation sponsor migration of the Trac to Github for NumPy and SciPy. If anyone would like to be involved in this migration project, please let me know. > > > There is a group where I work that purchased the enterprise version of github. But they still use trac. I think Ralf's opinion should count for a fair amount here, since the tracker is important for releases and backports. Having a good connection between commits and tickets is also very helpful, although sticking with github might be better there. The issue tracker isn't really intended as social media and I find the notifications from trac sufficient. > > Chuck I think Ralf and your opinion on this is *huge*. It seems that Issue tracking is at the heart of "social media" activity, though, because you need people to submit issues and you need people to respond to those issues in a timely way. And it is ideal if the dialogue that might ensue pursuant to that activity is discoverable via search and linking. But the issue tracking problem really is dividable into separate work flows: 1) The submission of the issue (here things like ease-of-entry and attaching files is critical) 2) The dialogue around the issue (developer comments on it and any discussion that ensues) 3) Developer management of issues Now, it is also true that these three things don't have to all intersect. It is very possible to have different systems manage different parts. What I find less than optimal these days is having github as the web-site for pull requests and discussions around them and a poorly-performing trac for issue tracking and milestone management and a few wiki pages. Can we at least agree to have all the wiki pages and web-site managed by github? For issue tracking, I'm very anxious for your and Ralf's opinion because of the effort you have spent using Trac over the years. Another developer I asked at LLNL, just said "why don't you use bugzilla"? -Travis > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue May 1 01:53:18 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 1 May 2012 07:53:18 +0200 Subject: [Numpy-discussion] 1.6.2 release - backports and MSVC testing help In-Reply-To: <4F9F1CB6.5050804@uci.edu> References: <4F9F1CB6.5050804@uci.edu> Message-ID: On Tue, May 1, 2012 at 1:13 AM, Christoph Gohlke wrote: > > > On 4/30/2012 1:16 PM, Ralf Gommers wrote: > > Hi all, > > > > Charles has done a great job of backporting a lot of bug fixes to 1.6.2, > > see PRs 260, 261, 262 and 263. For those who are interested, please have > > a look at those PRs to see and comment on what's proposed to go into > 1.6.2. > > > > I also have a request for help with testing: can someone who uses MSVC > > test (preferably with a 2.x and a 3.x version)? 
I have a branch with all > > four PRs merged at https://github.com/rgommers/numpy/tree/bports > > > > Thanks, > > Ralf > > > > > Hi Ralf, > > that branch builds and tests OK with msvc9/MKL on win-amd64-py2.7 and > win-amd64-py3.2. No apparent incompatibilities with scipy or matplotlib > either. > Great, thanks Christoph! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue May 1 02:16:06 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 May 2012 00:16:06 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> Message-ID: On Mon, Apr 30, 2012 at 10:14 PM, Travis Oliphant wrote: > > >> The same is true of SciPy. I think if SciPy also migrates to use >> Github issues, then together with IPython we can really be a voice that >> helps Github. I will propose to NumFOCUS that the Foundation sponsor >> migration of the Trac to Github for NumPy and SciPy. If anyone would >> like to be involved in this migration project, please let me know. >> >> > There is a group where I work that purchased the enterprise version of > github. But they still use trac. I think Ralf's opinion should count for a > fair amount here, since the tracker is important for releases and > backports. Having a good connection between commits and tickets is also > very helpful, although sticking with github might be better there. The > issue tracker isn't really intended as social media and I find the > notifications from trac sufficient. > > Chuck > > > > I think Ralf and your opinion on this is *huge*. It seems that Issue > tracking is at the heart of "social media" activity, though, because you > need people to submit issues and you need people to respond to those issues > in a timely way. And it is ideal if the dialogue that might ensue > pursuant to that activity is discoverable via search and linking. > > But the issue tracking problem really is dividable into separate work > flows: > 1) The submission of the issue (here things like ease-of-entry and > attaching files is critical) > Yes, trac does allow attachments, but numpy trac locks up pretty often and I don't see why I should be locked out from submitting a comment just because someone else has made a comment while I've been typing. That said, trac does seem more responsive lately (thanks Pauli). I also don't find the trac interface attractive to look at, but it works. I could live without attachments, but they can be handy for scripts that demonstrate bugs and so on. As for fixes, I think we want to encourage pull requests rather than attached patches, but it might take a while before everyone is comfortable with that procedure. > 2) The dialogue around the issue (developer comments on it and any > discussion that ensues) > I find the trac email notifications pretty good in that regard, although it would be nice to have everything in one place. The main issue I have, actually, is that github won't send me everything. I want to see every posted comment and every commit show up in the mail, including my own comments. The RSS feeds seem better for those notifications, but I have two feeds from github and they don't show the same things. Maybe we need to put together a tracking workflow document to supplement the git workflow document. I'd also like to see bug fix commits go in and automatically notify the tracker with an @#666 or some such. 
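Something along the lines of the sketch below is what I'm after -- a post-receive script posting a comment back through the v3 API. This is untested, plain Python 2 stdlib, and the repo name, token and issue number are only placeholders:

    import json, urllib2

    def notify_issue(number, sha, token="PLACEHOLDER_OAUTH_TOKEN"):
        # Post a comment on issue `number` pointing back at the commit
        # that fixed it, e.g. from a post-receive script.
        url = "https://api.github.com/repos/numpy/numpy/issues/%d/comments" % number
        req = urllib2.Request(url,
                              json.dumps({"body": "Fixed in commit %s" % sha}),
                              {"Authorization": "token %s" % token,
                               "Content-Type": "application/json"})
        return json.load(urllib2.urlopen(req))

    notify_issue(666, "deadbeef")
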
I don't know if that sort of thing is possible with the github tracker or the github hooks. > 3) Developer management of issues > > This is where Ralf comes in as I don't regard myself as that familiar with trac. The problem in choosing a new system is that one needs to actually use it for some serious work to properly evaluate it. But we don't have the time to do that, so we ask. And then everyone has their own favorite, perhaps because it is the only one they have used. What impressed me was that Ralf seems to have actually used several different systems. Now, it is also true that these three things don't have to all intersect. > It is very possible to have different systems manage different parts. > What I find less than optimal these days is having github as the web-site > for pull requests and discussions around them and a poorly-performing trac > for issue tracking and milestone management and a few wiki pages. > > Can we at least agree to have all the wiki pages and web-site managed by > github? For issue tracking, I'm very anxious for your and Ralf's > opinion because of the effort you have spent using Trac over the years. > It makes sense to move the wiki and web site. I hardly ever look at the developer Wiki anyway and might be more tempted to do so if it were up on github. It would also be good if those things could be managed in one place. As is, the permissions for managing things are split up and it isn't always clear who has the magic key. > > Another developer I asked at LLNL, just said "why don't you use bugzilla"? > > > The simplest solution would be to use github, the question is whether the convenience and possibility of future improvements outweigh the better functionality of a dedicated system. In making that decision I think Ralf's opinion should carry the most weight since my experience in using trac for releases and such is much more limited than his. It might also be good to check on the ability of the different systems to export the data in some sensible format in case we change our minds. I suppose worker time and money are also part of the equation. If it looks like the github solution is cheaper and easier to manage, that in itself could be a significant factor in the choice. And that also argues for making a pick pretty soon, there is no point in having folks spending a few more weeks just trying things out. Of course, the main problem with trac, which a better system won't solve, is having the time to deal with the issues. Fedora uses bugzilla, and after a ticket has aged sufficiently they simply close it on principal even if it hasn't been officially closed. No doubt a lot of things do get fixed without anyone connecting it to a specific ticket. Maybe we need to have a collection of failing tests, not just tests for things that have been fixed. If I could hit a button and send the problem code into a failure repository, formatted for nose and with a number and back link, that would be great. Maybe not up there with a flying car but certainly a close second. Something like that could even be part of a continuous integration system. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Tue May 1 02:24:17 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 May 2012 00:24:17 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> Message-ID: On Tue, May 1, 2012 at 12:16 AM, Charles R Harris wrote: > > > On Mon, Apr 30, 2012 at 10:14 PM, Travis Oliphant wrote: > >> >> >>> The same is true of SciPy. I think if SciPy also migrates to use >>> Github issues, then together with IPython we can really be a voice that >>> helps Github. I will propose to NumFOCUS that the Foundation sponsor >>> migration of the Trac to Github for NumPy and SciPy. If anyone would >>> like to be involved in this migration project, please let me know. >>> >>> >> There is a group where I work that purchased the enterprise version of >> github. But they still use trac. I think Ralf's opinion should count for a >> fair amount here, since the tracker is important for releases and >> backports. Having a good connection between commits and tickets is also >> very helpful, although sticking with github might be better there. The >> issue tracker isn't really intended as social media and I find the >> notifications from trac sufficient. >> >> Chuck >> >> >> >> I think Ralf and your opinion on this is *huge*. It seems that Issue >> tracking is at the heart of "social media" activity, though, because you >> need people to submit issues and you need people to respond to those issues >> in a timely way. And it is ideal if the dialogue that might ensue >> pursuant to that activity is discoverable via search and linking. >> >> But the issue tracking problem really is dividable into separate work >> flows: >> 1) The submission of the issue (here things like ease-of-entry and >> attaching files is critical) >> > > Yes, trac does allow attachments, but numpy trac locks up pretty often and > I don't see why I should be locked out from submitting a comment just > because someone else has made a comment while I've been typing. That said, > trac does seem more responsive lately (thanks Pauli). > > I also don't find the trac interface attractive to look at, but it works. > I could live without attachments, but they can be handy for scripts that > demonstrate bugs and so on. As for fixes, I think we want to encourage pull > requests rather than attached patches, but it might take a while before > everyone is comfortable with that procedure. > > >> 2) The dialogue around the issue (developer comments on it and any >> discussion that ensues) >> > > I find the trac email notifications pretty good in that regard, although > it would be nice to have everything in one place. The main issue I have, > actually, is that github won't send me everything. I want to see every > posted comment and every commit show up in the mail, including my own > comments. The RSS feeds seem better for those notifications, but I have two > feeds from github and they don't show the same things. Maybe we need to put > together a tracking workflow document to supplement the git workflow > document. > > I'd also like to see bug fix commits go in and automatically notify the > tracker with an @#666 or some such. I don't know if that sort of thing is > possible with the github tracker or the github hooks. > > >> 3) Developer management of issues >> >> > This is where Ralf comes in as I don't regard myself as that familiar with > trac. 
The problem in choosing a new system is that one needs to actually > use it for some serious work to properly evaluate it. But we don't have the > time to do that, so we ask. And then everyone has their own favorite, > perhaps because it is the only one they have used. What impressed me was > that Ralf seems to have actually used several different systems. > > Now, it is also true that these three things don't have to all intersect. >> It is very possible to have different systems manage different parts. >> What I find less than optimal these days is having github as the web-site >> for pull requests and discussions around them and a poorly-performing trac >> for issue tracking and milestone management and a few wiki pages. >> >> Can we at least agree to have all the wiki pages and web-site managed by >> github? For issue tracking, I'm very anxious for your and Ralf's >> opinion because of the effort you have spent using Trac over the years. >> > > It makes sense to move the wiki and web site. I hardly ever look at the > developer Wiki anyway and might be more tempted to do so if it were up on > github. It would also be good if those things could be managed in one > place. As is, the permissions for managing things are split up and it isn't > always clear who has the magic key. > > >> >> Another developer I asked at LLNL, just said "why don't you use >> bugzilla"? >> >> > The simplest solution would be to use github, the question is whether the > convenience and possibility of future improvements outweigh the better > functionality of a dedicated system. In making that decision I think Ralf's > opinion should carry the most weight since my experience in using trac for > releases and such is much more limited than his. It might also be good to > check on the ability of the different systems to export the data in some > sensible format in case we change our minds. I suppose worker time and > money are also part of the equation. If it looks like the github solution > is cheaper and easier to manage, that in itself could be a significant > factor in the choice. And that also argues for making a pick pretty soon, > there is no point in having folks spending a few more weeks just trying > things out. > > Of course, the main problem with trac, which a better system won't solve, > is having the time to deal with the issues. Fedora uses bugzilla, and after > a ticket has aged sufficiently they simply close it on principal even if it > hasn't been officially closed. No doubt a lot of things do get fixed > without anyone connecting it to a specific ticket. Maybe we need to have a > collection of failing tests, not just tests for things that have been > fixed. If I could hit a button and send the problem code into a failure > repository, formatted for nose and with a number and back link, that would > be great. Maybe not up there with a flying car but certainly a close > second. Something like that could even be part of a continuous integration > system. > > I'll add that tickets don't get looked at as often as they should and that is an important consideration. Having them up on github might help in that regard. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Tue May 1 02:52:42 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 1 May 2012 01:52:42 -0500 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: <4F9F552F.4060605@creativetrax.com> References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> Message-ID: <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> On Apr 30, 2012, at 10:14 PM, Jason Grout wrote: > On 4/30/12 6:31 PM, Travis Oliphant wrote: >> Hey all, >> >> We have been doing some investigation of various approaches to issue tracking. The last time the conversation left this list was with Ralf's current list of preferences as: >> >> 1) Redmine >> 2) Trac >> 3) Github >> >> Since that time, Maggie who has been doing a lot of work settting up various issue tracking tools over the past couple of months, has set up a redmine instance and played with it. This is a possibility as a future issue tracker. >> >> However, today I took a hard look at what the IPython folks are doing with their issue tracker and was very impressed by the level of community integration that having issues tracked by Github provides. Right now, we have a major community problem in that there are 3 conversations taking place (well at least 2 1/2). One on Github, one on this list, and one on the Trac and it's accompanying wiki. >> >> I would like to propose just using Github's issue tracker. This just seems like the best move overall for us at this point. I like how the Pull Request mechanism integrates with the issue tracking. We could setup a Redmine instance but this would just re-create the same separation of communities that currently exists with the pull-requests, the mailing list, and the Trac pages. Redmine is nicer than Trac, but it's still a separate space. We need to make Github the NumPy developer hub and not have it spread throughout several sites. >> >> The same is true of SciPy. I think if SciPy also migrates to use Github issues, then together with IPython we can really be a voice that helps Github. I will propose to NumFOCUS that the Foundation sponsor migration of the Trac to Github for NumPy and SciPy. If anyone would like to be involved in this migration project, please let me know. >> >> Comments, concerns? > > I've been pretty impressed with the lemonade that the IPython folks have > made out of what I see as pretty limiting shortcomings of the github > issue tracker. I've been trying to use it for a much smaller project > (https://github.com/sagemath/sagecell/), and it is a lot harder, in my > (somewhat limited) experience, than using trac or the google issue > tracker. None of these issues seems like it would be too hard to solve, > but since we don't even have the source to the tracker, we're somewhat > at github's mercy for any improvements. Github does have a very nice > API for interacting with the data, which somewhat makes up for some of > the severe shortcomings of the web interface. > > In no particular order, here are a few that come to mind immediately: > > 1. No key:value pairs for labels (Fernando brought this up a long time > ago, I think). This is brilliant in Google code's tracker, and allows > for custom fields that help in tracking workflow (like status, priority, > etc.). Sure, you can do what the IPython folks are doing and just > create labels for every possible status, but that's unwieldy and takes a > lot of discipline to maintain. Which means it takes a lot of developer > time or it becomes inconsistent and not very useful. 
I'm not sure how much of an issue this is. A lot of tools use single tags for categorization and it works pretty well. A simple "key:value" label communicates about the same information together with good query tools. > > 2. The disjointed relationship between pull requests and issues. They > share numberings, for example, and both support discussions, etc. If > you use the API, you can submit code to an issue, but then the issue > becomes a pull request, which means that all labels on the issue > disappear from the web interface (but you can still manage to set labels > using the list view of the issue tracker, if I recall correctly). If > you don't attach code to issues, it means that every issue is duplicated > in a pull request, which splits the conversation up between an issue > ticket and a pull request ticket. Hmm.. So pull requests *are* issues. This sounds like it might actually be a feature and also means that we *are* using the Github issue tracker (just only those issues that have a pull-request attached). Losing labels seems like a real problem (are they really lost or do they just not appear in the pull-request view?) > > 3. No attachments for issues (screenshots, supporting documents, etc.). > Having API access to data won't help you here. Using gists and references to gists can overcome this. Also using an attachment service like http://uploading.com/ or dropbox makes this problem less of an issue really. > > 4. No custom queries. We love these in the Sage trac instance; since we > have full access to the database, we can run any sort of query we want. > With API data access, you can build your own queries, so maybe this > isn't insurmountable. yes, you can build your own queries. This seems like an area where github can improve (and tools can be written which improve the experience). > > 5. Stylistically, the webpage is not very dense on information. I get > frustrated when trying to see the issues because they only come 25 at a > time, and never grouped into any sort of groupings, and there are only 3 > options for sorting issues. Compare the very nice, dense layout of > Google Code issues or bitbucket. Google Code issues also lets you > cross-tabulate the issues so you can quickly triage them. Compare also > the pretty comprehensive options for sorting and grouping things in trac. Yes, it looks like you can group via labels, milestones, and "your" issues. This is also something that can be over-come with tools that use the github API. It would be good to hear from users of the IPython github issue tracker to see how they like it "in the wild". How problematic are these issues in practice. Does it reduce or increase the participation in issue tracking both by users and by developers. Thanks, -Travis > > 6. Side-by-side diffs are nice to have, and I believe bitbucket and > google code both have them. Of course, this isn't a deal-breaker > because you can always pull the branch down, but it would be nice to > have, and there's not really a way we can put it into the github tracker > ourselves. > > How does, for example, the JIRA github connector work? Does it pull in > code comments, etc.? > > Anyways, I'm not a regular contributor to numpy, but I have been trying > to get used to the github tracker for about a year now, and I just keep > getting more frustrated at it. I suppose the biggest frustrating part > about it is that it is closed source, so even if I did want to scratch > an itch, I can't. 
> > That said, it is nice to have code and dev conversations happening in > one place. There are great things about github issues, of course. But > I'm not so sure, for me, that they outweigh some of the administrative > issues listed above. > > Thanks, > > Jason > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue May 1 03:12:44 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 May 2012 01:12:44 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Tue, May 1, 2012 at 12:52 AM, Travis Oliphant wrote: > > On Apr 30, 2012, at 10:14 PM, Jason Grout wrote: > > On 4/30/12 6:31 PM, Travis Oliphant wrote: > > Hey all, > > > We have been doing some investigation of various approaches to issue > tracking. The last time the conversation left this list was with > Ralf's current list of preferences as: > > > 1) Redmine > > 2) Trac > > 3) Github > > > Since that time, Maggie who has been doing a lot of work settting up > various issue tracking tools over the past couple of months, has set up a > redmine instance and played with it. This is a possibility as a future > issue tracker. > > > However, today I took a hard look at what the IPython folks are doing with > their issue tracker and was very impressed by the level of community > integration that having issues tracked by Github provides. Right now, we > have a major community problem in that there are 3 conversations taking > place (well at least 2 1/2). One on Github, one on this list, and one on > the Trac and it's accompanying wiki. > > > I would like to propose just using Github's issue tracker. This just > seems like the best move overall for us at this point. I like how the > Pull Request mechanism integrates with the issue tracking. We could > setup a Redmine instance but this would just re-create the same separation > of communities that currently exists with the pull-requests, the mailing > list, and the Trac pages. Redmine is nicer than Trac, but it's still a > separate space. We need to make Github the NumPy developer hub and not > have it spread throughout several sites. > > > The same is true of SciPy. I think if SciPy also migrates to use Github > issues, then together with IPython we can really be a voice that helps > Github. I will propose to NumFOCUS that the Foundation sponsor migration > of the Trac to Github for NumPy and SciPy. If anyone would like to be > involved in this migration project, please let me know. > > > Comments, concerns? > > > I've been pretty impressed with the lemonade that the IPython folks have > made out of what I see as pretty limiting shortcomings of the github > issue tracker. I've been trying to use it for a much smaller project > (https://github.com/sagemath/sagecell/), and it is a lot harder, in my > (somewhat limited) experience, than using trac or the google issue > tracker. None of these issues seems like it would be too hard to solve, > but since we don't even have the source to the tracker, we're somewhat > at github's mercy for any improvements. 
Github does have a very nice > API for interacting with the data, which somewhat makes up for some of > the severe shortcomings of the web interface. > > In no particular order, here are a few that come to mind immediately: > > 1. No key:value pairs for labels (Fernando brought this up a long time > ago, I think). This is brilliant in Google code's tracker, and allows > for custom fields that help in tracking workflow (like status, priority, > etc.). Sure, you can do what the IPython folks are doing and just > create labels for every possible status, but that's unwieldy and takes a > lot of discipline to maintain. Which means it takes a lot of developer > time or it becomes inconsistent and not very useful. > > > I'm not sure how much of an issue this is. A lot of tools use single tags > for categorization and it works pretty well. A simple "key:value" label > communicates about the same information together with good query tools. > > > 2. The disjointed relationship between pull requests and issues. They > share numberings, for example, and both support discussions, etc. If > you use the API, you can submit code to an issue, but then the issue > becomes a pull request, which means that all labels on the issue > disappear from the web interface (but you can still manage to set labels > using the list view of the issue tracker, if I recall correctly). If > you don't attach code to issues, it means that every issue is duplicated > in a pull request, which splits the conversation up between an issue > ticket and a pull request ticket. > > > Hmm.. So pull requests *are* issues. This sounds like it might > actually be a feature and also means that we *are* using the Github issue > tracker (just only those issues that have a pull-request attached). > Losing labels seems like a real problem (are they really lost or do they > just not appear in the pull-request view?) > > > 3. No attachments for issues (screenshots, supporting documents, etc.). > Having API access to data won't help you here. > > > Using gists and references to gists can overcome this. Also using an > attachment service like http://uploading.com/ or dropbox makes this > problem less of an issue really. > > > 4. No custom queries. We love these in the Sage trac instance; since we > have full access to the database, we can run any sort of query we want. > With API data access, you can build your own queries, so maybe this > isn't insurmountable. > > > yes, you can build your own queries. This seems like an area where > github can improve (and tools can be written which improve the experience). > > > > 5. Stylistically, the webpage is not very dense on information. I get > frustrated when trying to see the issues because they only come 25 at a > time, and never grouped into any sort of groupings, and there are only 3 > options for sorting issues. Compare the very nice, dense layout of > Google Code issues or bitbucket. Google Code issues also lets you > cross-tabulate the issues so you can quickly triage them. Compare also > the pretty comprehensive options for sorting and grouping things in trac. > > > Yes, it looks like you can group via labels, milestones, and "your" > issues. This is also something that can be over-come with tools that use > the github API. > > > It would be good to hear from users of the IPython github issue tracker to > see how they like it "in the wild". How problematic are these issues in > practice. Does it reduce or increase the participation in issue tracking > both by users and by developers. 
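To make the "build your own queries" point concrete: grouping open tickets by label through the v3 API is only a few lines, and it also gets around the 25-per-page web view. A rough, untested sketch -- the repo and the label name are just examples:

    import json, urllib2

    def issues_with_label(label, state="open"):
        # Walk the paginated issue list for one label.
        out, page = [], 1
        while True:
            url = ("https://api.github.com/repos/numpy/numpy/issues"
                   "?labels=%s&state=%s&per_page=100&page=%d"
                   % (label, state, page))
            chunk = json.load(urllib2.urlopen(url))
            if not chunk:
                return out
            out.extend(chunk)
            page += 1

    for issue in issues_with_label("component:ma"):
        print("%d %s" % (issue["number"], issue["title"]))

Whether maintaining such scripts ourselves is a good use of developer time is of course exactly the question.
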
> > Thanks, > > -Travis > > > > > > 6. Side-by-side diffs are nice to have, and I believe bitbucket and > google code both have them. Of course, this isn't a deal-breaker > because you can always pull the branch down, but it would be nice to > have, and there's not really a way we can put it into the github tracker > ourselves. > > How does, for example, the JIRA github connector work? Does it pull in > code comments, etc.? > > Anyways, I'm not a regular contributor to numpy, but I have been trying > to get used to the github tracker for about a year now, and I just keep > getting more frustrated at it. I suppose the biggest frustrating part > about it is that it is closed source, so even if I did want to scratch > an itch, I can't. > > That said, it is nice to have code and dev conversations happening in > one place. There are great things about github issues, of course. But > I'm not so sure, for me, that they outweigh some of the administrative > issues listed above. > > I'm thinking we could do worse than simply take Ralf's top pick. Github definitely sounds a bit clunky for issue tracking, and while we could put together workarounds, I think Jason's point about the overall frustration is telling. And while we could, maybe, put together tools to work with it, I think what we want is something that works out of the box. Implementing workarounds for a frustrating system doesn't seem the best use of developer time. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason-sage at creativetrax.com Tue May 1 03:19:08 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Tue, 01 May 2012 02:19:08 -0500 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: <4F9F8E6C.3000303@creativetrax.com> On 5/1/12 1:52 AM, Travis Oliphant wrote: >> 1. No key:value pairs for labels (Fernando brought this up a long time >> ago, I think). This is brilliant in Google code's tracker, and allows >> for custom fields that help in tracking workflow (like status, priority, >> etc.). Sure, you can do what the IPython folks are doing and just >> create labels for every possible status, but that's unwieldy and takes a >> lot of discipline to maintain. Which means it takes a lot of developer >> time or it becomes inconsistent and not very useful. > > I'm not sure how much of an issue this is. A lot of tools use single > tags for categorization and it works pretty well. A simple "key:value" > label communicates about the same information together with good query > tools. Sure, it is possible, but it takes the hierarchal information out of the picture, so we lose semantic meaning. It is possible to add two different conflicting priorities, for example, and you can't enforce a certain workflow like you can with trac states, for example. Not to mention that the only filtering you can do on labels is by clicking the labels on the left, which is an "AND" filter. How do you search for tickets that are labeled priority-critical OR priority-high? > >> >> 2. The disjointed relationship between pull requests and issues. They >> share numberings, for example, and both support discussions, etc. 
If >> you use the API, you can submit code to an issue, but then the issue >> becomes a pull request, which means that all labels on the issue >> disappear from the web interface (but you can still manage to set labels >> using the list view of the issue tracker, if I recall correctly). If >> you don't attach code to issues, it means that every issue is duplicated >> in a pull request, which splits the conversation up between an issue >> ticket and a pull request ticket. > > Hmm.. So pull requests *are* issues. This sounds like it might actually > be a feature and also means that we *are* using the Github issue tracker > (just only those issues that have a pull-request attached). Losing > labels seems like a real problem (are they really lost or do they just > not appear in the pull-request view?) I just double-checked. This issue started out as an issue, then I "attached" code by using the API to attach a branch to the issue [1]. The result is: https://github.com/sagemath/sagecell/pull/300 You'll notice that there are no tags on the right side, and as a project admin, I don't see any way to add them either. If you go to the list view: https://github.com/sagemath/sagecell/issues?milestone=6&sort=updated&state=closed you actually *do* see two tags attached. I *can* change labels in the list view, if it's the list view for *issues*. If I view the same "issue turned pull request" in the pull request list view: https://github.com/sagemath/sagecell/pulls (you have to click "closed"---apparently clicking "closed" doesn't change the URL, so I can't give you a link to that listing view...) then I see at the top (right now) "pull request 300" (though in the pull request view, you don't see the 300 to the side, like you see in the issue view). Of course, like other pull requests, you can't attach labels in the pull request list view. From the above, I get the idea that github does not really support attaching code to issues, though it is technically possible through the API. To me, that means that every problem has at least two tickets (an issue and a pull request), and you have to make sure to manually close and sync one with the other, and the discussion is split up between the two tickets. [1] One of my students has a short script to encapsulate doing this via the API: https://gist.github.com/2156799 > >> >> 3. No attachments for issues (screenshots, supporting documents, etc.). >> Having API access to data won't help you here. > > Using gists and references to gists can overcome this. Also using an > attachment service like http://uploading.com/ or dropbox makes this > problem less of an issue really. Sure, it's possible, but like you said, it splits up the conversation to have parts of it hosted elsewhere. > >> >> 4. No custom queries. We love these in the Sage trac instance; since we >> have full access to the database, we can run any sort of query we want. >> With API data access, you can build your own queries, so maybe this >> isn't insurmountable. > > yes, you can build your own queries. This seems like an area where > github can improve (and tools can be written which improve the experience). Yep, so the question is: how much of a bug tracker and common reports are you willing to rebuild on the github infrastructure, or is it easier to use something with all of this built in that also has git/github connections. > >> >> 5. Stylistically, the webpage is not very dense on information. 
I get >> frustrated when trying to see the issues because they only come 25 at a >> time, and never grouped into any sort of groupings, and there are only 3 >> options for sorting issues. Compare the very nice, dense layout of >> Google Code issues or bitbucket. Google Code issues also lets you >> cross-tabulate the issues so you can quickly triage them. Compare also >> the pretty comprehensive options for sorting and grouping things in trac. > > Yes, it looks like you can group via labels, milestones, and "your" > issues. This is also something that can be over-come with tools that use > the github API. You can filter on those, but I don't see how to group on those (i.e., have a big listing of everything, but group the results together based on their labels or their milestones, etc. It makes it much harder to get a big-picture view of lots of issues and how they are related. Again, of course you can write tools to do this with the github API, but again, how much existing bug-tracker functionality do you want to reimplement? By the way: are people keeping a backup of the issues, etc.? You can get a data dump of, for example, the full issue database. It seems like it would make sense for a project, like IPython, to periodically download all of its data to make an off-site backup not tied to a commercial company. > > > It would be good to hear from users of the IPython github issue tracker > to see how they like it "in the wild". How problematic are these issues > in practice. Does it reduce or increase the participation in issue > tracking both by users and by developers. I agree; it would be good to hear from someone with wider and broader experience with the github tracker on these issues. Thanks, Jason From david.froger at gmail.com Tue May 1 05:14:29 2012 From: david.froger at gmail.com (David Froger) Date: Tue, 01 May 2012 11:14:29 +0200 Subject: [Numpy-discussion] Continuous Integration In-Reply-To: References: Message-ID: <1335862622-sup-5857@david-desktop> Excerpts from Travis Oliphant's message of mar. mai 01 01:39:26 +0200 2012: > If you have particular reasons why we should choose a particular CI service, please speak up and let your voice be heard. There is still time to make a difference in what we are setting up. Hi all, What about buildbot? (http://trac.buildbot.net/) I'm using it currently, and like it because is GPL 2, configuration files are powerful Python scripts, and development team is active and dynamic. Best, David From pav at iki.fi Tue May 1 09:38:53 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 01 May 2012 15:38:53 +0200 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: 01.05.2012 08:52, Travis Oliphant kirjoitti: [clip] >> 3. No attachments for issues (screenshots, supporting documents, etc.). >> Having API access to data won't help you here. > > Using gists and references to gists can overcome this. Also using an > attachment service like http://uploading.com/ or dropbox makes this > problem less of an issue really. I'm not so sure this works for binary data. At least for Scipy, one sometimes needs to submit also data files. The needs here are probably somewhat different from IPython which can probably live with small text snippets, for which gists do work. 
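For the text-snippet case the mechanics are simple enough. Something like the following -- an untested sketch against the v3 gist API, with the file name and description made up -- would push a small reproduction script up and hand back a URL to paste into the issue:

    import json, urllib2

    def attach_as_gist(filename, description):
        # Anonymous public gist holding one small text file.
        payload = {"description": description,
                   "public": True,
                   "files": {filename: {"content": open(filename).read()}}}
        req = urllib2.Request("https://api.github.com/gists",
                              json.dumps(payload),
                              {"Content-Type": "application/json"})
        return json.load(urllib2.urlopen(req))["html_url"]

    print(attach_as_gist("failing_case.py", "reproduction script for a scipy ticket"))

That clearly does not help with binary data files, though.
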
The problem with using an external services I see is guaranteeing that the file still is there when you go looking for it (three years after the fact --- shouldn't happen in principle, but it does :). Dropbox I think is not good for this --- people will delete their stuff. I'm not sure about other attachment services. Pauli From matrixhasu at gmail.com Tue May 1 11:27:00 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Tue, 1 May 2012 17:27:00 +0200 Subject: [Numpy-discussion] 1.6.2 release - backports and MSVC testing help In-Reply-To: References: Message-ID: Hello, with my Debian hat one I'd surely like to give it a go - do you have any plan to release a tarball for a RC (given the implication of numpy on the distro, I can't test anything else)? what do you expect to be the release date for 1.6.2? I asked this to understand the impact, due to the upcoming Debian freeze (June, still not clear if beginning or end). Cheers, Sandri On Mon, Apr 30, 2012 at 22:16, Ralf Gommers wrote: > Hi all, > > Charles has done a great job of backporting a lot of bug fixes to 1.6.2, see > PRs 260, 261, 262 and 263. For those who are interested, please have a look > at those PRs to see and comment on what's proposed to go into 1.6.2. > > I also have a request for help with testing: can someone who uses MSVC test > (preferably with a 2.x and a 3.x version)? I have a branch with all four PRs > merged at https://github.com/rgommers/numpy/tree/bports > > Thanks, > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From keith.hughitt at gmail.com Tue May 1 12:47:51 2012 From: keith.hughitt at gmail.com (Keith Hughitt) Date: Tue, 1 May 2012 12:47:51 -0400 Subject: [Numpy-discussion] Test failures - which dependencies am I missing? In-Reply-To: References: Message-ID: Hi Chris, Try "sudo apt-get build-dep python-numpy" to install the dependencies for building NumPy. I believe it will install all of the optional dependencies as well. HTH, Keith -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue May 1 12:56:25 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 01 May 2012 18:56:25 +0200 Subject: [Numpy-discussion] Continuous Integration In-Reply-To: <1335862622-sup-5857@david-desktop> References: <1335862622-sup-5857@david-desktop> Message-ID: 01.05.2012 11:14, David Froger kirjoitti: > Excerpts from Travis Oliphant's message of mar. mai 01 01:39:26 +0200 2012: > > If you have particular reasons why we should choose a particular CI service, > > please speak up and let your voice be heard. There is still time to make > > a difference in what we are setting up. > > Hi all, > > What about buildbot? (http://trac.buildbot.net/) > > I'm using it currently, and like it because is GPL 2, configuration files are > powerful Python scripts, and development team is active and dynamic. We're currently using it: http://buildbot.scipy.org Although, it hasn't been building automatically for some time now. It has the following shortcomings: - Apparently, for one master instance, only one project. - Last time I looked, Git integration was poor. Maybe this has improved since... 
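(For reference, hooking up a git repository is schematically just a change source in master.cfg, something like the fragment below. The exact constructor arguments differ between buildbot versions, so treat this as a sketch rather than working config:

    from buildbot.changes.gitpoller import GitPoller

    c['change_source'] = [
        GitPoller(repourl='git://github.com/numpy/numpy.git',
                  branch='master'),
    ]

The configuration itself is not the hard part.)
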
Pauli From matthew.brett at gmail.com Tue May 1 13:17:00 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 1 May 2012 10:17:00 -0700 Subject: [Numpy-discussion] Continuous Integration In-Reply-To: References: <1335862622-sup-5857@david-desktop> Message-ID: Hi, On Tue, May 1, 2012 at 9:56 AM, Pauli Virtanen wrote: > 01.05.2012 11:14, David Froger kirjoitti: >> Excerpts from Travis Oliphant's message of mar. mai 01 01:39:26 +0200 2012: >> > If you have particular reasons why we should choose a particular CI service, >> > please speak up and let your voice be heard. ?There is still time to > make >> > a difference in what we are setting up. >> >> Hi all, >> >> What about ?buildbot? (http://trac.buildbot.net/) >> >> I'm using it currently, ?and like it ?because is GPL 2, ?configuration files are >> powerful Python scripts, and development team is active and dynamic. > > We're currently using it: http://buildbot.scipy.org > Although, it hasn't been building automatically for some time now. > > It has the following shortcomings: > > - Apparently, for one master instance, only one project. We're building 4 github projects with one buildbot process, and one master.cfg script, but maybe that's not what you meant? http://nipy.bic.berkeley.edu/builders > - Last time I looked, Git integration was poor. Maybe this has > ?improved since... Our projects use git pollers polling github repos. We did run into random occasional builds stalling at the git update step, but I believe that has been fixed with the latest release. Best, Matthew From charlesr.harris at gmail.com Tue May 1 13:22:34 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 May 2012 11:22:34 -0600 Subject: [Numpy-discussion] 1.6.2 release - backports and MSVC testing help In-Reply-To: References: Message-ID: On Mon, Apr 30, 2012 at 2:16 PM, Ralf Gommers wrote: > Hi all, > > Charles has done a great job of backporting a lot of bug fixes to 1.6.2, > see PRs 260, 261, 262 and 263. For those who are interested, please have a > look at those PRs to see and comment on what's proposed to go into 1.6.2. > > I also have a request for help with testing: can someone who uses MSVC > test (preferably with a 2.x and a 3.x version)? I have a branch with all > four PRs merged at https://github.com/rgommers/numpy/tree/bports > > Works for me on OSX 10.6, no errors. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue May 1 13:37:48 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 May 2012 11:37:48 -0600 Subject: [Numpy-discussion] Python3, genfromtxt and unicode In-Reply-To: References: Message-ID: On Fri, Apr 27, 2012 at 8:17 PM, Antony Lee wrote: > With bytes fields, genfromtxt(dtype=None) sets the sizes of the fields to > the largest number of chars (npyio.py line 1596), but it doesn't do the > same for unicode fields, which is a pity. See example below. > I tried to change npyio.py around line 1600 to add that but it didn't > work; from my limited understanding the problem comes earlier, in the way > StringBuilder is defined(?). 
> Antony Lee > > import io, numpy as np > s = io.BytesIO() > s.write(b"abc 1\ndef 2") > s.seek(0) > t = np.genfromtxt(s, dtype=None) # (or converters={0: bytes}) > print(t, t.dtype) # -> [(b'a', 1) (b'b', 2)] [('f0', '|S1'), ('f1', ' s.seek(0) > t = np.genfromtxt(s, dtype=None, converters={0: lambda s: > s.decode("utf-8")}) > print(t, t.dtype) # -> [('', 1) ('', 2)] [('f0', ' > Could you open a ticket for this? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue May 1 14:24:58 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 1 May 2012 20:24:58 +0200 Subject: [Numpy-discussion] 1.6.2 release - backports and MSVC testing help In-Reply-To: References: Message-ID: On Tue, May 1, 2012 at 5:27 PM, Sandro Tosi wrote: > Hello, > with my Debian hat one I'd surely like to give it a go - do you have > any plan to release a tarball for a RC (given the implication of numpy > on the distro, I can't test anything else)? what do you expect to be > the release date for 1.6.2? I asked this to understand the impact, due > to the upcoming Debian freeze (June, still not clear if beginning or > end). > We certainly have the Debian freeze date in the back of our heads. I'm aiming for an RC sometime this week and a release two weeks after that. Cheers, Ralf > On Mon, Apr 30, 2012 at 22:16, Ralf Gommers > wrote: > > Hi all, > > > > Charles has done a great job of backporting a lot of bug fixes to 1.6.2, > see > > PRs 260, 261, 262 and 263. For those who are interested, please have a > look > > at those PRs to see and comment on what's proposed to go into 1.6.2. > > > > I also have a request for help with testing: can someone who uses MSVC > test > > (preferably with a 2.x and a 3.x version)? I have a branch with all four > PRs > > merged at https://github.com/rgommers/numpy/tree/bports > > > > Thanks, > > Ralf > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > Sandro Tosi (aka morph, morpheus, matrixhasu) > My website: http://matrixhasu.altervista.org/ > Me at Debian: http://wiki.debian.org/SandroTosi > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrixhasu at gmail.com Tue May 1 14:29:06 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Tue, 1 May 2012 20:29:06 +0200 Subject: [Numpy-discussion] 1.6.2 release - backports and MSVC testing help In-Reply-To: References: Message-ID: On Tue, May 1, 2012 at 20:24, Ralf Gommers wrote: > > > On Tue, May 1, 2012 at 5:27 PM, Sandro Tosi wrote: >> >> Hello, >> with my Debian hat one I'd surely like to give it a go - do you have >> any plan to release a tarball for a RC (given the implication of numpy >> on the distro, I can't test anything else)? what do you expect to be >> the release date for 1.6.2? I asked this to understand the impact, due >> to the upcoming Debian freeze (June, still not clear if beginning or >> end). > > > We certainly have the Debian freeze date in the back of our heads. I'm I can only say "Thank you!" :) > aiming for an RC sometime this week and a release two weeks after that. Awesome, I'll be waiting for the RC to prepare the upload, directly to our main archive. 
Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From antony.lee at berkeley.edu Tue May 1 15:24:37 2012 From: antony.lee at berkeley.edu (Antony Lee) Date: Tue, 1 May 2012 12:24:37 -0700 Subject: [Numpy-discussion] Python3, genfromtxt and unicode In-Reply-To: References: Message-ID: Sure, I will. Right now my solution is to use genfromtxt once with bytes and auto-dtype detection, then modify the resulting dtype, replacing bytes with unicodes, and use that new dtypes for a second round of genfromtxt. A bit awkward but that gets the job done. Antony Lee 2012/5/1 Charles R Harris > > > On Fri, Apr 27, 2012 at 8:17 PM, Antony Lee wrote: > >> With bytes fields, genfromtxt(dtype=None) sets the sizes of the fields to >> the largest number of chars (npyio.py line 1596), but it doesn't do the >> same for unicode fields, which is a pity. See example below. >> I tried to change npyio.py around line 1600 to add that but it didn't >> work; from my limited understanding the problem comes earlier, in the way >> StringBuilder is defined(?). >> Antony Lee >> >> import io, numpy as np >> s = io.BytesIO() >> s.write(b"abc 1\ndef 2") >> s.seek(0) >> t = np.genfromtxt(s, dtype=None) # (or converters={0: bytes}) >> print(t, t.dtype) # -> [(b'a', 1) (b'b', 2)] [('f0', '|S1'), ('f1', >> '> s.seek(0) >> t = np.genfromtxt(s, dtype=None, converters={0: lambda s: >> s.decode("utf-8")}) >> print(t, t.dtype) # -> [('', 1) ('', 2)] [('f0', '> >> > Could you open a ticket for this? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snickell at gmail.com Tue May 1 15:24:58 2012 From: snickell at gmail.com (Seth Nickell) Date: Tue, 1 May 2012 12:24:58 -0700 Subject: [Numpy-discussion] Is NumpyDotNet (aka numpy-refactor) likely to be merged into the mainline? Message-ID: With a little work, I think numpy/scipy could be very useful to those of us who have to program on .NET for one reason or another, but 64-bit is currently not supported (at least, not as released). I'm considering working out 64-bit support, but it appears to me like the numpy-refactor repository isn't on a path to merging with the mainline, and is likely to bit-rot (if it hasn't already). Is anyone working on this, or is NumpyDotNet 'resting' at the moment, so to speak? Its sort of pointless to work on a dead-end branch ;-) thanks! -seth From ralf.gommers at googlemail.com Tue May 1 15:34:49 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 1 May 2012 21:34:49 +0200 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Tue, May 1, 2012 at 9:12 AM, Charles R Harris wrote: > > > On Tue, May 1, 2012 at 12:52 AM, Travis Oliphant wrote: > >> >> On Apr 30, 2012, at 10:14 PM, Jason Grout wrote: >> >> On 4/30/12 6:31 PM, Travis Oliphant wrote: >> >> Hey all, >> >> >> We have been doing some investigation of various approaches to issue >> tracking. 
The last time the conversation left this list was with >> Ralf's current list of preferences as: >> >> >> 1) Redmine >> >> 2) Trac >> >> 3) Github >> >> >> Since that time, Maggie who has been doing a lot of work settting up >> various issue tracking tools over the past couple of months, has set up a >> redmine instance and played with it. This is a possibility as a future >> issue tracker. >> >> >> However, today I took a hard look at what the IPython folks are doing >> with their issue tracker and was very impressed by the level of community >> integration that having issues tracked by Github provides. Right now, we >> have a major community problem in that there are 3 conversations taking >> place (well at least 2 1/2). One on Github, one on this list, and one on >> the Trac and it's accompanying wiki. >> >> >> I would like to propose just using Github's issue tracker. This just >> seems like the best move overall for us at this point. I like how the >> Pull Request mechanism integrates with the issue tracking. We could >> setup a Redmine instance but this would just re-create the same separation >> of communities that currently exists with the pull-requests, the mailing >> list, and the Trac pages. Redmine is nicer than Trac, but it's still a >> separate space. We need to make Github the NumPy developer hub and not >> have it spread throughout several sites. >> >> >> The same is true of SciPy. I think if SciPy also migrates to use >> Github issues, then together with IPython we can really be a voice that >> helps Github. I will propose to NumFOCUS that the Foundation sponsor >> migration of the Trac to Github for NumPy and SciPy. If anyone would >> like to be involved in this migration project, please let me know. >> >> >> Comments, concerns? >> >> >> I've been pretty impressed with the lemonade that the IPython folks have >> made out of what I see as pretty limiting shortcomings of the github >> issue tracker. I've been trying to use it for a much smaller project >> (https://github.com/sagemath/sagecell/), and it is a lot harder, in my >> (somewhat limited) experience, than using trac or the google issue >> tracker. None of these issues seems like it would be too hard to solve, >> but since we don't even have the source to the tracker, we're somewhat >> at github's mercy for any improvements. Github does have a very nice >> API for interacting with the data, which somewhat makes up for some of >> the severe shortcomings of the web interface. >> >> In no particular order, here are a few that come to mind immediately: >> >> 1. No key:value pairs for labels (Fernando brought this up a long time >> ago, I think). This is brilliant in Google code's tracker, and allows >> for custom fields that help in tracking workflow (like status, priority, >> etc.). Sure, you can do what the IPython folks are doing and just >> create labels for every possible status, but that's unwieldy and takes a >> lot of discipline to maintain. Which means it takes a lot of developer >> time or it becomes inconsistent and not very useful. >> >> >> I'm not sure how much of an issue this is. A lot of tools use single >> tags for categorization and it works pretty well. A simple "key:value" >> label communicates about the same information together with good query >> tools. >> >> >> 2. The disjointed relationship between pull requests and issues. They >> share numberings, for example, and both support discussions, etc. 
If >> you use the API, you can submit code to an issue, but then the issue >> becomes a pull request, which means that all labels on the issue >> disappear from the web interface (but you can still manage to set labels >> using the list view of the issue tracker, if I recall correctly). If >> you don't attach code to issues, it means that every issue is duplicated >> in a pull request, which splits the conversation up between an issue >> ticket and a pull request ticket. >> >> >> Hmm.. So pull requests *are* issues. This sounds like it might >> actually be a feature and also means that we *are* using the Github issue >> tracker (just only those issues that have a pull-request attached). >> Losing labels seems like a real problem (are they really lost or do they >> just not appear in the pull-request view?) >> >> >> 3. No attachments for issues (screenshots, supporting documents, etc.). >> Having API access to data won't help you here. >> >> >> Using gists and references to gists can overcome this. Also using an >> attachment service like http://uploading.com/ or dropbox makes this >> problem less of an issue really. >> >> >> 4. No custom queries. We love these in the Sage trac instance; since we >> have full access to the database, we can run any sort of query we want. >> With API data access, you can build your own queries, so maybe this >> isn't insurmountable. >> >> >> yes, you can build your own queries. This seems like an area where >> github can improve (and tools can be written which improve the experience). >> >> >> >> 5. Stylistically, the webpage is not very dense on information. I get >> frustrated when trying to see the issues because they only come 25 at a >> time, and never grouped into any sort of groupings, and there are only 3 >> options for sorting issues. Compare the very nice, dense layout of >> Google Code issues or bitbucket. Google Code issues also lets you >> cross-tabulate the issues so you can quickly triage them. Compare also >> the pretty comprehensive options for sorting and grouping things in trac. >> >> >> Yes, it looks like you can group via labels, milestones, and "your" >> issues. This is also something that can be over-come with tools that use >> the github API. >> >> >> It would be good to hear from users of the IPython github issue tracker >> to see how they like it "in the wild". How problematic are these issues >> in practice. Does it reduce or increase the participation in issue >> tracking both by users and by developers. >> >> Thanks, >> >> -Travis >> >> >> >> >> >> 6. Side-by-side diffs are nice to have, and I believe bitbucket and >> google code both have them. Of course, this isn't a deal-breaker >> because you can always pull the branch down, but it would be nice to >> have, and there's not really a way we can put it into the github tracker >> ourselves. >> >> How does, for example, the JIRA github connector work? Does it pull in >> code comments, etc.? >> >> Anyways, I'm not a regular contributor to numpy, but I have been trying >> to get used to the github tracker for about a year now, and I just keep >> getting more frustrated at it. I suppose the biggest frustrating part >> about it is that it is closed source, so even if I did want to scratch >> an itch, I can't. >> >> That said, it is nice to have code and dev conversations happening in >> one place. There are great things about github issues, of course. But >> I'm not so sure, for me, that they outweigh some of the administrative >> issues listed above. 
>> >> > I'm thinking we could do worse than simply take Ralf's top pick. Github > definitely sounds a bit clunky for issue tracking, and while we could put > together workarounds, I think Jason's point about the overall frustration > is telling. And while we could, maybe, put together tools to work with it, > I think what we want is something that works out of the box. Implementing > workarounds for a frustrating system doesn't seem the best use of developer > time. > Having looked at the IPython issues and Jason's example, it's still my impression that Github is inferior to Trac/Redmine as a bug tracker -- but not as much as I first thought. The IPython team has managed to make it work quite well (assuming you can stand the multi-colored patchwork of labels...). At this point it's probably good to look again at the problems we want to solve: 1. responsive user interface (must absolutely have) 2. mass editing of tickets (good to have) 3. usable API (good to have) 4. various ideas/issues mentioned at http://projects.scipy.org/numpy/wiki/ImprovingIssueWorkflow Note that Github does solve 1, 2 and 3 (as does Redmine). It does come with some new problems that require workarounds, but we can probably live with them. I'm not convinced that being on Github will actually get more eyes on the tickets, but there certainly won't be less. The main problem with Github (besides the issues/PRs thing and no attachments, which I can live with) is that to make it work we'll have to religiously label everything. And because users aren't allowed to attach labels, it will require a larger time investment from maintainers. Are we okay with that? If everyone else is and we can distribute this task, it's fine with me. David has been investigating bug trackers long before me, and Pauli has done most of the work administering Trac as far as I know, so I'd like to at least hear their preferences too before we make a decision. Then I hope we can move this along quickly, because any choice will be a huge improvement over the current situation. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue May 1 16:17:11 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 May 2012 14:17:11 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Tue, May 1, 2012 at 1:34 PM, Ralf Gommers wrote: > > > On Tue, May 1, 2012 at 9:12 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Tue, May 1, 2012 at 12:52 AM, Travis Oliphant wrote: >> >>> >>> On Apr 30, 2012, at 10:14 PM, Jason Grout wrote: >>> >>> On 4/30/12 6:31 PM, Travis Oliphant wrote: >>> >>> Hey all, >>> >>> >>> We have been doing some investigation of various approaches to issue >>> tracking. The last time the conversation left this list was with >>> Ralf's current list of preferences as: >>> >>> >>> 1) Redmine >>> >>> 2) Trac >>> >>> 3) Github >>> >>> >>> Since that time, Maggie who has been doing a lot of work settting up >>> various issue tracking tools over the past couple of months, has set up a >>> redmine instance and played with it. This is a possibility as a future >>> issue tracker. 
>>> >>> >>> However, today I took a hard look at what the IPython folks are doing >>> with their issue tracker and was very impressed by the level of community >>> integration that having issues tracked by Github provides. Right now, we >>> have a major community problem in that there are 3 conversations taking >>> place (well at least 2 1/2). One on Github, one on this list, and one on >>> the Trac and it's accompanying wiki. >>> >>> >>> I would like to propose just using Github's issue tracker. This just >>> seems like the best move overall for us at this point. I like how the >>> Pull Request mechanism integrates with the issue tracking. We could >>> setup a Redmine instance but this would just re-create the same separation >>> of communities that currently exists with the pull-requests, the mailing >>> list, and the Trac pages. Redmine is nicer than Trac, but it's still a >>> separate space. We need to make Github the NumPy developer hub and not >>> have it spread throughout several sites. >>> >>> >>> The same is true of SciPy. I think if SciPy also migrates to use >>> Github issues, then together with IPython we can really be a voice that >>> helps Github. I will propose to NumFOCUS that the Foundation sponsor >>> migration of the Trac to Github for NumPy and SciPy. If anyone would >>> like to be involved in this migration project, please let me know. >>> >>> >>> Comments, concerns? >>> >>> >>> I've been pretty impressed with the lemonade that the IPython folks have >>> made out of what I see as pretty limiting shortcomings of the github >>> issue tracker. I've been trying to use it for a much smaller project >>> (https://github.com/sagemath/sagecell/), and it is a lot harder, in my >>> (somewhat limited) experience, than using trac or the google issue >>> tracker. None of these issues seems like it would be too hard to solve, >>> but since we don't even have the source to the tracker, we're somewhat >>> at github's mercy for any improvements. Github does have a very nice >>> API for interacting with the data, which somewhat makes up for some of >>> the severe shortcomings of the web interface. >>> >>> In no particular order, here are a few that come to mind immediately: >>> >>> 1. No key:value pairs for labels (Fernando brought this up a long time >>> ago, I think). This is brilliant in Google code's tracker, and allows >>> for custom fields that help in tracking workflow (like status, priority, >>> etc.). Sure, you can do what the IPython folks are doing and just >>> create labels for every possible status, but that's unwieldy and takes a >>> lot of discipline to maintain. Which means it takes a lot of developer >>> time or it becomes inconsistent and not very useful. >>> >>> >>> I'm not sure how much of an issue this is. A lot of tools use single >>> tags for categorization and it works pretty well. A simple "key:value" >>> label communicates about the same information together with good query >>> tools. >>> >>> >>> 2. The disjointed relationship between pull requests and issues. They >>> share numberings, for example, and both support discussions, etc. If >>> you use the API, you can submit code to an issue, but then the issue >>> becomes a pull request, which means that all labels on the issue >>> disappear from the web interface (but you can still manage to set labels >>> using the list view of the issue tracker, if I recall correctly). 
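For concreteness, the issue-to-pull-request conversion described here goes through the pull-requests endpoint of GitHub's v3 API: instead of a title and body you pass an existing issue number plus the branch to merge. A rough sketch, assuming the third-party requests library is available; the repository, issue number, branch names and credentials are made up, and the exact payload should be checked against GitHub's current API documentation:

    import json
    import requests  # third-party HTTP library, assumed to be installed

    # Turn existing issue #1234 into a pull request by attaching a branch
    # to it; after this call the issue shows up as a pull request.
    payload = {"issue": 1234, "head": "someuser:fix-1234", "base": "master"}
    resp = requests.post("https://api.github.com/repos/numpy/numpy/pulls",
                         data=json.dumps(payload),
                         auth=("someuser", "api-token"))
    print(resp.status_code)
    print(resp.text)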
If >>> you don't attach code to issues, it means that every issue is duplicated >>> in a pull request, which splits the conversation up between an issue >>> ticket and a pull request ticket. >>> >>> >>> Hmm.. So pull requests *are* issues. This sounds like it might >>> actually be a feature and also means that we *are* using the Github issue >>> tracker (just only those issues that have a pull-request attached). >>> Losing labels seems like a real problem (are they really lost or do they >>> just not appear in the pull-request view?) >>> >>> >>> 3. No attachments for issues (screenshots, supporting documents, etc.). >>> Having API access to data won't help you here. >>> >>> >>> Using gists and references to gists can overcome this. Also using an >>> attachment service like http://uploading.com/ or dropbox makes this >>> problem less of an issue really. >>> >>> >>> 4. No custom queries. We love these in the Sage trac instance; since we >>> have full access to the database, we can run any sort of query we want. >>> With API data access, you can build your own queries, so maybe this >>> isn't insurmountable. >>> >>> >>> yes, you can build your own queries. This seems like an area where >>> github can improve (and tools can be written which improve the experience). >>> >>> >>> >>> 5. Stylistically, the webpage is not very dense on information. I get >>> frustrated when trying to see the issues because they only come 25 at a >>> time, and never grouped into any sort of groupings, and there are only 3 >>> options for sorting issues. Compare the very nice, dense layout of >>> Google Code issues or bitbucket. Google Code issues also lets you >>> cross-tabulate the issues so you can quickly triage them. Compare also >>> the pretty comprehensive options for sorting and grouping things in trac. >>> >>> >>> Yes, it looks like you can group via labels, milestones, and "your" >>> issues. This is also something that can be over-come with tools that use >>> the github API. >>> >>> >>> It would be good to hear from users of the IPython github issue tracker >>> to see how they like it "in the wild". How problematic are these issues >>> in practice. Does it reduce or increase the participation in issue >>> tracking both by users and by developers. >>> >>> Thanks, >>> >>> -Travis >>> >>> >>> >>> >>> >>> 6. Side-by-side diffs are nice to have, and I believe bitbucket and >>> google code both have them. Of course, this isn't a deal-breaker >>> because you can always pull the branch down, but it would be nice to >>> have, and there's not really a way we can put it into the github tracker >>> ourselves. >>> >>> How does, for example, the JIRA github connector work? Does it pull in >>> code comments, etc.? >>> >>> Anyways, I'm not a regular contributor to numpy, but I have been trying >>> to get used to the github tracker for about a year now, and I just keep >>> getting more frustrated at it. I suppose the biggest frustrating part >>> about it is that it is closed source, so even if I did want to scratch >>> an itch, I can't. >>> >>> That said, it is nice to have code and dev conversations happening in >>> one place. There are great things about github issues, of course. But >>> I'm not so sure, for me, that they outweigh some of the administrative >>> issues listed above. >>> >>> >> I'm thinking we could do worse than simply take Ralf's top pick. 
Github >> definitely sounds a bit clunky for issue tracking, and while we could put >> together workarounds, I think Jason's point about the overall frustration >> is telling. And while we could, maybe, put together tools to work with it, >> I think what we want is something that works out of the box. Implementing >> workarounds for a frustrating system doesn't seem the best use of developer >> time. >> > > Having looked at the IPython issues and Jason's example, it's still my > impression that Github is inferior to Trac/Redmine as a bug tracker -- but > not as much as I first thought. The IPython team has managed to make it > work quite well (assuming you can stand the multi-colored patchwork of > labels...). > > At this point it's probably good to look again at the problems we want to > solve: > 1. responsive user interface (must absolutely have) > 2. mass editing of tickets (good to have) > 3. usable API (good to have) > 4. various ideas/issues mentioned at > http://projects.scipy.org/numpy/wiki/ImprovingIssueWorkflow > > Note that Github does solve 1, 2 and 3 (as does Redmine). It does come > with some new problems that require workarounds, but we can probably live > with them. I'm not convinced that being on Github will actually get more > eyes on the tickets, but there certainly won't be less. > > The main problem with Github (besides the issues/PRs thing and no > attachments, which I can live with) is that to make it work we'll have to > religiously label everything. And because users aren't allowed to attach > labels, it will require a larger time investment from maintainers. Are we > okay with that? If everyone else is and we can distribute this task, it's > fine with me. > > David has been investigating bug trackers long before me, and Pauli has > done most of the work administering Trac as far as I know, so I'd like to > at least hear their preferences too before we make a decision. Then I hope > we can move this along quickly, because any choice will be a huge > improvement over the current situation. > > Redmine looks to offer a lot for project management, not just issue tracking. We don't do much in the way of project management, but that may well be a case of not having the tools, opportunity, and training. Once those tools are available we might find a use for them. I think Redmine offers more open ended opportunity for improvements for the project as a whole than github, which seems something of a dead end in that regard. The fact that Redmine also supports multiple projects might make it a better fit with the goals of NumFocus over the long run. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Tue May 1 16:19:41 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 1 May 2012 13:19:41 -0700 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: <4F9F552F.4060605@creativetrax.com> References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> Message-ID: Hi folks, sorry for not jumping in before, swamped with deadlines... On Mon, Apr 30, 2012 at 8:14 PM, Jason Grout wrote: > I've been pretty impressed with the lemonade that the IPython folks have > made out of what I see as pretty limiting shortcomings of the github > issue tracker. ?I've been trying to use it for a much smaller project I think your summary is all very valid, and mostly we've just learned to live with some of its limitations and hacked around some of them. 
Now, one thing that the hyper-simplistic github tracker has made me think is about the value of over-sophisticated tools, which sometimes can become an end in and of themselves... Because the tracker is so absurdly simple, you just can't spend much time playing with it and may spend more time coding on the project itself. But having said that, I still think GHI (short for github issues) has *real* issues that are true limitations, and I keep hoping they'll improve (though it's starting to look more like unreasonable hope, as time goes by). > 1. No key:value pairs for labels (Fernando brought this up a long time > ago, I think). ?This is brilliant in Google code's tracker, and allows > for custom fields that help in tracking workflow (like status, priority, > etc.). ?Sure, you can do what the IPython folks are doing and just > create labels for every possible status, but that's unwieldy and takes a > lot of discipline to maintain. ?Which means it takes a lot of developer > time or it becomes inconsistent and not very useful. I don't think it takes too much time in practice, but it's indeed a poor man's substitute for the google system. And for certain things, like priority, you *really* want a proper way of saying 'show me all issues of priority X or higher', which you can't do with labels. > 2. The disjointed relationship between pull requests and issues. ?They This is the one that pisses me off the most. It's a real, constant drag to not be able to label issues and see those labels. I can't fathom why on Earth GH hasn't added this, and what bizarre thought process could possibly have led to PRs being implemented alongside issues and yet, being hobbled by deliberately *hiding* their labels. It feels almost as a misfeature written out of spite against the users. I fume every time I try to prioritize work on open PRs and have to do it via post-it notes on the wall because GH decided to *hide* the labels for PRs from me. Argh... > 3. No attachments for issues (screenshots, supporting documents, etc.). > ?Having API access to data won't help you here. This one doesn't actually bother me in practice at all. Gist works perfectly for text snippets, and since they're version-controlled, it's even better than static attachments. And for images, a simple imgur upload (many screenshot programs even upload for you) along with ![tag](url) provides even better results: the images are inlined in the discussion. See for example: https://github.com/ipython/ipython/issues/1443 So here, I actually *prefer* gist+image urls to an attachment system. Arguably the externally hosted images could be lost if that server goes down, so it would perhaps be better to have them hosted at GH itself. They might consider that (or simply allowing gists to take binary uploads). > 4. No custom queries. ?We love these in the Sage trac instance; since we > have full access to the database, we can run any sort of query we want. > ?With API data access, you can build your own queries, so maybe this > isn't insurmountable. Yes, this is doable with a little elbow grease. > 5. Stylistically, the webpage is not very dense on information. ?I get > frustrated when trying to see the issues because they only come 25 at a > time, and never grouped into any sort of groupings, and there are only 3 > options for sorting issues. ?Compare the very nice, dense layout of > Google Code issues or bitbucket. ?Google Code issues also lets you > cross-tabulate the issues so you can quickly triage them. 
?Compare also > the pretty comprehensive options for sorting and grouping things in trac. Agreed. Not a big deal breaker for us, but perhaps we're just living in blissful ignorance of what we're missing :) > 6. Side-by-side diffs are nice to have, and I believe bitbucket and > google code both have them. ?Of course, this isn't a deal-breaker > because you can always pull the branch down, but it would be nice to > have, and there's not really a way we can put it into the github tracker > ourselves. I guess they could add a markdown extension that displays a .diff url as a real diff... > That said, it is nice to have code and dev conversations happening in > one place. ?There are great things about github issues, of course. ?But > I'm not so sure, for me, that they outweigh some of the administrative > issues listed above. This is the real deal. In the end, we've learned to live with some of the GHI limitations and grumble under our breath about others, but there's genuine benefit to having a smooth flow of information in one environment. People get more efficient with the toolchain (I've learned markdown without even trying, simply from gradually doing nicer and nicer reports as I use the system), and there's a 'network effect' at play here. On Mon, Apr 30, 2012 at 11:16 PM, Charles R Harris wrote: > I find the trac email notifications pretty good in that regard, although it > would be nice to have everything in one place. The main issue I have, > actually, is that github won't send me everything. I want to see every > posted comment and every commit show up in the mail, including my own > comments. The RSS feeds seem better for those notifications, but I have two > feeds from github and they don't show the same things. Maybe we need to put > together a tracking workflow document to supplement the git workflow > document. This also bothers me quite a bit. I'd like my email thread to be a full record of the ticket, not just other people's responses. Furthermore, if you use markdown in an email reply, for some reason they don't render it on the site anymore. So I end up using the site directly most of the time so the issue reads nicely formatted. >> Can we at least agree to have all the wiki pages and web-site managed by >> github? For issue tracking, I'm very anxious for your and Ralf's This one I think is a no-brainer, at least the website pages. For that, the system using dual repos and a web team on github, like we do, is simply perfect. I see no reason whatsoever not to do this. In summary, I think we made the right decision for IPython with using GHI, and I do think that the simplicity of the tool does also help in focusing on writing more code instead of staring at reports of open issues :) But I do wish dearly they made improvements to some of the system's limitations listed above, because several of them are truly annoying to have to put up with. As for numpy/scipy, it's hard to say. The view of people like Ralf, Pauli, Chuck and others is what matters most here. If they are going to get so aggravated by the system's limitations that they don't want to work on the project, that's not a risk worth taking. On the other hand, I'm famous for being persnickety about my tools (Brian says IPython is the embodiment in source code of my untreated OCD :), and yet I've managed to adapt. So better souls than me can probably do it too ;) But if you do decide to go with GHI, it should be based on what the system is like *today*, not on the hope that it will get better. 
About a month ago they broke label filtering by turning multi-label filters into an OR operation, which effectively rendered the labels completely useless. Despite reporting it multiple times via their support tracker AND speaking in person at someone from GH, it still took well over a month or two to fix. For something so simple and so essential, I consider that to be atrociously bad response. So don't go for GHI on the hope it will get a lot better soon, b/c their recent record doesn't support a hopeful viewpoint. tl,dr; GHI is a net positive for IPython, but it does have real annoyances you can't just wish away. Best, f From charlesr.harris at gmail.com Tue May 1 16:35:12 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 May 2012 14:35:12 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Tue, May 1, 2012 at 2:17 PM, Charles R Harris wrote: > > > On Tue, May 1, 2012 at 1:34 PM, Ralf Gommers wrote: > >> >> >> On Tue, May 1, 2012 at 9:12 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Tue, May 1, 2012 at 12:52 AM, Travis Oliphant wrote: >>> >>>> >>>> On Apr 30, 2012, at 10:14 PM, Jason Grout wrote: >>>> >>>> On 4/30/12 6:31 PM, Travis Oliphant wrote: >>>> >>>> Hey all, >>>> >>>> >>>> We have been doing some investigation of various approaches to issue >>>> tracking. The last time the conversation left this list was with >>>> Ralf's current list of preferences as: >>>> >>>> >>>> 1) Redmine >>>> >>>> 2) Trac >>>> >>>> 3) Github >>>> >>>> >>>> Since that time, Maggie who has been doing a lot of work settting up >>>> various issue tracking tools over the past couple of months, has set up a >>>> redmine instance and played with it. This is a possibility as a future >>>> issue tracker. >>>> >>>> >>>> However, today I took a hard look at what the IPython folks are doing >>>> with their issue tracker and was very impressed by the level of community >>>> integration that having issues tracked by Github provides. Right now, we >>>> have a major community problem in that there are 3 conversations taking >>>> place (well at least 2 1/2). One on Github, one on this list, and one on >>>> the Trac and it's accompanying wiki. >>>> >>>> >>>> I would like to propose just using Github's issue tracker. This just >>>> seems like the best move overall for us at this point. I like how the >>>> Pull Request mechanism integrates with the issue tracking. We could >>>> setup a Redmine instance but this would just re-create the same separation >>>> of communities that currently exists with the pull-requests, the mailing >>>> list, and the Trac pages. Redmine is nicer than Trac, but it's still a >>>> separate space. We need to make Github the NumPy developer hub and not >>>> have it spread throughout several sites. >>>> >>>> >>>> The same is true of SciPy. I think if SciPy also migrates to use >>>> Github issues, then together with IPython we can really be a voice that >>>> helps Github. I will propose to NumFOCUS that the Foundation sponsor >>>> migration of the Trac to Github for NumPy and SciPy. If anyone would >>>> like to be involved in this migration project, please let me know. >>>> >>>> >>>> Comments, concerns? 
>>>> >>>> >>>> I've been pretty impressed with the lemonade that the IPython folks >>>> have >>>> made out of what I see as pretty limiting shortcomings of the github >>>> issue tracker. I've been trying to use it for a much smaller project >>>> (https://github.com/sagemath/sagecell/), and it is a lot harder, in my >>>> (somewhat limited) experience, than using trac or the google issue >>>> tracker. None of these issues seems like it would be too hard to >>>> solve, >>>> but since we don't even have the source to the tracker, we're somewhat >>>> at github's mercy for any improvements. Github does have a very nice >>>> API for interacting with the data, which somewhat makes up for some of >>>> the severe shortcomings of the web interface. >>>> >>>> In no particular order, here are a few that come to mind immediately: >>>> >>>> 1. No key:value pairs for labels (Fernando brought this up a long time >>>> ago, I think). This is brilliant in Google code's tracker, and allows >>>> for custom fields that help in tracking workflow (like status, >>>> priority, >>>> etc.). Sure, you can do what the IPython folks are doing and just >>>> create labels for every possible status, but that's unwieldy and takes >>>> a >>>> lot of discipline to maintain. Which means it takes a lot of developer >>>> time or it becomes inconsistent and not very useful. >>>> >>>> >>>> I'm not sure how much of an issue this is. A lot of tools use single >>>> tags for categorization and it works pretty well. A simple "key:value" >>>> label communicates about the same information together with good query >>>> tools. >>>> >>>> >>>> 2. The disjointed relationship between pull requests and issues. They >>>> share numberings, for example, and both support discussions, etc. If >>>> you use the API, you can submit code to an issue, but then the issue >>>> becomes a pull request, which means that all labels on the issue >>>> disappear from the web interface (but you can still manage to set >>>> labels >>>> using the list view of the issue tracker, if I recall correctly). If >>>> you don't attach code to issues, it means that every issue is >>>> duplicated >>>> in a pull request, which splits the conversation up between an issue >>>> ticket and a pull request ticket. >>>> >>>> >>>> Hmm.. So pull requests *are* issues. This sounds like it might >>>> actually be a feature and also means that we *are* using the Github issue >>>> tracker (just only those issues that have a pull-request attached). >>>> Losing labels seems like a real problem (are they really lost or do they >>>> just not appear in the pull-request view?) >>>> >>>> >>>> 3. No attachments for issues (screenshots, supporting documents, etc.). >>>> Having API access to data won't help you here. >>>> >>>> >>>> Using gists and references to gists can overcome this. Also using an >>>> attachment service like http://uploading.com/ or dropbox makes this >>>> problem less of an issue really. >>>> >>>> >>>> 4. No custom queries. We love these in the Sage trac instance; since >>>> we >>>> have full access to the database, we can run any sort of query we want. >>>> With API data access, you can build your own queries, so maybe this >>>> isn't insurmountable. >>>> >>>> >>>> yes, you can build your own queries. This seems like an area where >>>> github can improve (and tools can be written which improve the experience). >>>> >>>> >>>> >>>> 5. Stylistically, the webpage is not very dense on information. 
I get >>>> frustrated when trying to see the issues because they only come 25 at a >>>> time, and never grouped into any sort of groupings, and there are only >>>> 3 >>>> options for sorting issues. Compare the very nice, dense layout of >>>> Google Code issues or bitbucket. Google Code issues also lets you >>>> cross-tabulate the issues so you can quickly triage them. Compare also >>>> the pretty comprehensive options for sorting and grouping things in >>>> trac. >>>> >>>> >>>> Yes, it looks like you can group via labels, milestones, and "your" >>>> issues. This is also something that can be over-come with tools that use >>>> the github API. >>>> >>>> >>>> It would be good to hear from users of the IPython github issue tracker >>>> to see how they like it "in the wild". How problematic are these issues >>>> in practice. Does it reduce or increase the participation in issue >>>> tracking both by users and by developers. >>>> >>>> Thanks, >>>> >>>> -Travis >>>> >>>> >>>> >>>> >>>> >>>> 6. Side-by-side diffs are nice to have, and I believe bitbucket and >>>> google code both have them. Of course, this isn't a deal-breaker >>>> because you can always pull the branch down, but it would be nice to >>>> have, and there's not really a way we can put it into the github >>>> tracker >>>> ourselves. >>>> >>>> How does, for example, the JIRA github connector work? Does it pull in >>>> code comments, etc.? >>>> >>>> Anyways, I'm not a regular contributor to numpy, but I have been trying >>>> to get used to the github tracker for about a year now, and I just keep >>>> getting more frustrated at it. I suppose the biggest frustrating part >>>> about it is that it is closed source, so even if I did want to scratch >>>> an itch, I can't. >>>> >>>> That said, it is nice to have code and dev conversations happening in >>>> one place. There are great things about github issues, of course. But >>>> I'm not so sure, for me, that they outweigh some of the administrative >>>> issues listed above. >>>> >>>> >>> I'm thinking we could do worse than simply take Ralf's top pick. Github >>> definitely sounds a bit clunky for issue tracking, and while we could put >>> together workarounds, I think Jason's point about the overall frustration >>> is telling. And while we could, maybe, put together tools to work with it, >>> I think what we want is something that works out of the box. Implementing >>> workarounds for a frustrating system doesn't seem the best use of developer >>> time. >>> >> >> Having looked at the IPython issues and Jason's example, it's still my >> impression that Github is inferior to Trac/Redmine as a bug tracker -- but >> not as much as I first thought. The IPython team has managed to make it >> work quite well (assuming you can stand the multi-colored patchwork of >> labels...). >> >> At this point it's probably good to look again at the problems we want to >> solve: >> 1. responsive user interface (must absolutely have) >> 2. mass editing of tickets (good to have) >> 3. usable API (good to have) >> 4. various ideas/issues mentioned at >> http://projects.scipy.org/numpy/wiki/ImprovingIssueWorkflow >> >> Note that Github does solve 1, 2 and 3 (as does Redmine). It does come >> with some new problems that require workarounds, but we can probably live >> with them. I'm not convinced that being on Github will actually get more >> eyes on the tickets, but there certainly won't be less. 
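To illustrate the "usable API" point above: the custom queries and mass edits that are clumsy in the web UI are straightforward to script against the v3 REST API. A hedged sketch using the third-party requests library; the label names, issue number and credentials are placeholders:

    import json
    import requests  # third-party HTTP library, assumed to be installed

    base = "https://api.github.com/repos/numpy/numpy"

    # Custom query: list all open issues carrying a given label.
    resp = requests.get(base + "/issues",
                        params={"state": "open", "labels": "bug",
                                "per_page": 100})
    for issue in json.loads(resp.text):
        print("#%s %s" % (issue["number"], issue["title"]))

    # Triage step: attach a label to an issue (maintainer credentials needed,
    # since ordinary users cannot set labels themselves).
    requests.post(base + "/issues/1234/labels",
                  data=json.dumps(["needs-review"]),
                  auth=("maintainer", "api-token"))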
>> >> The main problem with Github (besides the issues/PRs thing and no >> attachments, which I can live with) is that to make it work we'll have to >> religiously label everything. And because users aren't allowed to attach >> labels, it will require a larger time investment from maintainers. Are we >> okay with that? If everyone else is and we can distribute this task, it's >> fine with me. >> >> David has been investigating bug trackers long before me, and Pauli has >> done most of the work administering Trac as far as I know, so I'd like to >> at least hear their preferences too before we make a decision. Then I hope >> we can move this along quickly, because any choice will be a huge >> improvement over the current situation. >> >> > Redmine looks to offer a lot for project management, not just issue > tracking. We don't do much in the way of project management, but that may > well be a case of not having the tools, opportunity, and training. Once > those tools are available we might find a use for them. I think Redmine > offers more open ended opportunity for improvements for the project as a > whole than github, which seems something of a dead end in that regard. The > fact that Redmine also supports multiple projects might make it a better > fit with the goals of NumFocus over the long run. > > To expand on this a bit. At some point you will need to generate timelines, reports, and statistics in order to sell the project to people with real money. It looks like Redmine would help a lot in this regard. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason-sage at creativetrax.com Tue May 1 16:36:53 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Tue, 01 May 2012 15:36:53 -0500 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> Message-ID: <4FA04965.1010809@creativetrax.com> On 5/1/12 3:19 PM, Fernando Perez wrote: > But if you do decide to go with GHI, it should be based on what the > system is like*today*, not on the hope that it will get better. > About a month ago they broke label filtering by turning multi-label > filters into an OR operation, which effectively rendered the labels > completely useless. Despite reporting it multiple times via their > support tracker AND speaking in person at someone from GH, it still > took well over a month or two to fix. For something so simple and so > essential, I consider that to be atrociously bad response. So don't > go for GHI on the hope it will get a lot better soon, b/c their recent > record doesn't support a hopeful viewpoint. This example indicates that basing your decision on what it is like *today* may not be valid either. You'd hope that they won't do something really silly, but you can't change it if they do, and you can't just keep running the old version of issues to avoid problems since you don't have control over that either. Anyway, like everyone else has said, Ralf, Pauli, et. al. are really the ones to vote in this. Given Fernando's responses, I realize why GHI still works for us---our small project has me and 2-4 students, and we all pretty much meet each week to triage issues together, and there are only about 40 open issues. It's a simple enough project that we need *something*, but we don't need to spend our time setting up complicated infrastructure. I do wish we could use Google Code issues with Github pull requests, though :). 
Thanks, Jason From cgohlke at uci.edu Tue May 1 17:09:03 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Tue, 01 May 2012 14:09:03 -0700 Subject: [Numpy-discussion] 1.6.2 release - backports and MSVC testing help In-Reply-To: References: Message-ID: <4FA050EF.4000709@uci.edu> On 4/30/2012 1:16 PM, Ralf Gommers wrote: > Hi all, > > Charles has done a great job of backporting a lot of bug fixes to 1.6.2, > see PRs 260, 261, 262 and 263. For those who are interested, please have > a look at those PRs to see and comment on what's proposed to go into 1.6.2. > > I also have a request for help with testing: can someone who uses MSVC > test (preferably with a 2.x and a 3.x version)? I have a branch with all > four PRs merged at https://github.com/rgommers/numpy/tree/bports > > Thanks, > Ralf > > > Any chance pull requests 188 and 227 could make it into numpy 1.6.2? https://github.com/numpy/numpy/pull/188 https://github.com/numpy/numpy/pull/227 Thank you, Christoph From charlesr.harris at gmail.com Tue May 1 17:17:18 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 May 2012 15:17:18 -0600 Subject: [Numpy-discussion] 1.6.2 release - backports and MSVC testing help In-Reply-To: <4FA050EF.4000709@uci.edu> References: <4FA050EF.4000709@uci.edu> Message-ID: On Tue, May 1, 2012 at 3:09 PM, Christoph Gohlke wrote: > > > On 4/30/2012 1:16 PM, Ralf Gommers wrote: > > Hi all, > > > > Charles has done a great job of backporting a lot of bug fixes to 1.6.2, > > see PRs 260, 261, 262 and 263. For those who are interested, please have > > a look at those PRs to see and comment on what's proposed to go into > 1.6.2. > > > > I also have a request for help with testing: can someone who uses MSVC > > test (preferably with a 2.x and a 3.x version)? I have a branch with all > > four PRs merged at https://github.com/rgommers/numpy/tree/bports > > > > Thanks, > > Ralf > > > > > > > > Any chance pull requests 188 and 227 could make it into numpy 1.6.2? > > https://github.com/numpy/numpy/pull/188 > Looks easy to do. > https://github.com/numpy/numpy/pull/227 > This one is isolated to random. I'll put up another PR for the backport. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Tue May 1 17:44:41 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 1 May 2012 14:44:41 -0700 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: <4FA04965.1010809@creativetrax.com> References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <4FA04965.1010809@creativetrax.com> Message-ID: On Tue, May 1, 2012 at 1:36 PM, Jason Grout wrote: > This example indicates that basing ?your decision on what it is like > *today* may not be valid either. ?You'd hope that they won't do Very true ;) > Anyway, like everyone else has said, Ralf, Pauli, et. al. are really the > ones to vote in this. ?Given Fernando's responses, I realize why GHI > still works for us---our small project has me and 2-4 students, and we > all pretty much meet each week to triage issues together, and there are > only about 40 open issues. ?It's a simple enough project that we need I just reread my reply with a full stomach, and I wanted to add something, because I think it may appear a bit too negative. *In practice*, the system does work very fluidly, and other than the no-labels-on-PRs, it just gets out of your way. 
Being able to simply type #NN in any comment or git commit ('closes #NN') and get everything auto-linked, closed if needed, etc., has major value in practice, and shouldn't be underestimated. Using two separate tools adds real friction to the everyday workflow, and if GH has taught me one thing, it's that the very fluid workflow they enable leads to massive productivity improvements. For IPython, the change from Launchpad/bzr to GH/git has been truly night and day in terms of productivity. We process a volume of code today that would be unthinkable before, and I think a big part of that is that the tools simply get out of our way and let us just work. So, as much as I do complain about real problems with GHI, I also think it's important to evaluate carefully the cost of a dual-system solution. Sometimes the lesser tool you know how to use is better than the fancier one that creates friction. Put another way: no matter how fancy your new $400 racket is, you'll never beat Pete Sampras on a tennis court even if he's using a wood board to play :) Cheers, f From travis at continuum.io Tue May 1 18:06:35 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 1 May 2012 17:06:35 -0500 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: <4B12D4DE-78CB-4D2B-A825-3DFBF834332D@continuum.io> Thanks Ralf, I agree that Pauli and David have a lot of say in what we do. Thanks for reminding. We will wait to hear from them. If together you three feel like we should set up a separate Redmine instance than we can do that and just pay special attention with the Github integration and links in the right places to make sure the conversations stay connected. I just wanted to point out some of the benefits of Github that I've noticed. I can see pros and cons for both. Thanks for reminding me of the issue that users can't assign lables --- that must be done by a maintainer. That might be a benefit though, providing some control over how labels get assigned. Thanks, -Travis On May 1, 2012, at 2:34 PM, Ralf Gommers wrote: > > > On Tue, May 1, 2012 at 9:12 AM, Charles R Harris wrote: > > > On Tue, May 1, 2012 at 12:52 AM, Travis Oliphant wrote: > > On Apr 30, 2012, at 10:14 PM, Jason Grout wrote: > >> On 4/30/12 6:31 PM, Travis Oliphant wrote: >>> Hey all, >>> >>> We have been doing some investigation of various approaches to issue tracking. The last time the conversation left this list was with Ralf's current list of preferences as: >>> >>> 1) Redmine >>> 2) Trac >>> 3) Github >>> >>> Since that time, Maggie who has been doing a lot of work settting up various issue tracking tools over the past couple of months, has set up a redmine instance and played with it. This is a possibility as a future issue tracker. >>> >>> However, today I took a hard look at what the IPython folks are doing with their issue tracker and was very impressed by the level of community integration that having issues tracked by Github provides. Right now, we have a major community problem in that there are 3 conversations taking place (well at least 2 1/2). One on Github, one on this list, and one on the Trac and it's accompanying wiki. >>> >>> I would like to propose just using Github's issue tracker. This just seems like the best move overall for us at this point. I like how the Pull Request mechanism integrates with the issue tracking. 
We could setup a Redmine instance but this would just re-create the same separation of communities that currently exists with the pull-requests, the mailing list, and the Trac pages. Redmine is nicer than Trac, but it's still a separate space. We need to make Github the NumPy developer hub and not have it spread throughout several sites. >>> >>> The same is true of SciPy. I think if SciPy also migrates to use Github issues, then together with IPython we can really be a voice that helps Github. I will propose to NumFOCUS that the Foundation sponsor migration of the Trac to Github for NumPy and SciPy. If anyone would like to be involved in this migration project, please let me know. >>> >>> Comments, concerns? >> >> I've been pretty impressed with the lemonade that the IPython folks have >> made out of what I see as pretty limiting shortcomings of the github >> issue tracker. I've been trying to use it for a much smaller project >> (https://github.com/sagemath/sagecell/), and it is a lot harder, in my >> (somewhat limited) experience, than using trac or the google issue >> tracker. None of these issues seems like it would be too hard to solve, >> but since we don't even have the source to the tracker, we're somewhat >> at github's mercy for any improvements. Github does have a very nice >> API for interacting with the data, which somewhat makes up for some of >> the severe shortcomings of the web interface. >> >> In no particular order, here are a few that come to mind immediately: >> >> 1. No key:value pairs for labels (Fernando brought this up a long time >> ago, I think). This is brilliant in Google code's tracker, and allows >> for custom fields that help in tracking workflow (like status, priority, >> etc.). Sure, you can do what the IPython folks are doing and just >> create labels for every possible status, but that's unwieldy and takes a >> lot of discipline to maintain. Which means it takes a lot of developer >> time or it becomes inconsistent and not very useful. > > I'm not sure how much of an issue this is. A lot of tools use single tags for categorization and it works pretty well. A simple "key:value" label communicates about the same information together with good query tools. > >> >> 2. The disjointed relationship between pull requests and issues. They >> share numberings, for example, and both support discussions, etc. If >> you use the API, you can submit code to an issue, but then the issue >> becomes a pull request, which means that all labels on the issue >> disappear from the web interface (but you can still manage to set labels >> using the list view of the issue tracker, if I recall correctly). If >> you don't attach code to issues, it means that every issue is duplicated >> in a pull request, which splits the conversation up between an issue >> ticket and a pull request ticket. > > Hmm.. So pull requests *are* issues. This sounds like it might actually be a feature and also means that we *are* using the Github issue tracker (just only those issues that have a pull-request attached). Losing labels seems like a real problem (are they really lost or do they just not appear in the pull-request view?) > >> >> 3. No attachments for issues (screenshots, supporting documents, etc.). >> Having API access to data won't help you here. > > Using gists and references to gists can overcome this. Also using an attachment service like http://uploading.com/ or dropbox makes this problem less of an issue really. > >> >> 4. No custom queries. 
We love these in the Sage trac instance; since we >> have full access to the database, we can run any sort of query we want. >> With API data access, you can build your own queries, so maybe this >> isn't insurmountable. > > yes, you can build your own queries. This seems like an area where github can improve (and tools can be written which improve the experience). > >> >> 5. Stylistically, the webpage is not very dense on information. I get >> frustrated when trying to see the issues because they only come 25 at a >> time, and never grouped into any sort of groupings, and there are only 3 >> options for sorting issues. Compare the very nice, dense layout of >> Google Code issues or bitbucket. Google Code issues also lets you >> cross-tabulate the issues so you can quickly triage them. Compare also >> the pretty comprehensive options for sorting and grouping things in trac. > > Yes, it looks like you can group via labels, milestones, and "your" issues. This is also something that can be over-come with tools that use the github API. > > > It would be good to hear from users of the IPython github issue tracker to see how they like it "in the wild". How problematic are these issues in practice. Does it reduce or increase the participation in issue tracking both by users and by developers. > > Thanks, > > -Travis > > > > >> >> 6. Side-by-side diffs are nice to have, and I believe bitbucket and >> google code both have them. Of course, this isn't a deal-breaker >> because you can always pull the branch down, but it would be nice to >> have, and there's not really a way we can put it into the github tracker >> ourselves. >> >> How does, for example, the JIRA github connector work? Does it pull in >> code comments, etc.? >> >> Anyways, I'm not a regular contributor to numpy, but I have been trying >> to get used to the github tracker for about a year now, and I just keep >> getting more frustrated at it. I suppose the biggest frustrating part >> about it is that it is closed source, so even if I did want to scratch >> an itch, I can't. >> >> That said, it is nice to have code and dev conversations happening in >> one place. There are great things about github issues, of course. But >> I'm not so sure, for me, that they outweigh some of the administrative >> issues listed above. >> > > > I'm thinking we could do worse than simply take Ralf's top pick. Github definitely sounds a bit clunky for issue tracking, and while we could put together workarounds, I think Jason's point about the overall frustration is telling. And while we could, maybe, put together tools to work with it, I think what we want is something that works out of the box. Implementing workarounds for a frustrating system doesn't seem the best use of developer time. > > Having looked at the IPython issues and Jason's example, it's still my impression that Github is inferior to Trac/Redmine as a bug tracker -- but not as much as I first thought. The IPython team has managed to make it work quite well (assuming you can stand the multi-colored patchwork of labels...). > > At this point it's probably good to look again at the problems we want to solve: > 1. responsive user interface (must absolutely have) > 2. mass editing of tickets (good to have) > 3. usable API (good to have) > 4. various ideas/issues mentioned at http://projects.scipy.org/numpy/wiki/ImprovingIssueWorkflow > > Note that Github does solve 1, 2 and 3 (as does Redmine). It does come with some new problems that require workarounds, but we can probably live with them. 
I'm not convinced that being on Github will actually get more eyes on the tickets, but there certainly won't be less. > > The main problem with Github (besides the issues/PRs thing and no attachments, which I can live with) is that to make it work we'll have to religiously label everything. And because users aren't allowed to attach labels, it will require a larger time investment from maintainers. Are we okay with that? If everyone else is and we can distribute this task, it's fine with me. > > David has been investigating bug trackers long before me, and Pauli has done most of the work administering Trac as far as I know, so I'd like to at least hear their preferences too before we make a decision. Then I hope we can move this along quickly, because any choice will be a huge improvement over the current situation. > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue May 1 18:08:14 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 02 May 2012 00:08:14 +0200 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: 01.05.2012 21:34, Ralf Gommers kirjoitti: [clip] > The main problem with Github (besides the issues/PRs thing and no > attachments, which I can live with) is that to make it work we'll have > to religiously label everything. And because users aren't allowed to > attach labels, it will require a larger time investment from > maintainers. Are we okay with that? If everyone else is and we can > distribute this task, it's fine with me. I think this is probably not a very big deal, as long as there is a way of seeing the "inbox" of unlabeled bugs, and as long as the tracker sends mail about new bugs. The rate at which tickets are reported to us is at the moment not too that big. Triaging them will probably boil down to a five minutes of brainless work every second week, or so. > David has been investigating bug trackers long before me, and Pauli has > done most of the work administering Trac as far as I know, so I'd like > to at least hear their preferences too before we make a decision. Then I > hope we can move this along quickly, because any choice will be a huge > improvement over the current situation. One big plus with Github compared to Redmine et al. is that someone else (TM) hosts it. One thing less to think about. Apart from the attachments issue, I don't have very big quibbles with the GH issue tracker. Pauli From charlesr.harris at gmail.com Tue May 1 18:18:57 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 May 2012 16:18:57 -0600 Subject: [Numpy-discussion] 1.6.2 release - backports and MSVC testing help In-Reply-To: <4FA050EF.4000709@uci.edu> References: <4FA050EF.4000709@uci.edu> Message-ID: On Tue, May 1, 2012 at 3:09 PM, Christoph Gohlke wrote: > > > On 4/30/2012 1:16 PM, Ralf Gommers wrote: > > Hi all, > > > > Charles has done a great job of backporting a lot of bug fixes to 1.6.2, > > see PRs 260, 261, 262 and 263. For those who are interested, please have > > a look at those PRs to see and comment on what's proposed to go into > 1.6.2. 
> > > > I also have a request for help with testing: can someone who uses MSVC > > test (preferably with a 2.x and a 3.x version)? I have a branch with all > > four PRs merged at https://github.com/rgommers/numpy/tree/bports > > > > Thanks, > > Ralf > > > > > > > > Any chance pull requests 188 and 227 could make it into numpy 1.6.2? > > > https://github.com/numpy/numpy/pull/188 > Added to PR 260. The relevant routines had been rewritten prior to the patch and the variable changed. I think I got this right, but you should test it. https://github.com/numpy/numpy/pull/227 > Is up as PR 265. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ceball at gmail.com Tue May 1 18:22:48 2012 From: ceball at gmail.com (Chris Ball) Date: Tue, 1 May 2012 22:22:48 +0000 (UTC) Subject: [Numpy-discussion] Continuous Integration References: <1335862622-sup-5857@david-desktop> Message-ID: Pauli Virtanen iki.fi> writes: > > 01.05.2012 11:14, David Froger kirjoitti: > > Excerpts from Travis Oliphant's message of mar. mai 01 01:39:26 +0200 2012: > > > If you have particular reasons why we should choose a particular CI service, > > > please speak up and let your voice be heard. There is still time to > make > > > a difference in what we are setting up. > > > > Hi all, > > > > What about buildbot? (http://trac.buildbot.net/) > > > > I'm using it currently, and like it because is GPL 2, configuration files are > > powerful Python scripts, and development team is active and dynamic. > > We're currently using it: http://buildbot.scipy.org > Although, it hasn't been building automatically for some time now. I've been working on setting up a new buildbot for NumPy. Unfortunately, I don't have much time to work on it, so it's slow going! Right now I am still at the stage of getting NumPy to pass all its tests on the machines I'm using as test slaves. After that, I plan to transfer existing slaves to the new setup, and then maybe ask for new volunteer slave machines (if people think the buildbot setup is useful). One nice feature of the new buildbot setup is that it can build and test a pull request, reporting the results back via a comment on the pull request. > It has the following shortcomings: > > - Apparently, for one master instance, only one project. As others have said, this is no longer true. > - Last time I looked, Git integration was poor. Maybe this has > improved since... Again, as others have said, I think this has been improved since last time you looked. Buildbot can receive github notifications or can poll a repository periodically. On the other hand, I've also been trying ShiningPanda (https:// jenkins.shiningpanda.com/scipy/) because I like the idea of a hosted CI solution. Again, though, I haven't yet got all the tests to pass on ShiningPanda's default Debian 6 setup (and again, that's because of a lack of time). So, I'm working on Buildbot and ShiningPanda from the community side, but am always ready to step aside if someone else has time :) Chris From travis at continuum.io Tue May 1 18:27:25 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 1 May 2012 17:27:25 -0500 Subject: [Numpy-discussion] Continuous Integration In-Reply-To: References: <1335862622-sup-5857@david-desktop> Message-ID: > > So, I'm working on Buildbot and ShiningPanda from the community side, but am > always ready to step aside if someone else has time :) > Keep it up. Your input and feedback is invaluable. Plus, in this kind of situation, the more the merrier. 
There are a lot of different agents to test on. -Travis > Chris > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ceball at gmail.com Tue May 1 19:20:51 2012 From: ceball at gmail.com (Chris Ball) Date: Tue, 1 May 2012 23:20:51 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?Test_failures_-_which_dependencies_a?= =?utf-8?q?m_I=09missing=3F?= References: Message-ID: Keith Hughitt gmail.com> writes: > Hi Chris, > > Try "sudo apt-get build-dep python-numpy" to install the dependencies for > building NumPy. I believe it will install all of the optional dependencies > as well. Thanks for that, but I'd already tried it and found the same failures. However, I also found that on my version of Ubuntu (10.04 LTS), which includes NumPy 1.3.0, running "numpy.test(verbose=3)" yielded the following: nose.config: INFO: Excluding tests matching ['f2py_ext','f2py_f90_ext','gen_ext', 'pyrex_ext', 'swig_ext', 'array_from_pyobj'] I'm not sure where this nose config comes from, but at least someone else knows about some of these failures; presumably they are not important. My next step was going to be searching some mailing lists/contacting the packager... Thanks, Chris From matthew.brett at gmail.com Tue May 1 19:21:42 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 1 May 2012 16:21:42 -0700 Subject: [Numpy-discussion] Continuous Integration In-Reply-To: References: <1335862622-sup-5857@david-desktop> Message-ID: Hi, On Tue, May 1, 2012 at 3:22 PM, Chris Ball wrote: > Pauli Virtanen iki.fi> writes: > >> >> 01.05.2012 11:14, David Froger kirjoitti: >> > Excerpts from Travis Oliphant's message of mar. mai 01 01:39:26 +0200 2012: >> > > If you have particular reasons why we should choose a particular CI > service, >> > > please speak up and let your voice be heard. ?There is still time to >> make >> > > a difference in what we are setting up. >> > >> > Hi all, >> > >> > What about ?buildbot? (http://trac.buildbot.net/) >> > >> > I'm using it currently, ?and like it ?because is GPL 2, ?configuration > files are >> > powerful Python scripts, and development team is active and dynamic. >> >> We're currently using it: http://buildbot.scipy.org >> Although, it hasn't been building automatically for some time now. > > I've been working on setting up a new buildbot for NumPy. Unfortunately, I > don't have much time to work on it, so it's slow going! Right now I am still at > the stage of getting NumPy to pass all its tests on the machines I'm using as > test slaves. After that, I plan to transfer existing slaves to the new setup, > and then maybe ask for new volunteer slave machines (if people think the > buildbot setup is useful). > > One nice feature of the new buildbot setup is that it can build and test a pull > request, reporting the results back via a comment on the pull request. > >> It has the following shortcomings: >> >> - Apparently, for one master instance, only one project. > > As others have said, this is no longer true. > >> - Last time I looked, Git integration was poor. Maybe this has >> ? improved since... > > Again, as others have said, I think this has been improved since last time you > looked. Buildbot can receive github notifications or can poll a repository > periodically. > > > On the other hand, I've also been trying ShiningPanda (https:// > jenkins.shiningpanda.com/scipy/) because I like the idea of a hosted CI > solution. 
Again, though, I haven't yet got all the tests to pass on > ShiningPanda's default Debian 6 setup (and again, that's because of a lack of > time). The advantage of having some minimal server as master for buildbot is that you can very easily add slaves on which you can trigger builds and get back build logs automatically. As I understand ShiningPanda (never used it, just talked to Fernando), you can push your own build reports up to a ShiningPanda instance, but I hear it's not so easy to trigger builds and upload reports from external slaves. By the way, in case it's interesting, here is our master.cfg for the nipy buildbot; not very pretty, but fairly easy to add a new slave and a new builder: https://github.com/nipy/nibotmi/blob/master/master.cfg and back-of-envelope procedures to set it up: https://github.com/nipy/nibotmi/blob/master/install.rst In particular see 'setting up a buildslave' at the end. Cheers, Matthew From pav at iki.fi Tue May 1 19:48:30 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 02 May 2012 01:48:30 +0200 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: 01.05.2012 21:34, Ralf Gommers kirjoitti: [clip] > At this point it's probably good to look again at the problems we want > to solve: > 1. responsive user interface (must absolutely have) Now that it comes too late: with some luck, I've possibly hit on what was ailing the Tracs (max_diff_bytes configured too large). Let's see if things work better from now on... Pauli From f_magician at mac.com Tue May 1 20:09:38 2012 From: f_magician at mac.com (Magician) Date: Wed, 02 May 2012 09:09:38 +0900 Subject: [Numpy-discussion] Building NumPy for Python on specified directory Message-ID: <847E05B6-632A-49DD-8D49-990A53F3E176@mac.com> Hi all, I'm now installing Python 2.7.3 and NumPy 1.6.1 on clean-installed CentOS 6.2. At first, I installed Python as below: > ./configure --prefix=/usr/local --enable-shared > make > make install > vi /etc/ld.so.conf #add /usr/local/lib > /sbin/ldconfig and successfully installed NumPy. But if I install Python as below: > ./configure --prefix=/usr/local/python-273 --enable-shared > make > make install > vi /etc/ld.so.conf #add /usr/local/python-273/lib > /sbin/ldconfig then "setup.py build" dumped these errors: > compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/usr/local/python-273/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c' > gcc: build/src.linux-x86_64-2.7/numpy/core/src/_sortmodule.c > gcc -pthread -shared build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core/src/_sortmodule.o -L. -Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/_sort.so > /usr/bin/ld: cannot find -lpython2.7 > collect2: ld returned 1 exit status > /usr/bin/ld: cannot find -lpython2.7 > collect2: ld returned 1 exit status > error: Command "gcc -pthread -shared build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core/src/_sortmodule.o -L. 
-Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/_sort.so" failed with exit status 1 How could I build NumPy for Python on specified directory? From josef.pktd at gmail.com Tue May 1 20:18:08 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 May 2012 20:18:08 -0400 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Tue, May 1, 2012 at 7:48 PM, Pauli Virtanen wrote: > 01.05.2012 21:34, Ralf Gommers kirjoitti: > [clip] >> At this point it's probably good to look again at the problems we want >> to solve: >> 1. responsive user interface (must absolutely have) > > Now that it comes too late: with some luck, I've possibly hit on what > was ailing the Tracs (max_diff_bytes configured too large). Let's see if > things work better from now on... much better now If I were still going through tickets in scipy, I would prefer this view http://projects.scipy.org/scipy/query?status=apply&status=needs_decision&status=needs_info&status=needs_review&status=needs_work&status=new&status=reopened&component=scipy.stats&order=status to something like this https://github.com/ipython/ipython/issues?direction=desc&labels=windows&page=1&sort=created&state=open I never figured out how to effectively search in github, while with trac, I could always immediately reply to a question with the relevant link, or check the history instead of relying on memory. Josef > > ? ? ? ?Pauli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Tue May 1 20:24:11 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 May 2012 18:24:11 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Tue, May 1, 2012 at 6:18 PM, wrote: > On Tue, May 1, 2012 at 7:48 PM, Pauli Virtanen wrote: > > 01.05.2012 21:34, Ralf Gommers kirjoitti: > > [clip] > >> At this point it's probably good to look again at the problems we want > >> to solve: > >> 1. responsive user interface (must absolutely have) > > > > Now that it comes too late: with some luck, I've possibly hit on what > > was ailing the Tracs (max_diff_bytes configured too large). Let's see if > > things work better from now on... > > > much better now > > If I were still going through tickets in scipy, I would prefer this view > > http://projects.scipy.org/scipy/query?status=apply&status=needs_decision&status=needs_info&status=needs_review&status=needs_work&status=new&status=reopened&component=scipy.stats&order=status > > to something like this > > https://github.com/ipython/ipython/issues?direction=desc&labels=windows&page=1&sort=created&state=open > > I never figured out how to effectively search in github, while with > trac, I could always immediately reply to a question with the relevant > link, or check the history instead of relying on memory. > > I would agree that a good search facility is essential, and not keyword/tag based. I've found some trac tickets with google on occasion, although not by initial intent. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Tue May 1 20:29:10 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 1 May 2012 18:29:10 -0600 Subject: [Numpy-discussion] Test failures - which dependencies am I missing? In-Reply-To: References: Message-ID: On Tue, May 1, 2012 at 5:20 PM, Chris Ball wrote: > Keith Hughitt gmail.com> writes: > > > Hi Chris, > > > > Try "sudo apt-get build-dep python-numpy" to install the dependencies for > > building NumPy. I believe it will install all of the optional > dependencies > > as well. > > Thanks for that, but I'd already tried it and found the same failures. > > However, I also found that on my version of Ubuntu (10.04 LTS), which > includes > NumPy 1.3.0, running "numpy.test(verbose=3)" yielded the following: > > nose.config: INFO: Excluding tests matching > ['f2py_ext','f2py_f90_ext','gen_ext', > 'pyrex_ext', 'swig_ext', 'array_from_pyobj'] > > Doesn't Debian separate f2py from numpy? Also, long running tests are skipped unless you specify 'full' as the test argument. Also, I don't recall if the f2py tests were actually installed in 1.3, I think that came later around 1.6. > I'm not sure where this nose config comes from, but at least someone else > knows > about some of these failures; presumably they are not important. My next > step > was going to be searching some mailing lists/contacting the packager... > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Tue May 1 20:29:28 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 1 May 2012 17:29:28 -0700 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Tue, May 1, 2012 at 5:24 PM, Charles R Harris wrote: > I would agree that a good search facility is essential, and not keyword/tag > based. Github issues does have full-text search, and up until now I haven't really had too many problems with it. No sophisticated filtering or anything, but basic search with the option to see open/closed/all issues seems to work OK. Josef I'm curious, which problems have you had with it? Cheers, f From josef.pktd at gmail.com Tue May 1 21:04:48 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 1 May 2012 21:04:48 -0400 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Tue, May 1, 2012 at 8:29 PM, Fernando Perez wrote: > On Tue, May 1, 2012 at 5:24 PM, Charles R Harris > wrote: >> I would agree that a good search facility is essential, and not keyword/tag >> based. > > Github issues does have full-text search, and up until now I haven't > really had too many problems with it. ?No sophisticated filtering or > anything, but basic search with the option to see open/closed/all > issues seems to work OK. > > Josef I'm curious, which problems have you had with it? maybe searching issues and pull requests is ok. The problem is that in statsmodels we did a lot of commits without pull requests, and I'm not very good searching in git either. (I don't remember which change I looked for but I got lost for half an hour searching for a change with git and github until I had to ask Skipper, who found it immediately.) 
What I was used to is integrated search without extra work, like this http://projects.scipy.org/scipy/search?q=mannwhitney and I had pretty much all the discussion to the history of a function. scipy.stats is easy too search because the function names are good search terms. scipy pull request don't have a search facility, as far as I was able to figure out, because the issues are not enabled, so I cannot check how good it would be. Josef > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From fperez.net at gmail.com Tue May 1 21:27:19 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 1 May 2012 18:27:19 -0700 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Tue, May 1, 2012 at 6:04 PM, wrote: > maybe searching issues and pull requests is ok. > The problem is that in statsmodels we did a lot of commits without > pull requests, and I'm not very good searching in git either. > (I don't remember which change I looked for but I got lost for half an > hour searching for a change with git and github until I had to ask > Skipper, who found it immediately.) > > What I was used to is integrated search without extra work, like this > http://projects.scipy.org/scipy/search?q=mannwhitney > and I had pretty much all the discussion to the history of a function. Ah, I see: the point was really the ability to search into the commit history directly through the web UI. Indeed, I'd never thought of that b/c I simply use the git machinery at the command-line for everything. But I can see how that Trac search box, with the ability to selectively tick on/off searching into commits, bugs, etc, could be very useful. Cheers, f From jason-sage at creativetrax.com Tue May 1 21:35:41 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Tue, 01 May 2012 20:35:41 -0500 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: <4FA08F6D.5080706@creativetrax.com> On 5/1/12 7:24 PM, Charles R Harris wrote: > I would agree that a good search facility is essential, and not > keyword/tag based. I've found some trac tickets with google on occasion, > although not by initial intent. I use google to search the sage trac these days, using a shortcut to limit search results to the Sage trac site. To do this in Chrome, go to Preferences, then Basics, then Manage Search Engines. Down at the bottom, I fill in the three fields for a new search engine: Name: trac Keyword: t URL: http://www.google.com/#q=site:trac.sagemath.org+%s Then whenever I want to search trac, I just type "t " (t space) in the URL bar of Chrome, then type whatever I'm searching for. Google almost always pulls up the right ticket in the top few hits. And it's way faster than the trac search. 
Thanks,

Jason

From ralf.gommers at googlemail.com  Wed May  2 01:47:52 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Wed, 2 May 2012 07:47:52 +0200
Subject: [Numpy-discussion] Issue Tracking
In-Reply-To: 
References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io>
	<4F9F552F.4060605@creativetrax.com>
	<5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io>
Message-ID: 

On Wed, May 2, 2012 at 1:48 AM, Pauli Virtanen wrote:

> 01.05.2012 21:34, Ralf Gommers kirjoitti:
> [clip]
> > At this point it's probably good to look again at the problems we want
> > to solve:
> > 1. responsive user interface (must absolutely have)
>
> Now that it comes too late: with some luck, I've possibly hit on what
> was ailing the Tracs (max_diff_bytes configured too large). Let's see if
> things work better from now on...
>

That's amazing - not only does it not give errors anymore, it's also an
order of magnitude faster.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com  Wed May  2 09:48:11 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 2 May 2012 07:48:11 -0600
Subject: [Numpy-discussion] Issue Tracking
In-Reply-To: 
References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io>
	<4F9F552F.4060605@creativetrax.com>
	<5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io>
Message-ID: 

On Tue, May 1, 2012 at 11:47 PM, Ralf Gommers wrote:

>
> On Wed, May 2, 2012 at 1:48 AM, Pauli Virtanen wrote:
>
>> 01.05.2012 21:34, Ralf Gommers kirjoitti:
>> [clip]
>> > At this point it's probably good to look again at the problems we want
>> > to solve:
>> > 1. responsive user interface (must absolutely have)
>>
>> Now that it comes too late: with some luck, I've possibly hit on what
>> was ailing the Tracs (max_diff_bytes configured too large). Let's see if
>> things work better from now on...
>>
>
> That's amazing - not only does it not give errors anymore, it's also an
> order of magnitude faster.
>

So maybe we could just stick with trac. Performance was really the sticking
point.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From heng at cantab.net  Wed May  2 12:03:04 2012
From: heng at cantab.net (Henry Gomersall)
Date: Wed, 02 May 2012 17:03:04 +0100
Subject: [Numpy-discussion] copying array to itself
Message-ID: <1335974584.8654.44.camel@farnsworth>

I need to do some shifting of data within an array and am using the
following code:

for p in numpy.arange(array.shape[0], dtype='int64'):
    for q in numpy.arange(array.shape[1]):
        # A positive shift is towards zero
        shift = shift_values[p, q]

        if shift >= 0:
            copy_len = array.shape[2] - shift
            array[p, q, :copy_len] = array[p, q, shift:]
            array[p, q, copy_len:] = 0
        else:
            neg_shift = -shift
            copy_len = array.shape[2] - neg_shift
            array[p, q, neg_shift:] = array[p, q, :copy_len]
            array[p, q, :neg_shift] = 0

In essence, if shift is positive, it copies the data to a block closer to
the zero index on the last dimension, and if it's negative, it copies it
to a block further away.

The problem I've encountered is that for some values of p and q (which is
consistent), it simply writes deterministic and consistent garbage (a
repeated pattern around -0.002) to that last dimension to which I'm trying
to copy. Most of the values of p and q work fine and as expected.

I can solve the problem with an interim copy.

Is this some nuance of the way numpy does things? Or am I missing some
stupid bug in my code?
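For reference, the interim-copy version looks roughly like this (a minimal
sketch; the shapes of `array` and `shift_values` below are made up for
illustration):

import numpy

array = numpy.random.rand(4, 3, 16)
shift_values = numpy.random.randint(-5, 6, size=(4, 3))

for p in xrange(array.shape[0]):
    for q in xrange(array.shape[1]):
        shift = shift_values[p, q]
        row = array[p, q].copy()   # the interim copy removes the overlap
        array[p, q] = 0
        if shift >= 0:
            array[p, q, :array.shape[2] - shift] = row[shift:]
        else:
            array[p, q, -shift:] = row[:array.shape[2] + shift]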
cheers,

Henry

From wkerzendorf at gmail.com  Wed May  2 12:16:28 2012
From: wkerzendorf at gmail.com (Wolfgang Kerzendorf)
Date: Wed, 2 May 2012 12:16:28 -0400
Subject: [Numpy-discussion] sparse array data
Message-ID: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com>

Hi all,

I'm currently writing a code that needs three dimensional data (for the
physicists its dimensions are atom, ion, level). The problem is that not
all combinations do exist (a sparse array). Sparse matrices in scipy only
deal with two dimensions. The operations that I need to do on those are
running functions like exp(item/constant) on all of the items. I also want
to sum them up in the last dimension. What's the best way to make a class
that takes this kind of data and does the required operations fast? Maybe
some physicists have implemented these things already. Any thoughts?

Cheers
   Wolfgang

From nouiz at nouiz.org  Wed May  2 13:24:09 2012
From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=)
Date: Wed, 2 May 2012 13:24:09 -0400
Subject: [Numpy-discussion] sparse array data
In-Reply-To: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com>
References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com>
Message-ID: 

What about numpy.ma? Those are masked arrays. But they won't be the
fastest.

Fred

On Wed, May 2, 2012 at 12:16 PM, Wolfgang Kerzendorf wrote:
> Hi all,
>
> I'm currently writing a code that needs three dimensional data (for the physicists it's dimensions are atom, ion, level). The problem is that not all combinations do exist (a sparse array). Sparse matrices in scipy only deal with two dimensions. The operations that I need to do on those are running functions like exp(item/constant) on all of the items. I also want to sum them up in the last dimension. What's the best way to make a class that takes this kind of data and does the required operations fast. Maybe some phycisists have implemented these things already. Any thoughts?
>
> Cheers
>   Wolfgang
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From Catherine.M.Moroney at jpl.nasa.gov  Wed May  2 14:06:52 2012
From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (388D))
Date: Wed, 2 May 2012 18:06:52 +0000
Subject: [Numpy-discussion] record arrays and vectorizing
Message-ID: 

Hello,

Can somebody give me some hints as to how to code up this function
in pure python, rather than dropping down to Fortran?

I will want to compare a 7-element vector (called "element") to a large
list of similarly-dimensioned vectors (called "target"), and pick out the
vector in "target" that is the closest to "element" (determined by
minimizing the Euclidean distance).

For instance, in (slow) brute force form it would look like:

element = numpy.array([1, 2, 3, 4, 5, 6, 7])
target  = numpy.array(range(0, 49)).reshape(7,7)*0.1

min_length = 9999.0
min_index  = -1
for i in xrange(0, 7):
   distance = (element - target[i, :])**2
   distance = numpy.sqrt(distance.sum())
   if (distance < min_length):
      min_length = distance
      min_index  = i

Now of course, the actual problem will be of a much larger scale. I will
have an array of elements, and a large number of potential targets.

I was thinking of having element be an array where each element itself is
a numpy.ndarray, and then vectorizing the code above so that, as an
output, I would have an array of the "min_index" and "min_length" values.
I can get the following simple test to work so I may be on the right track:

import numpy

dtype = [("x", numpy.ndarray)]

def single(data):
    return data[0].min()

multiple = numpy.vectorize(single)

if __name__ == "__main__":

    a = numpy.arange(0, 16).reshape(4,4)
    b = numpy.recarray((4), dtype=dtype)
    for i in xrange(0, b.shape[0]):
        b[i]["x"] = a[i,:]

    print a
    print b

    x = multiple(b)
    print x

What is the best way of constructing "b" from "a"?  I tried
b = numpy.recarray((4), dtype=dtype, buf=a) but I get a segmentation
fault when I try to print b.

Is there a way to perform this larger task efficiently with record arrays
and vectorization, or am I off on the wrong track completely?  How can I
do this efficiently without dropping down to Fortran?

Thanks for any advice,

Catherine

From stefan at sun.ac.za  Wed May  2 15:26:45 2012
From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=)
Date: Wed, 2 May 2012 12:26:45 -0700
Subject: [Numpy-discussion] record arrays and vectorizing
In-Reply-To: 
References: 
Message-ID: 

On Wed, May 2, 2012 at 11:06 AM, Moroney, Catherine M (388D) wrote:
> I will want to compare a 7-element vector (called "element") to a large list of similarly-dimensioned
> vectors (called "target", and pick out the vector in "target" that is the closest to "element"
> (determined by minimizing the Euclidean distance).

It's not entirely clear what you mean from the description above.  In the
code example, you return a single index, but from the description it
sounds like you want to pick out a vector?

If you need multiple answers, one for each element, then you probably
need to do broadcasting as shown in the NumPy medkit:

http://mentat.za.net/numpy/numpy_advanced_slides/

Stéfan

From stefan at sun.ac.za  Wed May  2 15:58:41 2012
From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=)
Date: Wed, 2 May 2012 12:58:41 -0700
Subject: [Numpy-discussion] copying array to itself
In-Reply-To: <1335974584.8654.44.camel@farnsworth>
References: <1335974584.8654.44.camel@farnsworth>
Message-ID: 

On Wed, May 2, 2012 at 9:03 AM, Henry Gomersall wrote:
> Is this some nuance of the way numpy does things? Or am I missing some
> stupid bug in my code?

Try playing with the parameters of the following code:

sz = 10000
N = 10

import numpy as np
x = np.arange(sz)
y = x.copy()
x[:-N] = x[N:]

np.testing.assert_equal(x[:-N], y[N:])

For small values of sz this typically works, but as soon as numpy needs
to buffer strange things happen because you are reading from memory
locations that you've already written to.

Stéfan

From francesc at continuum.io  Wed May  2 16:53:47 2012
From: francesc at continuum.io (Francesc Alted)
Date: Wed, 02 May 2012 15:53:47 -0500
Subject: [Numpy-discussion] sparse array data
In-Reply-To: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com>
References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com>
Message-ID: <4FA19EDB.5040702@continuum.io>

On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote:
> Hi all,
>
> I'm currently writing a code that needs three dimensional data (for the physicists it's dimensions are atom, ion, level). The problem is that not all combinations do exist (a sparse array). Sparse matrices in scipy only deal with two dimensions. The operations that I need to do on those are running functions like exp(item/constant) on all of the items. I also want to sum them up in the last dimension. What's the best way to make a class that takes this kind of data and does the required operations fast. Maybe some phycisists have implemented these things already. Any thoughts?
Curiously enough, I have recently been discussing with Travis O. about how to represent sparse matrices with complete generality. One of the possibilities is to use what Travis call "synthetic dimensions". The idea behind it is easy: use a table with as many columns as dimensions, and add another one for the actual values of the array. For a 3-D sparse array, this looks like: dim0 | dim1 | dim2 | value ========================== 0 | 0 | 0 | val0 0 | 10 | 100 | val1 20 | 5 | 202 | val2 You can use any package that deals with tables for implementing such a thing. I'm going to quickly describe a raw implementation of this on top of carray [1], not only because I'm the author, but also because it adapts well to the needs you exposed. [1] https://github.com/FrancescAlted/carray Let's start with a small array with shape (2,5). We are going to use a dense array for this, mainly for comparison purposes with typical NumPy arrays, but of course the logic behind this can be extended to multidimensional sparse arrays with complete generality. In [1]: import carray as ca In [2]: import numpy as np In [3]: syn_dtype = np.dtype([('dim0', np.uint32), ('dim1', np.uint32), ('value', np.float64)]) In [4]: N = 10 In [6]: ct = ca.fromiter(((i/2, i%2, i*i) for i in xrange(N)), dtype=syn_dtype, count=N) In [7]: ct Out[7]: ctable((10,), |V16) nbytes: 160; cbytes: 12.00 KB; ratio: 0.01 cparams := cparams(clevel=5, shuffle=True) [(0, 0, 0.0) (0, 1, 1.0) (1, 0, 4.0) (1, 1, 9.0) (2, 0, 16.0) (2, 1, 25.0) (3, 0, 36.0) (3, 1, 49.0) (4, 0, 64.0) (4, 1, 81.0)] Okay, we have our small array. Now, let's apply a function for the values (in this case the log()): In [8]: ct['value'][:] = ct.eval('log(value)') In [9]: ct Out[9]: ctable((10,), |V16) nbytes: 160; cbytes: 12.00 KB; ratio: 0.01 cparams := cparams(clevel=5, shuffle=True) [(0, 0, -inf) (0, 1, 0.0) (1, 0, 1.3862943611198906) (1, 1, 2.1972245773362196) (2, 0, 2.772588722239781) (2, 1, 3.2188758248682006) (3, 0, 3.58351893845611) (3, 1, 3.8918202981106265) (4, 0, 4.1588830833596715) (4, 1, 4.394449154672439)] carray uses numexpr behind the scenes, so these operations are very fast. Also, for functions not supported inside numexpr, carray can also make use of the ones in NumPy (although these are typically not as efficient). Let's see how to do sums in different axis. For this, we will use the selection capabilities in the ctable object. Let's do the sum in the last axis first: In [10]: [ sum(row.value for row in ct.where('(dim0==%d)' % (i,))) for i in range(N/2) ] Out[10]: [-inf, 3.58351893845611, 5.991464547107982, 7.475339236566736, 8.55333223803211] So, it is just a matter of summing over dim1 while keeping dim0 fixed. One can check that the results are the same than for NumPy: In [11]: t = np.fromiter((np.log(i*i) for i in xrange(N)), dtype='f8').reshape(N/2,2) In [12]: t.sum(axis=1) Out[12]: array([ -inf, 3.58351894, 5.99146455, 7.47533924, 8.55333224]) Summing over the leading dimension means keeping dim1 fixed: In [13]: [ sum(row.value for row in ct.where('(dim1==%d)' % (i,))) for i in range(2) ] Out[13]: [-inf, 13.702369854987484] and again, this is the same than using the `axis=0` parameter: In [14]: t.sum(axis=0) Out[14]: array([ -inf, 13.70236985]) Summing everything is, as expected, the easiest: In [15]: sum(row.value for row in ct.iter()) Out[15]: -inf In [16]: t.sum() Out[16]: -inf Of course, the case for more dimensions requires a bit more complexity, but nothing fancy (this is left as an exercise for the reader ;). 
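As a rough illustration of that exercise in plain NumPy (independent of
carray; the coordinates and values below are made up), summing a
coordinate-format array over its last dimension amounts to sorting on the
remaining coordinates and reducing each run of equal keys:

import numpy as np

coords = np.array([[ 0,  0,   0],
                   [ 0, 10, 100],
                   [20,  5, 202],
                   [20,  5,   7]])
values = np.array([1.0, 2.0, 3.0, 4.0])

# sort entries by (dim0, dim1) so that entries differing only in dim2
# become adjacent
order = np.lexsort((coords[:, 1], coords[:, 0]))
c, v = coords[order], values[order]

# start index of each run of identical (dim0, dim1) pairs
new_run = (np.diff(c[:, 0]) != 0) | (np.diff(c[:, 1]) != 0)
starts = np.concatenate(([0], np.nonzero(new_run)[0] + 1))

sums = np.add.reduceat(v, starts)   # one partial sum per distinct (dim0, dim1)
keys = c[starts, :2]                # the coordinates that survive the reduction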
In case you are going to use this in your package, you may want to create wrappers that would access the different functionality more easily. Finally, you should note that I used 4-byte integers for representing the dimensions. If this is not enough, you can use 8-byte integers too. As the carray objects are compressed by default, this usually doesn't take a lot of space. For example, for an array with 1 million elements: In [31]: ct = ca.fromiter(((i/2, i%2, i*i) for i in xrange(N)), dtype=syn_dtype, count=N) In [32]: ct Out[32]: ctable((1000000,), |V16) nbytes: 15.26 MB; cbytes: 1.76 MB; ratio: 8.67 cparams := cparams(clevel=5, shuffle=True) [(0, 0, 0.0), (0, 1, 1.0), (1, 0, 4.0), ..., (499998, 1, 999994000009.0), (499999, 0, 999996000004.0), (499999, 1, 999998000001.0)] That is saying that the ctable object is requiring just 1.76 MB (compare this with the 8 MB that requires the equivalent dense NumPy array). One inconvenient of this approach is that it is generally much slower than using a dense array representation: In [30]: time [ sum(row.value for row in ct.where('(dim1==%d)' % (i,))) for i in range(2) ] CPU times: user 1.80 s, sys: 0.00 s, total: 1.81 s Wall time: 1.81 s Out[30]: [1.666661666665056e+17, 1.6666666666667674e+17] In [33]: t = np.fromiter((i*i for i in xrange(N)), dtype='f8').reshape(N/2,2) In [34]: time t.sum(axis=0) CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s Wall time: 0.01 s Out[34]: array([ 1.66666167e+17, 1.66666667e+17]) Probably, implementing a native sum() operation on top of ctable objects would help improving performance here. Alternatively, you could accelerate these operations by using the Table object in PyTables [2] and indexing the dimensions for getting much improved speed for accessing elements in big sparse arrays. Using a table in a relational database (indexed for dimensions) could be an option too. [2] https://github.com/PyTables/PyTables Hope this helps, -- Francesc Alted From david.froger at gmail.com Wed May 2 16:54:01 2012 From: david.froger at gmail.com (David Froger) Date: Wed, 02 May 2012 22:54:01 +0200 Subject: [Numpy-discussion] Continuous Integration In-Reply-To: References: <1335862622-sup-5857@david-desktop> Message-ID: <1335991483-sup-131@david-desktop> > I've been working on setting up a new buildbot for NumPy. Unfortunately, I > don't have much time to work on it, so it's slow going! Right now I am still at > the stage of getting NumPy to pass all its tests on the machines I'm using as > test slaves. After that, I plan to transfer existing slaves to the new setup, > and then maybe ask for new volunteer slave machines (if people think the > buildbot setup is useful). Hi, If there are things one can contribute to help the development of the buildbot for NumPy, I would be happy to participate! David From stefan at sun.ac.za Wed May 2 17:07:43 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 May 2012 14:07:43 -0700 Subject: [Numpy-discussion] sparse array data In-Reply-To: <4FA19EDB.5040702@continuum.io> References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> Message-ID: Hi Francesc On Wed, May 2, 2012 at 1:53 PM, Francesc Alted wrote: > and add another one for the actual values of the array. ?For a 3-D > sparse array, this looks like: > > dim0 | dim1 | dim2 | value > ========================== > ? ?0 | ? 0 ?| ? 0 ?| val0 > ? ?0 | ?10 ?| 100 ?| val1 > ? 20 | ? 5 ?| 202 ?| val2 What's the distinction between this and a coo_matrix? 
St?fan From njs at pobox.com Wed May 2 17:20:19 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 2 May 2012 22:20:19 +0100 Subject: [Numpy-discussion] sparse array data In-Reply-To: <4FA19EDB.5040702@continuum.io> References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> Message-ID: On Wed, May 2, 2012 at 9:53 PM, Francesc Alted wrote: > On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote: >> Hi all, >> >> I'm currently writing a code that needs three dimensional data (for the physicists it's dimensions are atom, ion, level). The problem is that not all combinations do exist (a sparse array). Sparse matrices in scipy only deal with two dimensions. The operations that I need to do on those are running functions like exp(item/constant) on all of the items. I also want to sum them up in the last dimension. What's the best way to make a class that takes this kind of data and does the required operations fast. Maybe some phycisists have implemented these things already. Any thoughts? > > Curiously enough, I have recently been discussing with Travis O. about > how to represent sparse matrices with complete generality. ?One of the > possibilities is to use what Travis call "synthetic dimensions". ?The > idea behind it is easy: use a table with as many columns as dimensions, > and add another one for the actual values of the array. ?For a 3-D > sparse array, this looks like: > > dim0 | dim1 | dim2 | value > ========================== > ? ?0 | ? 0 ?| ? 0 ?| val0 > ? ?0 | ?10 ?| 100 ?| val1 > ? 20 | ? 5 ?| 202 ?| val2 This coordinate format is also what's used by the MATLAB Tensor Toolbox. They have a paper justifying this choice and describing some tricks for how to work with them: http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1 (Spoiler: you use a lot of sort operations. Conveniently, timsort appears to be perfectly adapted for their algorithmic requirements.) I'm not sure why one would make up a new term like "synthetic dimensions" though, it's just standard coordinate format... Though, for the original poster, depending on their exact problem, they might be better off just using a list or object ndarray of scipy.sparse matrices. Or some coordinate arrays like above, plus add.reduceat for the sums they mentioned. -- Nathaniel From wesmckinn at gmail.com Wed May 2 17:25:08 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 2 May 2012 17:25:08 -0400 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Wed, May 2, 2012 at 9:48 AM, Charles R Harris wrote: > > > On Tue, May 1, 2012 at 11:47 PM, Ralf Gommers > wrote: >> >> >> >> On Wed, May 2, 2012 at 1:48 AM, Pauli Virtanen wrote: >>> >>> 01.05.2012 21:34, Ralf Gommers kirjoitti: >>> [clip] >>> > At this point it's probably good to look again at the problems we want >>> > to solve: >>> > 1. responsive user interface (must absolutely have) >>> >>> Now that it comes too late: with some luck, I've possibly hit on what >>> was ailing the Tracs (max_diff_bytes configured too large). Let's see if >>> things work better from now on... >> >> >> That's amazing - not only does it not give errors anymore, it's also an >> order of magnitude faster. >> > > So maybe we could just stick with trac. Performance was really the sticking > point. 
> > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > FWIW I'm pretty strongly in favor of GHI for NumPy/SciPy (I am going to get involved in NumPy dev eventually, promise). While warty in some of the places already mentioned, I have found it to be very low-friction and low-annoyance in my own dev process (nearing 1000 issues closed in the last year in pandas). But there are fewer cooks in the kitchen with pandas so perhaps this experience wouldn't be identical with NumPy. The biggest benefit I've seen is community involvement that you really wouldn't see if I were using a Trac or something else hosted elsewhere. Users are on GitHub and it for some reason gives people a feeling of engagement in the open source process that I don't see anywhere else. - Wes From Catherine.M.Moroney at jpl.nasa.gov Wed May 2 17:45:44 2012 From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (388D)) Date: Wed, 2 May 2012 21:45:44 +0000 Subject: [Numpy-discussion] record arrays initialization Message-ID: <2E86DB1A-76C9-4308-8958-5F5D9EA38E85@jpl.nasa.gov> Thanks to Perry for some very useful off-list conversation. I realize that I wasn't being clear at all in my earlier description of the problem so here it is in a nutshell: Find the best match in an array t(5000, 7) for a single vector e(7). Now scale it up so e is (128, 512, 7) and I want to return a (128, 512) array of the t-identifiers that are the best match for e. "Best match" is defined as the minimum Euclidean distance. I'm going to try three ways: (a) brute force and lots of looping in python, (b) constructing a function to find the match for a single instance of e and vectorizing it, and (c) coding it in Fortran. I'll be curious to see the performance figures. Two smaller questions: A) How do I most efficiently construct a record array from a single array? I want to do the following, but it segfaults on me when i try to print b. vtype = [("x", numpy.ndarray)] a = numpy.arange(0, 16).reshape(4,4) b = numpy.recarray((4), dtype=vtype, buf=a) print a print b What is the most efficient way of constructing b from the values of a? In real-life, a is (128*512*7) and I want b to be (128, 512) with the x component being a 7-value numpy array. and B) If I'm vectorizing a function ("single") to find the best match for a single element of e within t, how do I pass the entire array t into the function without having it parcelled down to its individual elements? i.e. def single(elements, targets): nlen = element.shape[0] nvec = targets.data.shape[0] x = element.reshape(1, nlen).repeat(nvec, axis=0) diffs = ((x - targets.data)**2).sum(axis=1) diffs = numpy.sqrt(diffs) return numpy.argmin(diffs, axis=0) multiple = numpy.vectorize(single) x = multiple(all_elements, target) where all_elements is similar to "b" in my first example, and target is a 2-d array. The above code doesn't work because "target" gets reduced to a single element when it gets down to "single" and I need to see the whole array when I'm down in "single". I found a work-around by encapsulating target into a single object and passing in the object, but I'm curious if there's a better way of doing this. 
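The work-around looks roughly like this (a minimal sketch with made-up
names and shapes; "Targets" is just a plain container class, so that
numpy.vectorize does not try to broadcast over the target array):

import numpy

class Targets(object):
    def __init__(self, data):
        self.data = data

def single(element, targets):
    # element arrives one entry at a time; targets.data stays the full
    # (nvec, nlen) array because a plain object is not broadcast
    diffs = numpy.sqrt(((element - targets.data)**2).sum(axis=1))
    return numpy.argmin(diffs)

multiple = numpy.vectorize(single)

target = Targets(numpy.random.rand(5000, 7))
all_elements = numpy.empty((4, 4), dtype=object)
for i in xrange(all_elements.shape[0]):
    for j in xrange(all_elements.shape[1]):
        all_elements[i, j] = numpy.random.rand(7)

indices = multiple(all_elements, target)   # (4, 4) array of best-match indices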
I hope I've explained myself better this time around, Catherine From aronne.merrelli at gmail.com Wed May 2 17:54:10 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Wed, 2 May 2012 16:54:10 -0500 Subject: [Numpy-discussion] record arrays and vectorizing In-Reply-To: References: Message-ID: On Wed, May 2, 2012 at 1:06 PM, Moroney, Catherine M (388D) wrote: > Hello, > > Can somebody give me some hints as to how to code up this function > in pure python, rather than dropping down to Fortran? > > I will want to compare a 7-element vector (called "element") to a large list of similarly-dimensioned > vectors (called "target", and pick out the vector in "target" that is the closest to "element" > (determined by minimizing the Euclidean distance). > > For instance, in (slow) brute force form it would look like: > > element = numpy.array([1, 2, 3, 4, 5, 6, 7]) > target ?= numpy.array(range(0, 49)).reshape(7,7)*0.1 > > min_length = 9999.0 > min_index ?= > for i in xrange(0, 7): > ? distance = (element-target)**2 > ? distance = numpy.sqrt(distance.sum()) > ? if (distance < min_length): > ? ? ?min_length = distance > ? ? ?min_index ?= i > If you are just trying to find the index to the vector in "target" that is closest to element, then I think the default broadcasting would work fine. Here is an example that should work (the broadcasting is done for the subtraction element-targets): In [39]: element = np.arange(1,8) In [40]: targets = np.random.uniform(0,8,(1000,7)) In [41]: distance_squared = ((element-targets)**2).sum(1) In [42]: min_index = distance_squared.argmin() In [43]: element Out[43]: array([1, 2, 3, 4, 5, 6, 7]) In [44]: targets[min_index,:] Out[44]: array([ 1.93625981, 2.56137284, 2.23395169, 4.15215253, 3.96478248, 5.21829915, 5.13049489]) Note - depending on the number of vectors in targets, it might be better to have everything transposed if you are really worried about the timing; you'd need to try that for your particular case. Hope that helps, Aronne From ceball at gmail.com Wed May 2 18:15:42 2012 From: ceball at gmail.com (Chris Ball) Date: Wed, 2 May 2012 22:15:42 +0000 (UTC) Subject: [Numpy-discussion] Continuous Integration References: <1335862622-sup-5857@david-desktop> <1335991483-sup-131@david-desktop> Message-ID: David Froger gmail.com> writes: > > I've been working on setting up a new buildbot for > > NumPy. Unfortunately, I don't have much time to work on it, > > so it's slow going! ... > Hi, > > If there are things one can contribute to help the development > of the buildbot for NumPy, I would be happy to participate! Great! I've sent you a message off the list so we can coordinate further. Chris From francesc at continuum.io Wed May 2 18:20:23 2012 From: francesc at continuum.io (Francesc Alted) Date: Wed, 02 May 2012 17:20:23 -0500 Subject: [Numpy-discussion] sparse array data In-Reply-To: References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> Message-ID: <4FA1B327.60406@continuum.io> On 5/2/12 4:07 PM, St?fan van der Walt wrote: > Hi Francesc > > On Wed, May 2, 2012 at 1:53 PM, Francesc Alted wrote: >> and add another one for the actual values of the array. For a 3-D >> sparse array, this looks like: >> >> dim0 | dim1 | dim2 | value >> ========================== >> 0 | 0 | 0 | val0 >> 0 | 10 | 100 | val1 >> 20 | 5 | 202 | val2 > What's the distinction between this and a coo_matrix? Well, as the OP said, coo_matrix does not support dimensions larger than 2, right? 
In [4]: coo_matrix((3,4), dtype=np.int8).todense() Out[4]: matrix([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=int8) In [5]: coo_matrix((2,3,2), dtype=np.int8).todense() --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /Users/faltet/ in () ----> 1 coo_matrix((2,3,2), dtype=np.int8).todense() /Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/scipy/sparse/coo.py in __init__(self, arg1, shape, dtype, copy) 127 obj, ij = arg1 128 except: --> 129 raise TypeError('invalid input format') 130 131 try: TypeError: invalid input format -- Francesc Alted From stefan at sun.ac.za Wed May 2 18:24:03 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 May 2012 15:24:03 -0700 Subject: [Numpy-discussion] record arrays initialization In-Reply-To: <2E86DB1A-76C9-4308-8958-5F5D9EA38E85@jpl.nasa.gov> References: <2E86DB1A-76C9-4308-8958-5F5D9EA38E85@jpl.nasa.gov> Message-ID: On Wed, May 2, 2012 at 2:45 PM, Moroney, Catherine M (388D) wrote: > Find the best match in an array t(5000, 7) for a single vector e(7). ?Now scale > it up so e is (128, 512, 7) and I want to return a (128, 512) array of the t-identifiers > that are the best match for e. ?"Best match" is defined as the minimum Euclidean distance. > > I'm going to try three ways: (a) brute force and lots of looping in python, > (b) constructing a function to find the match for a single instance of e and > vectorizing it, and (c) coding it in Fortran. ?I'll be curious to see the > performance figures. I'd use a mixture of (a) and (b): break the t(N, 7) up into blocks of, say, (1000, 7), compute the best match in each using broadcasting, and then combine your results to find the best of the best. This strategy should be best for very large N. For moderate N, where broadcasting easily fits into memory, the answer given by the OP to your original email would do the trick. > A) ?How do I most efficiently construct a record array from a single array? > I want to do the following, but it segfaults on me when i try to print b. > > vtype = [("x", numpy.ndarray)] > a = numpy.arange(0, 16).reshape(4,4) > b = numpy.recarray((4), dtype=vtype, buf=a) I prefer not to use record arrays, and stick to structured arrays: In [11]: vtype = np.dtype([('x', (np.float, 4))]) In [12]: a = np.arange(16.).reshape((4,4)) In [13]: a.view(vtype) Out[13]: array([[([0.0, 1.0, 2.0, 3.0],)], [([4.0, 5.0, 6.0, 7.0],)], [([8.0, 9.0, 10.0, 11.0],)], [([12.0, 13.0, 14.0, 15.0],)]], dtype=[('x', ' B) ?If I'm vectorizing a function ("single") to find the best match for > a single element of e within t, how do I pass the entire array t into > the function without having it parcelled down to its individual elements? I think the new dtype just makes your life more difficult here. 
Simply do: In [49]: np.sum(a - elements.T, axis=1) Out[49]: array([ 0., 16., 32., 48.]) St?fan From francesc at continuum.io Wed May 2 18:26:53 2012 From: francesc at continuum.io (Francesc Alted) Date: Wed, 02 May 2012 17:26:53 -0500 Subject: [Numpy-discussion] sparse array data In-Reply-To: References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> Message-ID: <4FA1B4AD.6060402@continuum.io> On 5/2/12 4:20 PM, Nathaniel Smith wrote: > On Wed, May 2, 2012 at 9:53 PM, Francesc Alted wrote: >> On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote: >>> Hi all, >>> >>> I'm currently writing a code that needs three dimensional data (for the physicists it's dimensions are atom, ion, level). The problem is that not all combinations do exist (a sparse array). Sparse matrices in scipy only deal with two dimensions. The operations that I need to do on those are running functions like exp(item/constant) on all of the items. I also want to sum them up in the last dimension. What's the best way to make a class that takes this kind of data and does the required operations fast. Maybe some phycisists have implemented these things already. Any thoughts? >> Curiously enough, I have recently been discussing with Travis O. about >> how to represent sparse matrices with complete generality. One of the >> possibilities is to use what Travis call "synthetic dimensions". The >> idea behind it is easy: use a table with as many columns as dimensions, >> and add another one for the actual values of the array. For a 3-D >> sparse array, this looks like: >> >> dim0 | dim1 | dim2 | value >> ========================== >> 0 | 0 | 0 | val0 >> 0 | 10 | 100 | val1 >> 20 | 5 | 202 | val2 > This coordinate format is also what's used by the MATLAB Tensor > Toolbox. They have a paper justifying this choice and describing some > tricks for how to work with them: > http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1 > (Spoiler: you use a lot of sort operations. Conveniently, timsort > appears to be perfectly adapted for their algorithmic requirements.) Uh, I do not have access to the article, but yes, sorting makes a lot of sense for these scenarios (this is why I was suggesting using indexes, which is sort of sorting too). > > I'm not sure why one would make up a new term like "synthetic > dimensions" though, it's just standard coordinate format... Yeah, this seems a bit weird. Perhaps Travis was referring to other things and I mixed concepts. Anyways. -- Francesc Alted From bioinformed at gmail.com Wed May 2 18:27:20 2012 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Wed, 2 May 2012 18:27:20 -0400 Subject: [Numpy-discussion] record arrays initialization In-Reply-To: <2E86DB1A-76C9-4308-8958-5F5D9EA38E85@jpl.nasa.gov> References: <2E86DB1A-76C9-4308-8958-5F5D9EA38E85@jpl.nasa.gov> Message-ID: On Wed, May 2, 2012 at 5:45 PM, Moroney, Catherine M (388D) < Catherine.M.Moroney at jpl.nasa.gov> wrote: > Thanks to Perry for some very useful off-list conversation. I realize > that > I wasn't being clear at all in my earlier description of the problem so > here it is > in a nutshell: > > Find the best match in an array t(5000, 7) for a single vector e(7). Now > scale > it up so e is (128, 512, 7) and I want to return a (128, 512) array of the > t-identifiers > that are the best match for e. "Best match" is defined as the minimum > Euclidean distance. > > It sounds like you want to find the nearest neighbor to a point in a high-dimensional space. 
This sounds like a job for a spacial data structure like a KD-tree. See: http://docs.scipy.org/doc/scipy/reference/spatial.html http://mloss.org/software/view/143/ http://www.mrzv.org/software/pyANN/ etc. -Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Wed May 2 18:28:19 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 May 2012 15:28:19 -0700 Subject: [Numpy-discussion] sparse array data In-Reply-To: <4FA1B327.60406@continuum.io> References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> <4FA1B327.60406@continuum.io> Message-ID: On Wed, May 2, 2012 at 3:20 PM, Francesc Alted wrote: > On 5/2/12 4:07 PM, St?fan van der Walt wrote: > Well, as the OP said, coo_matrix does not support dimensions larger than > 2, right? That's just an implementation detail, I would imagine--I'm trying to figure out if there is a new principle behind "synthetic dimensions"? By the way, David Cournapeau mentioned using b-trees for sparse ops a while ago; did you ever talk to him about those ideas? BTW, this coo-type storage is used in Stanford's probabilistic graphical models course, but for dense data (like we have in the course) it's a pain. Writing code in both Octave and Python, I again came to realize what a very elegant N-dimensional structure the numpy array exposes! St?fan From francesc at continuum.io Wed May 2 18:50:25 2012 From: francesc at continuum.io (Francesc Alted) Date: Wed, 02 May 2012 17:50:25 -0500 Subject: [Numpy-discussion] sparse array data In-Reply-To: References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> <4FA1B327.60406@continuum.io> Message-ID: <4FA1BA31.1080006@continuum.io> On 5/2/12 5:28 PM, St?fan van der Walt wrote: > On Wed, May 2, 2012 at 3:20 PM, Francesc Alted wrote: >> On 5/2/12 4:07 PM, St?fan van der Walt wrote: >> Well, as the OP said, coo_matrix does not support dimensions larger than >> 2, right? > That's just an implementation detail, I would imagine--I'm trying to > figure out if there is a new principle behind "synthetic dimensions"? No, no new things under the sun. We were just talking about many different things and this suddenly appeared in our talk. Nothing really serious. > By the way, David Cournapeau mentioned using b-trees for sparse ops a > while ago; did you ever talk to him about those ideas? Yup, the b-tree idea fits very well for indexing the coordinates. Although one problem with b-trees is that they do not compress well in general. -- Francesc Alted From ceball at gmail.com Wed May 2 18:59:58 2012 From: ceball at gmail.com (Chris Ball) Date: Wed, 2 May 2012 22:59:58 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?Test_failures_-_which_dependencies_a?= =?utf-8?q?m_I=09missing=3F?= References: Message-ID: Chris Ball gmail.com> writes: > > Keith Hughitt gmail.com> writes: > > > Hi Chris, > > > > Try "sudo apt-get build-dep python-numpy" to install the dependencies for > > building NumPy. I believe it will install all of the optional dependencies > > as well. > > Thanks for that, but I'd already tried it and found the same failures. > > However, I also found that on my version of Ubuntu (10.04 LTS), which includes > NumPy 1.3.0, running "numpy.test(verbose=3)" yielded the following: (Above, "numpy" is the Ubuntu-supplied numpy in case that wasn't clear.) 
> nose.config: INFO: Excluding tests matching ['f2py_ext','f2py_f90_ext','gen_ext', > 'pyrex_ext', 'swig_ext', 'array_from_pyobj'] I've discovered that numpy itself explicitly excludes these tests (in numpy/testing/nosetester.py): # Stuff to exclude from tests. These are from numpy.distutils excludes = ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext'] So, all is explained now. Chris From njs at pobox.com Wed May 2 19:12:29 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 3 May 2012 00:12:29 +0100 Subject: [Numpy-discussion] sparse array data In-Reply-To: <4FA1B4AD.6060402@continuum.io> References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> <4FA1B4AD.6060402@continuum.io> Message-ID: On Wed, May 2, 2012 at 11:26 PM, Francesc Alted wrote: > On 5/2/12 4:20 PM, Nathaniel Smith wrote: >> On Wed, May 2, 2012 at 9:53 PM, Francesc Alted ?wrote: >>> On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote: >>>> Hi all, >>>> >>>> I'm currently writing a code that needs three dimensional data (for the physicists it's dimensions are atom, ion, level). The problem is that not all combinations do exist (a sparse array). Sparse matrices in scipy only deal with two dimensions. The operations that I need to do on those are running functions like exp(item/constant) on all of the items. I also want to sum them up in the last dimension. What's the best way to make a class that takes this kind of data and does the required operations fast. Maybe some phycisists have implemented these things already. Any thoughts? >>> Curiously enough, I have recently been discussing with Travis O. about >>> how to represent sparse matrices with complete generality. ?One of the >>> possibilities is to use what Travis call "synthetic dimensions". ?The >>> idea behind it is easy: use a table with as many columns as dimensions, >>> and add another one for the actual values of the array. ?For a 3-D >>> sparse array, this looks like: >>> >>> dim0 | dim1 | dim2 | value >>> ========================== >>> ? ? 0 | ? 0 ?| ? 0 ?| val0 >>> ? ? 0 | ?10 ?| 100 ?| val1 >>> ? ?20 | ? 5 ?| 202 ?| val2 >> This coordinate format is also what's used by the MATLAB Tensor >> Toolbox. They have a paper justifying this choice and describing some >> tricks for how to work with them: >> ? ?http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1 >> (Spoiler: you use a lot of sort operations. Conveniently, timsort >> appears to be perfectly adapted for their algorithmic requirements.) > > Uh, I do not have access to the article, but yes, sorting makes a lot of > sense for these scenarios (this is why I was suggesting using indexes, > which is sort of sorting too). Doh, sorry. Sticking the title into google scholar gives me this: http://csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/SIAM-67648.pdf - N From aronne.merrelli at gmail.com Wed May 2 19:25:01 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Wed, 2 May 2012 18:25:01 -0500 Subject: [Numpy-discussion] record arrays initialization In-Reply-To: References: <2E86DB1A-76C9-4308-8958-5F5D9EA38E85@jpl.nasa.gov> Message-ID: On Wed, May 2, 2012 at 5:27 PM, Kevin Jacobs wrote: > On Wed, May 2, 2012 at 5:45 PM, Moroney, Catherine M (388D) > wrote: >> >> Thanks to Perry for some very useful off-list conversation. ? I realize >> that >> I wasn't being clear at all in my earlier description of the problem so >> here it is >> in a nutshell: >> >> Find the best match in an array t(5000, 7) for a single vector e(7). 
?Now >> scale >> it up so e is (128, 512, 7) and I want to return a (128, 512) array of the >> t-identifiers >> that are the best match for e. ?"Best match" is defined as the minimum >> Euclidean distance. >> > > It sounds like you want to find the nearest neighbor to a point in a > high-dimensional space. This sounds like a job for a spacial data structure > like a KD-tree. ?See: > http://docs.scipy.org/doc/scipy/reference/spatial.html > http://mloss.org/software/view/143/ > http://www.mrzv.org/software/pyANN/ > etc. In general this is a good suggestion - I was going to mention it earlier - but I think for this particular problem it is not better than the "brute force" and argmin() NumPy approach. On my laptop, the KDTree query is about a factor of 7 slower (ignoring the time cost to create the KDTree) In [45]: def foo1(element, targets): distance_squared = ((element-targets)**2).sum(1) min_index = distance_squared.argmin() return sqrt(distance_squared[min_index]), min_index ....: In [46]: def foo2(element, T): return T.query(element) In [47]: element = np.arange(1,8) In [48]: targets = np.random.uniform(0,8,(5000,7)) In [49]: T = scipy.spatial.KDTree(targets) In [50]: %timeit foo1(element, targets) 1000 loops, best of 3: 401 us per loop In [51]: %timeit foo2(element, T) 100 loops, best of 3: 2.92 ms per loop Just to make sure they say the same thing: In [53]: foo1(element, targets) Out[53]: (1.8173671152898632, 570) In [54]: foo2(element, T) Out[54]: (1.8173671152898632, 570) I think KDTree is more optimal for larger searches (more than 5000 elements), and fewer dimensions. For example, with 500,000 elements and 2 dimensions, I get 34 ms for NumPy and 2 ms for the KDtree. Back to the original question, for 400 us per search, even over 128x512 elements that would be 26 seconds total, I think? That might not be too bad. Cheers, Aronne From Catherine.M.Moroney at jpl.nasa.gov Wed May 2 19:26:49 2012 From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (388D)) Date: Wed, 2 May 2012 23:26:49 +0000 Subject: [Numpy-discussion] record arrays initialization In-Reply-To: References: Message-ID: <4D2735B1-EE4F-46FB-B86D-59B4FCC09FD2@jpl.nasa.gov> On May 2, 2012, at 3:23 PM, wrote: >> A) ?How do I most efficiently construct a record array from a single array? >> I want to do the following, but it segfaults on me when i try to print b. >> >> vtype = [("x", numpy.ndarray)] >> a = numpy.arange(0, 16).reshape(4,4) >> b = numpy.recarray((4), dtype=vtype, buf=a) > > I prefer not to use record arrays, and stick to structured arrays: > > In [11]: vtype = np.dtype([('x', (np.float, 4))]) > > In [12]: a = np.arange(16.).reshape((4,4)) > > In [13]: a.view(vtype) > Out[13]: > array([[([0.0, 1.0, 2.0, 3.0],)], > [([4.0, 5.0, 6.0, 7.0],)], > [([8.0, 9.0, 10.0, 11.0],)], > [([12.0, 13.0, 14.0, 15.0],)]], > dtype=[('x', ') Date: Wed, 2 May 2012 19:46:21 -0400 Subject: [Numpy-discussion] record arrays initialization In-Reply-To: References: <2E86DB1A-76C9-4308-8958-5F5D9EA38E85@jpl.nasa.gov> Message-ID: On Wed, May 2, 2012 at 7:25 PM, Aronne Merrelli wrote: > In general this is a good suggestion - I was going to mention it > earlier - but I think for this particular problem it is not better > than the "brute force" and argmin() NumPy approach. 
On my laptop, the > KDTree query is about a factor of 7 slower (ignoring the time cost to > create the KDTree) > > The cKDTree implementation is more than 4 times faster than the brute-force approach: T = scipy.spatial.cKDTree(targets) In [11]: %timeit foo1(element, targets) # Brute force 1000 loops, best of 3: 385 us per loop In [12]: %timeit foo2(element, T) # cKDTree 10000 loops, best of 3: 83.5 us per loop In [13]: 385/83.5 Out[13]: 4.610778443113772 A FLANN implementation should be even faster--perhaps by as much as another factor of two. -Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed May 2 21:25:02 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 2 May 2012 20:25:02 -0500 Subject: [Numpy-discussion] sparse array data In-Reply-To: References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> <4FA1B327.60406@continuum.io> Message-ID: On May 2, 2012, at 5:28 PM, St?fan van der Walt wrote: > On Wed, May 2, 2012 at 3:20 PM, Francesc Alted wrote: >> On 5/2/12 4:07 PM, St?fan van der Walt wrote: >> Well, as the OP said, coo_matrix does not support dimensions larger than >> 2, right? > > That's just an implementation detail, I would imagine--I'm trying to > figure out if there is a new principle behind "synthetic dimensions"? > By the way, David Cournapeau mentioned using b-trees for sparse ops a > while ago; did you ever talk to him about those ideas? The only new principle (which is not strictly new --- but new to NumPy's world-view) is using one (or more) fields of a structured array as "synthetic dimensions" which replace 1 or more of the raw table dimensions. Thus, you could create a "view" of a NumPy array (or a group of NumPy arrays) where 1 or more dimensions is replaced with these "sparse dimensions". This is a fully-general way to handle a mixture of sparse and dense structures in one general array interface. However, you lose the O(1) lookup as now you must search for the non-zero items in order to implement algorithms (indexes are critical and Francesc has some nice indexes in PyTables). A group-by operation can be replaced by an operation on "a sparse dimension" where you have mapped attributes to 1 or more dimensions in the underlying array. coo_matrix is just a special case of this more general idea. If you add the ability to compress attributes, then you get csr, csc, and various other forms of matrices as well. More to come.... If you are interested in this sort of thing please let me know.... -Travis From stefan at sun.ac.za Wed May 2 22:03:40 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 May 2012 19:03:40 -0700 Subject: [Numpy-discussion] record arrays initialization In-Reply-To: <4D2735B1-EE4F-46FB-B86D-59B4FCC09FD2@jpl.nasa.gov> References: <4D2735B1-EE4F-46FB-B86D-59B4FCC09FD2@jpl.nasa.gov> Message-ID: On Wed, May 2, 2012 at 4:26 PM, Moroney, Catherine M (388D) wrote: > Using structured arrays is making my code complex when I try to call the > vectorized function. ?If I stick to the original record arrays, what's the > best way of initializing b from a without doing an row-by-row copy? What does your original data look like? It seems like `a` is already what you need after the reshape? 
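(For reference, a minimal no-copy construction that keeps the recarray interface is sketched below. It is only a sketch: the 7-wide field and the random data are stand-ins -- the toy example earlier in the thread used a 4x4 array -- but the same view trick applies to any contiguous 2-D float array.)

import numpy as np

vtype = np.dtype([('x', np.float64, 7)])   # field width chosen for illustration
a = np.random.rand(4, 7)                   # stand-in for the original 2-D data

# reinterpret each row as one record, then expose it as a recarray: a view, not a per-row copy
b = a.view(vtype).reshape(a.shape[0]).view(np.recarray)
assert np.may_share_memory(a, b)
print(b[0].x)                              # attribute access still works
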
St?fan From stefan at sun.ac.za Wed May 2 22:06:24 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 May 2012 19:06:24 -0700 Subject: [Numpy-discussion] record arrays initialization In-Reply-To: References: <2E86DB1A-76C9-4308-8958-5F5D9EA38E85@jpl.nasa.gov> Message-ID: On Wed, May 2, 2012 at 4:46 PM, Kevin Jacobs wrote: > A FLANN implementation should be even faster--perhaps by as much as another > factor of two. I guess it depends on whether you care about the "Approximate" in "Fast Library for Approximate Nearest Neighbors". St?fan From ben.root at ou.edu Wed May 2 22:21:53 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 2 May 2012 22:21:53 -0400 Subject: [Numpy-discussion] record arrays initialization In-Reply-To: References: <2E86DB1A-76C9-4308-8958-5F5D9EA38E85@jpl.nasa.gov> Message-ID: On Wednesday, May 2, 2012, St?fan van der Walt wrote: > On Wed, May 2, 2012 at 4:46 PM, Kevin Jacobs > > > > wrote: > > A FLANN implementation should be even faster--perhaps by as much as > another > > factor of two. > > I guess it depends on whether you care about the "Approximate" in > "Fast Library for Approximate Nearest Neighbors". > > St?fan This is why I love following these lists! I don't think I ever would have come across this method on my own. Nifty! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From aronne.merrelli at gmail.com Wed May 2 23:03:15 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Wed, 2 May 2012 22:03:15 -0500 Subject: [Numpy-discussion] record arrays initialization In-Reply-To: References: <2E86DB1A-76C9-4308-8958-5F5D9EA38E85@jpl.nasa.gov> Message-ID: On Wed, May 2, 2012 at 6:46 PM, Kevin Jacobs wrote: > On Wed, May 2, 2012 at 7:25 PM, Aronne Merrelli > wrote: >> >> In general this is a good suggestion - I was going to mention it >> earlier - but I think for this particular problem it is not better >> than the "brute force" and argmin() NumPy approach. On my laptop, the >> KDTree query is about a factor of 7 slower (ignoring the time cost to >> create the KDTree) >> > > The cKDTree implementation is more than 4 times faster than the brute-force > approach: > > T = scipy.spatial.cKDTree(targets) > > In [11]: %timeit foo1(element, targets) ? # Brute force > 1000 loops, best of 3: 385 us per loop > > In [12]: %timeit foo2(element, T) ? ? ? ? # cKDTree > 10000 loops, best of 3: 83.5 us per loop > > In [13]: 385/83.5 > Out[13]: 4.610778443113772 Wow, not sure how I missed that! It even seems to scale better than linear (some of that 83us is call overhead, I guess): In [35]: %timeit foo2(element, T) 10000 loops, best of 3: 115 us per loop In [36]: elements = np.random.uniform(0,8,[128,512,7]) In [37]: %timeit foo2(elements.reshape((128*512,7)), T) 1 loops, best of 3: 2.66 s per loop So only 2.7 seconds to search the whole set. Not bad! Cheers, Aronne From stefan at sun.ac.za Wed May 2 23:03:20 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 2 May 2012 20:03:20 -0700 Subject: [Numpy-discussion] sparse array data In-Reply-To: References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> <4FA1B327.60406@continuum.io> Message-ID: On Wed, May 2, 2012 at 6:25 PM, Travis Oliphant wrote: > The only new principle (which is not strictly new --- but new to NumPy's world-view) is using one (or more) fields of a structured array as "synthetic dimensions" which replace 1 or more of the raw table dimensions. 
Ah, thanks--that's the detail I was missing. I wonder if the contiguity requirement will hamper us here, though. E.g., I could imagine that some tree structure might be more suitable to storing and organizing indices, and for large arrays we wouldn't like to make a copy for each operation. I guess we can't wait for discontiguous arrays to come along, though :) > More to come.... ?If you are interested in this sort of thing please let me know.... Definitely--if we can optimize this machinery it will be beneficial to scipy.sparse as well. St?fan From charlesr.harris at gmail.com Wed May 2 23:44:35 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 2 May 2012 21:44:35 -0600 Subject: [Numpy-discussion] sparse array data In-Reply-To: References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> Message-ID: On Wed, May 2, 2012 at 3:20 PM, Nathaniel Smith wrote: > On Wed, May 2, 2012 at 9:53 PM, Francesc Alted > wrote: > > On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote: > >> Hi all, > >> > >> I'm currently writing a code that needs three dimensional data (for the > physicists it's dimensions are atom, ion, level). The problem is that not > all combinations do exist (a sparse array). Sparse matrices in scipy only > deal with two dimensions. The operations that I need to do on those are > running functions like exp(item/constant) on all of the items. I also want > to sum them up in the last dimension. What's the best way to make a class > that takes this kind of data and does the required operations fast. Maybe > some phycisists have implemented these things already. Any thoughts? > > > > Curiously enough, I have recently been discussing with Travis O. about > > how to represent sparse matrices with complete generality. One of the > > possibilities is to use what Travis call "synthetic dimensions". The > > idea behind it is easy: use a table with as many columns as dimensions, > > and add another one for the actual values of the array. For a 3-D > > sparse array, this looks like: > > > > dim0 | dim1 | dim2 | value > > ========================== > > 0 | 0 | 0 | val0 > > 0 | 10 | 100 | val1 > > 20 | 5 | 202 | val2 > > This coordinate format is also what's used by the MATLAB Tensor > Toolbox. They have a paper justifying this choice and describing some > tricks for how to work with them: > http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1 > (Spoiler: you use a lot of sort operations. Conveniently, timsort > appears to be perfectly adapted for their algorithmic requirements.) > > Timsort probably isn't the best choice here, it is optimized for python lists of python objects where there is at least one level of indirection and compares are expensive, even more expensive for compound objects. If the coordinates are stored in numpy structured arrays an ordinary sort is likely to be faster. Lexsort might even be faster as it could access aligned integer data and not have to move lists of indexes around. I'm not sure why one would make up a new term like "synthetic > dimensions" though, it's just standard coordinate format... > > Though, for the original poster, depending on their exact problem, > they might be better off just using a list or object ndarray of > scipy.sparse matrices. Or some coordinate arrays like above, plus > add.reduceat for the sums they mentioned. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Thu May 3 00:27:03 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 2 May 2012 23:27:03 -0500 Subject: [Numpy-discussion] sparse array data In-Reply-To: References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> <4FA1B327.60406@continuum.io> Message-ID: <2AAEE585-4084-48AB-9836-101FB84B9BD9@continuum.io> On May 2, 2012, at 10:03 PM, St?fan van der Walt wrote: > On Wed, May 2, 2012 at 6:25 PM, Travis Oliphant wrote: >> The only new principle (which is not strictly new --- but new to NumPy's world-view) is using one (or more) fields of a structured array as "synthetic dimensions" which replace 1 or more of the raw table dimensions. > > Ah, thanks--that's the detail I was missing. I wonder if the > contiguity requirement will hamper us here, though. E.g., I could > imagine that some tree structure might be more suitable to storing and > organizing indices, and for large arrays we wouldn't like to make a > copy for each operation. I guess we can't wait for discontiguous > arrays to come along, though :) Actually, it's better to keep the actual data together as much as possible, I think, and simulate a tree structure with a layer on top --- i.e. an index. Different algorithms will prefer different orderings of the underlying data just as today different algorithms prefer different striding patterns on the standard, strided view of a dense array. -Travis > >> More to come.... If you are interested in this sort of thing please let me know.... > > Definitely--if we can optimize this machinery it will be beneficial to > scipy.sparse as well. > > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Thu May 3 04:23:45 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 03 May 2012 10:23:45 +0200 Subject: [Numpy-discussion] sparse array data In-Reply-To: <2AAEE585-4084-48AB-9836-101FB84B9BD9@continuum.io> References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> <4FA1B327.60406@continuum.io> <2AAEE585-4084-48AB-9836-101FB84B9BD9@continuum.io> Message-ID: <4FA24091.2090307@astro.uio.no> On 05/03/2012 06:27 AM, Travis Oliphant wrote: > > On May 2, 2012, at 10:03 PM, St?fan van der Walt wrote: > >> On Wed, May 2, 2012 at 6:25 PM, Travis Oliphant wrote: >>> The only new principle (which is not strictly new --- but new to NumPy's world-view) is using one (or more) fields of a structured array as "synthetic dimensions" which replace 1 or more of the raw table dimensions. >> >> Ah, thanks--that's the detail I was missing. I wonder if the >> contiguity requirement will hamper us here, though. E.g., I could >> imagine that some tree structure might be more suitable to storing and >> organizing indices, and for large arrays we wouldn't like to make a >> copy for each operation. I guess we can't wait for discontiguous >> arrays to come along, though :) > > Actually, it's better to keep the actual data together as much as possible, I think, and simulate a tree structure with a layer on top --- i.e. an index. > > Different algorithms will prefer different orderings of the underlying data just as today different algorithms prefer different striding patterns on the standard, strided view of a dense array. 
Two examples: 1) If you know the distribution of your indices in N dimensions ahead of time (e.g., roughly evenly particles in 3D space), you better fill that volume using a fractal, and use indices along that fractal, so that you get at least some spatial locality in all N dimensions. (This is standard procedure in parallelization codes) 2) For standard dense linear algebra code, it is better to store the data blocked than either contiguous ordering -- to the point that LAPACK/BLAS repacks to a blocked ordering internally on each operation. Dag From rhattersley at gmail.com Thu May 3 04:39:26 2012 From: rhattersley at gmail.com (Richard Hattersley) Date: Thu, 3 May 2012 09:39:26 +0100 Subject: [Numpy-discussion] record arrays and vectorizing In-Reply-To: References: Message-ID: Sounds like it could be a good match for `scipy.spatial.cKDTree`. It can handle single-element queries... >>> element = numpy.arange(1, 8) >>> targets = numpy.random.uniform(0, 8, (1000, 7)) >>> tree = scipy.spatial.cKDTree(targets) >>> distance, index = tree.query(element) >>> targets[index] array([ 1.68457267, 4.26370212, 3.14837617, 4.67616512, 5.80572286, 6.46823904, 6.12957534]) Or even multi-element queries (shown here searching for 3 elements in one call)... >>> elements = numpy.linspace(1, 8, 21).reshape((3, 7)) >>> elements array([[ 1. , 1.35, 1.7 , 2.05, 2.4 , 2.75, 3.1 ], [ 3.45, 3.8 , 4.15, 4.5 , 4.85, 5.2 , 5.55], [ 5.9 , 6.25, 6.6 , 6.95, 7.3 , 7.65, 8. ]]) >>> distances, indices = tree.query(element) >>> targets[indices] array([[ 0.24314961, 2.77933521, 2.00092505, 3.25180563, 2.05392726, 2.80559459, 4.43030939], [ 4.19270199, 2.89257994, 3.91366449, 3.29262138, 3.6779851 , 4.06619636, 4.7183393 ], [ 6.58055518, 6.59232922, 7.00473346, 5.22612494, 7.07170015, 6.54570121, 7.59566404]]) Richard Hattersley On 2 May 2012 19:06, Moroney, Catherine M (388D) < Catherine.M.Moroney at jpl.nasa.gov> wrote: > Hello, > > Can somebody give me some hints as to how to code up this function > in pure python, rather than dropping down to Fortran? > > I will want to compare a 7-element vector (called "element") to a large > list of similarly-dimensioned > vectors (called "target", and pick out the vector in "target" that is the > closest to "element" > (determined by minimizing the Euclidean distance). > > For instance, in (slow) brute force form it would look like: > > element = numpy.array([1, 2, 3, 4, 5, 6, 7]) > target = numpy.array(range(0, 49)).reshape(7,7)*0.1 > > min_length = 9999.0 > min_index = > for i in xrange(0, 7): > distance = (element-target)**2 > distance = numpy.sqrt(distance.sum()) > if (distance < min_length): > min_length = distance > min_index = i > > Now of course, the actual problem will be of a much larger scale. I will > have > an array of elements, and a large number of potential targets. > > I was thinking of having element be an array where each element itself is > a numpy.ndarray, and then vectorizing the code above so as an output I > would > have an array of the "min_index" and "min_length" values. > > I can get the following simple test to work so I may be on the right track: > > import numpy > > dtype = [("x", numpy.ndarray)] > > def single(data): > return data[0].min() > > multiple = numpy.vectorize(single) > > if __name__ == "__main__": > > a = numpy.arange(0, 16).reshape(4,4) > b = numpy.recarray((4), dtype=dtype) > for i in xrange(0, b.shape[0]): > b[i]["x"] = a[i,:] > > print a > print b > > x = multiple(b) > print x > > What is the best way of constructing "b" from "a"? 
I tried b = > numpy.recarray((4), dtype=dtype, buf=a) > but I get a segmentation fault when I try to print b. > > Is there a way to perform this larger task efficiently with record arrays > and vectorization, or > am I off on the wrong track completely? How can I do this efficiently > without dropping > down to Fortran? > > Thanks for any advice, > > Catherine > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Thu May 3 04:39:35 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 03 May 2012 10:39:35 +0200 Subject: [Numpy-discussion] sparse array data In-Reply-To: References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> <4FA1B327.60406@continuum.io> Message-ID: <4FA24447.2090304@astro.uio.no> On 05/03/2012 03:25 AM, Travis Oliphant wrote: > > On May 2, 2012, at 5:28 PM, St?fan van der Walt wrote: > >> On Wed, May 2, 2012 at 3:20 PM, Francesc Alted wrote: >>> On 5/2/12 4:07 PM, St?fan van der Walt wrote: >>> Well, as the OP said, coo_matrix does not support dimensions larger than >>> 2, right? >> >> That's just an implementation detail, I would imagine--I'm trying to >> figure out if there is a new principle behind "synthetic dimensions"? >> By the way, David Cournapeau mentioned using b-trees for sparse ops a >> while ago; did you ever talk to him about those ideas? > > The only new principle (which is not strictly new --- but new to NumPy's world-view) is using one (or more) fields of a structured array as "synthetic dimensions" which replace 1 or more of the raw table dimensions. Thus, you could create a "view" of a NumPy array (or a group of NumPy arrays) where 1 or more dimensions is replaced with these "sparse dimensions". This is a fully-general way to handle a mixture of sparse and dense structures in one general array interface. > > However, you lose the O(1) lookup as now you must search for the non-zero items in order to implement algorithms (indexes are critical and Francesc has some nice indexes in PyTables). > > A group-by operation can be replaced by an operation on "a sparse dimension" where you have mapped attributes to 1 or more dimensions in the underlying array. > > coo_matrix is just a special case of this more general idea. If you add the ability to compress attributes, then you get csr, csc, and various other forms of matrices as well. > > More to come.... If you are interested in this sort of thing please let me know.... I am very interested in such developments, since a NumPy axis very often doesn't mean anything to itself to me, but I'll do stuff like arr[l*l + l + m - lmin * lmin] = f(l, m) Something like a function you could register that maps virtual dimensions to actual dimensions would be great: void lm_to_idx(Py_ssize_t *virtual_indices, Py_ssize_t *indices, void *ctx) { Py_ssize_t l, m, lmin; lmin = ((my_context_t*)ctx)->lmin; for (Py_ssize_t i = 0; i != NPY_INDEX_TRANSLATION_BLOCKSIZE; ++i) { l = virtual_indices[2 * i]; m = virtual_indiecs[2 * i + 1]; indices[i] = l * l + l + m - lmin * lmin; } } Then register the function with an array (somehow), and then you can do arr[l, m] = f(l, m) :-) But, doing it manually isn't *that* much of a pain, I won't be putting in any work in this direction myself. 
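(To tie the coordinate layout back to the question that started the thread -- an elementwise exp over the stored entries plus a sum over the last dimension -- here is a rough pure-NumPy sketch. The three entries are the toy values from the table quoted earlier, the constant 4.2 is arbitrary, and a real implementation would add indexes as discussed.)

import numpy as np

# one row of coordinates per stored (nonzero) entry, plus a parallel value array
coords = np.array([[ 0,  0,   0],
                   [ 0, 10, 100],
                   [20,  5, 202]])
vals = np.array([1.0, 2.0, 3.0])

out = np.exp(vals / 4.2)               # elementwise ops only touch the stored entries

# "sum over the last dimension": combine rows that share (dim0, dim1)
order = np.lexsort((coords[:, 1], coords[:, 0]))
c, v = coords[order, :2], out[order]
starts = np.concatenate(([0], np.flatnonzero((c[1:] != c[:-1]).any(axis=1)) + 1))
summed = np.add.reduceat(v, starts)    # one value per distinct (dim0, dim1) pair
keys = c[starts]                       # the coordinates that survive the reduction

The lexsort-then-segment step is the same sort-and-reduce pattern the rest of the thread keeps coming back to.
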
Dag From heng at cantab.net Thu May 3 04:51:34 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 03 May 2012 09:51:34 +0100 Subject: [Numpy-discussion] copying array to itself In-Reply-To: References: <1335974584.8654.44.camel@farnsworth> Message-ID: <1336035094.2430.0.camel@farnsworth> On Wed, 2012-05-02 at 12:58 -0700, St?fan van der Walt wrote: > On Wed, May 2, 2012 at 9:03 AM, Henry Gomersall > wrote: > > Is this some nuance of the way numpy does things? Or am I missing > some > > stupid bug in my code? > > Try playing with the parameters of the following code: > > > For small values of sz this typically works, but as soon as numpy > needs to buffer strange things happen because you are reading from > memory locations that you've already written to. Right, so this is expected behaviour then. Is this documented somewhere? It strikes me that this is pretty unexpected behaviour. Henry From njs at pobox.com Thu May 3 05:41:28 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 3 May 2012 10:41:28 +0100 Subject: [Numpy-discussion] sparse array data In-Reply-To: References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> Message-ID: On Thu, May 3, 2012 at 4:44 AM, Charles R Harris wrote: > > > On Wed, May 2, 2012 at 3:20 PM, Nathaniel Smith wrote: >> This coordinate format is also what's used by the MATLAB Tensor >> Toolbox. They have a paper justifying this choice and describing some >> tricks for how to work with them: >> ?http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1 >> (Spoiler: you use a lot of sort operations. Conveniently, timsort >> appears to be perfectly adapted for their algorithmic requirements.) >> > > Timsort probably isn't the best choice here, it is optimized for python > lists of python objects where there is at least one level of indirection and > compares are expensive, even more expensive for compound objects. If the > coordinates are stored in numpy structured arrays an ordinary sort is likely > to be faster. Lexsort might even be faster as it could access aligned > integer data and not have to move lists of indexes around. To be clear, I don't mean Timsort-the-implementation, I mean Timsort-the-algorithm (which is now also the default sorting algorithm in Java). That said, it may well be optimized for expensive compares and something like a radix sort would be even better. In these sparse tensor algorithms, we often need to sort by one coordinate axis, and then sort by another (a "corner turn"). The reason Timsort seems appealing is that (1) it goes faster than O(n log n) when there is existing structure in the data being sorted, (2) because it's a stable sort, sorting on one axis then sorting on another will leave lots of that structure behind to be exploited later. So one can expect it to hit its happy case relatively often. -- Nathaniel From mlist at re-factory.de Thu May 3 09:24:37 2012 From: mlist at re-factory.de (Robert Elsner) Date: Thu, 03 May 2012 15:24:37 +0200 Subject: [Numpy-discussion] Status of np.bincount Message-ID: <4FA28715.8060403@re-factory.de> Hello Everybody, is there any news on the status of np.bincount with respect to "big" numbers? It seems I have just been bitten by #225. Is there an efficient way around? I found the np.histogram function painfully slow. Below a simple script, that demonstrates bincount failing with a memory error on big numbers import numpy as np x = np.array((30e9,)).astype(int) np.bincount(x) Any good idea how to work around it. 
My arrays contain somewhat 50M entries in the range from 0 to 30e9. And I would like to have them bincounted... cheers Robert From charlesr.harris at gmail.com Thu May 3 09:39:45 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 3 May 2012 07:39:45 -0600 Subject: [Numpy-discussion] sparse array data In-Reply-To: References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> Message-ID: On Thu, May 3, 2012 at 3:41 AM, Nathaniel Smith wrote: > On Thu, May 3, 2012 at 4:44 AM, Charles R Harris > wrote: > > > > > > On Wed, May 2, 2012 at 3:20 PM, Nathaniel Smith wrote: > >> This coordinate format is also what's used by the MATLAB Tensor > >> Toolbox. They have a paper justifying this choice and describing some > >> tricks for how to work with them: > >> http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1 > >> (Spoiler: you use a lot of sort operations. Conveniently, timsort > >> appears to be perfectly adapted for their algorithmic requirements.) > >> > > > > Timsort probably isn't the best choice here, it is optimized for python > > lists of python objects where there is at least one level of indirection > and > > compares are expensive, even more expensive for compound objects. If the > > coordinates are stored in numpy structured arrays an ordinary sort is > likely > > to be faster. Lexsort might even be faster as it could access aligned > > integer data and not have to move lists of indexes around. > > To be clear, I don't mean Timsort-the-implementation, I mean > Timsort-the-algorithm (which is now also the default sorting algorithm > in Java). That said, it may well be optimized for expensive compares > and something like a radix sort would be even better. > > Java uses Timsort for sorting object arrays (pointers) and dual pivot quicksort for sorting arrays of native types, ints and such. Timsort is very good for almost sorted arrays and the mixed algorithms are becoming popular, i.e., introsort and recent updates to the dual pivot sort that also look for runs. One of the reasons compares can be expensive for arrays of pointers to objects is that the objects can be located all over memory, which blows the cache. There are also a few mods to the numpy quicksort that might speed things up a bit more for common cases where there are a lot of repeated elements. In these sparse tensor algorithms, we often need to sort by one > coordinate axis, and then sort by another (a "corner turn"). The > reason Timsort seems appealing is that (1) it goes faster than O(n log > n) when there is existing structure in the data being sorted, (2) > because it's a stable sort, sorting on one axis then sorting on > another will leave lots of that structure behind to be exploited > later. So one can expect it to hit its happy case relatively often. > > Yes, that's why we have mergesort. An optimistic version making some use of runs might make it faster. We do have object arrays and no type specialized sort for them, so bringing Timsort in for those could be useful. Chuck > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Thu May 3 09:45:44 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 3 May 2012 14:45:44 +0100 Subject: [Numpy-discussion] Status of np.bincount In-Reply-To: <4FA28715.8060403@re-factory.de> References: <4FA28715.8060403@re-factory.de> Message-ID: On Thu, May 3, 2012 at 2:24 PM, Robert Elsner wrote: > Hello Everybody, > > is there any news on the status of np.bincount with respect to "big" > numbers? It seems I have just been bitten by #225. Is there an efficient > way around? I found the np.histogram function painfully slow. > > Below a simple script, that demonstrates bincount failing with a memory > error on big numbers > > import numpy as np > > x = np.array((30e9,)).astype(int) > np.bincount(x) > > > Any good idea how to work around it. My arrays contain somewhat 50M > entries in the range from 0 to 30e9. And I would like to have them > bincounted... You need a sparse data structure, then. Are you sure you even have duplicates? Anyways, I won't work out all of the details, but let me sketch something that might get you your answers. First, sort your array. Then use np.not_equal(x[:-1], x[1:]) as a mask on np.arange(1,len(x)) to find the indices where each sorted value changes over to the next. The np.diff() of that should give you the size of each. Use np.unique to get the sorted unique values to match up with those sizes. Fixing all of the off-by-one errors and dealing with the boundary conditions correctly is left as an exercise for the reader. -- Robert Kern From mlist at re-factory.de Thu May 3 09:50:30 2012 From: mlist at re-factory.de (Robert Elsner) Date: Thu, 03 May 2012 15:50:30 +0200 Subject: [Numpy-discussion] Status of np.bincount In-Reply-To: References: <4FA28715.8060403@re-factory.de> Message-ID: <4FA28D26.1050406@re-factory.de> Am 03.05.2012 15:45, schrieb Robert Kern: > On Thu, May 3, 2012 at 2:24 PM, Robert Elsner wrote: >> Hello Everybody, >> >> is there any news on the status of np.bincount with respect to "big" >> numbers? It seems I have just been bitten by #225. Is there an efficient >> way around? I found the np.histogram function painfully slow. >> >> Below a simple script, that demonstrates bincount failing with a memory >> error on big numbers >> >> import numpy as np >> >> x = np.array((30e9,)).astype(int) >> np.bincount(x) >> >> >> Any good idea how to work around it. My arrays contain somewhat 50M >> entries in the range from 0 to 30e9. And I would like to have them >> bincounted... > > You need a sparse data structure, then. Are you sure you even have duplicates? > > Anyways, I won't work out all of the details, but let me sketch > something that might get you your answers. First, sort your array. > Then use np.not_equal(x[:-1], x[1:]) as a mask on np.arange(1,len(x)) > to find the indices where each sorted value changes over to the next. > The np.diff() of that should give you the size of each. Use np.unique > to get the sorted unique values to match up with those sizes. > > Fixing all of the off-by-one errors and dealing with the boundary > conditions correctly is left as an exercise for the reader. > ?? I suspect that this mail was meant to end up in the thread about sparse array data? 
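(Spelling out the recipe Robert Kern sketches above, for concreteness; the array size below is made up, and the edge handling is the part he left as an exercise.)

import numpy as np

x = (np.random.rand(1000000) * 30e9).astype(np.int64)   # stand-in for the real 50M values

xs = np.sort(x)
change = np.flatnonzero(xs[1:] != xs[:-1]) + 1   # indices where the sorted value changes
edges = np.concatenate(([0], change, [len(xs)]))
counts = np.diff(edges)                          # occurrences of each distinct value
values = xs[edges[:-1]]                          # the distinct values, in sorted order

Memory stays proportional to the number of distinct values rather than to their range, which is what makes a 0..30e9 domain feasible. (Later NumPy releases wrap the same idea up as np.unique(x, return_counts=True).)
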
From robert.kern at gmail.com Thu May 3 09:57:30 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 3 May 2012 14:57:30 +0100 Subject: [Numpy-discussion] Status of np.bincount In-Reply-To: <4FA28D26.1050406@re-factory.de> References: <4FA28715.8060403@re-factory.de> <4FA28D26.1050406@re-factory.de> Message-ID: On Thu, May 3, 2012 at 2:50 PM, Robert Elsner wrote: > > Am 03.05.2012 15:45, schrieb Robert Kern: >> On Thu, May 3, 2012 at 2:24 PM, Robert Elsner wrote: >>> Hello Everybody, >>> >>> is there any news on the status of np.bincount with respect to "big" >>> numbers? It seems I have just been bitten by #225. Is there an efficient >>> way around? I found the np.histogram function painfully slow. >>> >>> Below a simple script, that demonstrates bincount failing with a memory >>> error on big numbers >>> >>> import numpy as np >>> >>> x = np.array((30e9,)).astype(int) >>> np.bincount(x) >>> >>> >>> Any good idea how to work around it. My arrays contain somewhat 50M >>> entries in the range from 0 to 30e9. And I would like to have them >>> bincounted... >> >> You need a sparse data structure, then. Are you sure you even have duplicates? >> >> Anyways, I won't work out all of the details, but let me sketch >> something that might get you your answers. First, sort your array. >> Then use np.not_equal(x[:-1], x[1:]) as a mask on np.arange(1,len(x)) >> to find the indices where each sorted value changes over to the next. >> The np.diff() of that should give you the size of each. Use np.unique >> to get the sorted unique values to match up with those sizes. >> >> Fixing all of the off-by-one errors and dealing with the boundary >> conditions correctly is left as an exercise for the reader. >> > > ?? I suspect that this mail was meant to end up in the thread about > sparse array data? No, I am responding to you. -- Robert Kern From otrov at hush.ai Thu May 3 12:44:22 2012 From: otrov at hush.ai (Kliment) Date: Thu, 03 May 2012 18:44:22 +0200 Subject: [Numpy-discussion] Fwd: ... numpy/linalg/lapack_lite.so: undefined symbol: zungqr_ Message-ID: <20120503164422.4978C6F448@smtp.hushmail.com> Hi, I compiled lapack, atlas, umfpack, fftw in local folder, in similar way as described here: http://www.scipy.org/Installing_SciPy/Linux on 32bit Ubuntu Precise In ~/.local/lib I have: ======================================== libamd.2.2.3.a libamd.a -> libamd.2.2.3.a libatlas.a libcblas.a libf77blas.a libfftw3.a libfftw3.la* liblapack.a librefblas.a libsatlas.so* libtmglib.a libumfpack.5.5.2.a libumfpack.a -> libumfpack.5.5.2.a ======================================== In ~/.local/include I have: ======================================== amd.h atlas/ cblas.h clapack.h fftw3.f fftw3.f03 fftw3.h fftw3l.f03 fftw3q.f03 UFconfig.h umfpack.h ======================================== My site.cfg looks like this: ======================================== [DEFAULT] library_dirs = $HOME/.local/lib include_dirs = $HOME/.local/include [atlas] atlas_libs = lapack, f77blas, cblas, atlas [amd] amd_libs = amd [umfpack] umfpack_libs = umfpack, gfortran [fftw] libraries = fftw3 ======================================== I extracted numpy and run: python setup.py build --fcompiler=gnu95 python setup.py install --prefix=$HOME/.local I then run python interpreter and try to import numpy, when I receive import error: ImportError: /home/vlad/.local/lib/python2.7/site- packages/numpy/linalg/lapack_lite.so: undefined symbol: zungqr_ What did I do wrong? 
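(Two quick checks usually narrow this kind of link failure down -- the paths are taken from the message above: does the LAPACK that was built actually export the missing symbol, and which shared libraries did the failing extension end up depending on?)

nm ~/.local/lib/liblapack.a | grep zungqr_
ldd /home/vlad/.local/lib/python2.7/site-packages/numpy/linalg/lapack_lite.so

If nm finds nothing, the LAPACK build itself is incomplete and needs rebuilding; if the symbol is there, the more likely culprit is that numpy never picked the library up at link time, which the build output for lapack_lite should show.
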
From tsyu80 at gmail.com Thu May 3 12:51:30 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Thu, 3 May 2012 12:51:30 -0400 Subject: [Numpy-discussion] Status of np.bincount In-Reply-To: References: <4FA28715.8060403@re-factory.de> <4FA28D26.1050406@re-factory.de> Message-ID: On Thu, May 3, 2012 at 9:57 AM, Robert Kern wrote: > On Thu, May 3, 2012 at 2:50 PM, Robert Elsner wrote: > > > > Am 03.05.2012 15:45, schrieb Robert Kern: > >> On Thu, May 3, 2012 at 2:24 PM, Robert Elsner > wrote: > >>> Hello Everybody, > >>> > >>> is there any news on the status of np.bincount with respect to "big" > >>> numbers? It seems I have just been bitten by #225. Is there an > efficient > >>> way around? I found the np.histogram function painfully slow. > >>> > >>> Below a simple script, that demonstrates bincount failing with a memory > >>> error on big numbers > >>> > >>> import numpy as np > >>> > >>> x = np.array((30e9,)).astype(int) > >>> np.bincount(x) > >>> > >>> > >>> Any good idea how to work around it. My arrays contain somewhat 50M > >>> entries in the range from 0 to 30e9. And I would like to have them > >>> bincounted... > >> > >> You need a sparse data structure, then. Are you sure you even have > duplicates? > >> > >> Anyways, I won't work out all of the details, but let me sketch > >> something that might get you your answers. First, sort your array. > >> Then use np.not_equal(x[:-1], x[1:]) as a mask on np.arange(1,len(x)) > >> to find the indices where each sorted value changes over to the next. > >> The np.diff() of that should give you the size of each. Use np.unique > >> to get the sorted unique values to match up with those sizes. > >> > >> Fixing all of the off-by-one errors and dealing with the boundary > >> conditions correctly is left as an exercise for the reader. > >> > > > > ?? I suspect that this mail was meant to end up in the thread about > > sparse array data? > > No, I am responding to you. > > Hi Robert (Elsner), Just to expand a bit on Robert Kern's explanation: Your problem is only partly related to Ticket #225 . Even if that is fixed, you won't be able to call `bincount` with an array containing `30e9` unless you implement something using sparse arrays because `bincount` wants return an array that's `30e9 + 1` in length, which isn't going to happen. -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Thu May 3 13:00:11 2012 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 3 May 2012 10:00:11 -0700 Subject: [Numpy-discussion] record arrays initialization In-Reply-To: References: <2E86DB1A-76C9-4308-8958-5F5D9EA38E85@jpl.nasa.gov> Message-ID: On Wed, May 2, 2012 at 4:46 PM, Kevin Jacobs wrote: > The cKDTree implementation is more than 4 times faster than the brute-force > approach: > > T = scipy.spatial.cKDTree(targets) > > In [11]: %timeit foo1(element, targets) ? # Brute force > 1000 loops, best of 3: 385 us per loop > > In [12]: %timeit foo2(element, T) ? ? ? ? # cKDTree > 10000 loops, best of 3: 83.5 us per loop > > In [13]: 385/83.5 > Out[13]: 4.610778443113772 Brute force plus cython beats cKDTree for knn with k=1 and Euclidean distance: I[5] targets = np.random.uniform(0, 8, (5000, 7)) I[6] element = np.arange(1, 8, dtype=np.float64) I[7] T = scipy.spatial.cKDTree(targets) I[8] timeit T.query(element) 10000 loops, best of 3: 36.1 us per loop I[9] timeit nn(targets, element) 10000 loops, best of 3: 28.5 us per loop What about lower dimensions (2 instead of 7) where cKDTree gets faster? 
I[18] element = np.arange(1,3, dtype=np.float64) I[19] targets = np.random.uniform(0,8,(5000,2)) I[20] T = scipy.spatial.cKDTree(targets) I[21] timeit T.query(element) 10000 loops, best of 3: 27.5 us per loop I[22] timeit nn(targets, element) 100000 loops, best of 3: 11.6 us per loop I could add nn to the bottleneck package. Is the k=1, Euclidean distance case too specialized? Prototype (messy) code is here: https://github.com/kwgoodman/bottleneck/issues/45 For a smaller speed up (~2x) foo1 could use bottleneck's sum or squares function, bn.ss(). From Catherine.M.Moroney at jpl.nasa.gov Thu May 3 13:22:01 2012 From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (388D)) Date: Thu, 3 May 2012 17:22:01 +0000 Subject: [Numpy-discussion] record arrays initialization In-Reply-To: References: Message-ID: <0DF36F28-C02A-473A-BCCC-103767B1CF53@jpl.nasa.gov> > > > ------------------------------ > > Message: 6 > Date: Thu, 3 May 2012 10:00:11 -0700 > From: Keith Goodman > Subject: Re: [Numpy-discussion] record arrays initialization > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > On Wed, May 2, 2012 at 4:46 PM, Kevin Jacobs > wrote: > >> The cKDTree implementation is more than 4 times faster than the brute-force >> approach: >> >> T = scipy.spatial.cKDTree(targets) >> >> In [11]: %timeit foo1(element, targets) ? # Brute force >> 1000 loops, best of 3: 385 us per loop >> >> In [12]: %timeit foo2(element, T) ? ? ? ? # cKDTree >> 10000 loops, best of 3: 83.5 us per loop >> >> In [13]: 385/83.5 >> Out[13]: 4.610778443113772 > > Brute force plus cython beats cKDTree for knn with k=1 and Euclidean distance: > > I[5] targets = np.random.uniform(0, 8, (5000, 7)) > I[6] element = np.arange(1, 8, dtype=np.float64) > > I[7] T = scipy.spatial.cKDTree(targets) > I[8] timeit T.query(element) > 10000 loops, best of 3: 36.1 us per loop > > I[9] timeit nn(targets, element) > 10000 loops, best of 3: 28.5 us per loop > > What about lower dimensions (2 instead of 7) where cKDTree gets faster? > > I[18] element = np.arange(1,3, dtype=np.float64) > I[19] targets = np.random.uniform(0,8,(5000,2)) > > I[20] T = scipy.spatial.cKDTree(targets) > I[21] timeit T.query(element) > 10000 loops, best of 3: 27.5 us per loop > > I[22] timeit nn(targets, element) > 100000 loops, best of 3: 11.6 us per loop > > I could add nn to the bottleneck package. Is the k=1, Euclidean > distance case too specialized? > > Prototype (messy) code is here: > https://github.com/kwgoodman/bottleneck/issues/45 > > For a smaller speed up (~2x) foo1 could use bottleneck's sum or > squares function, bn.ss(). > > Thank you everybody for your advice. I will actually be clustering the dataset for use in the production code (which must be in Fortran), and using scipy.cluster.vq.kmeans2 to generate and populate the clusters. Even though a brute-force search in Fortran is fast, it will still be too slow for the final code. But, the kmeans2 clustering isn't "perfect" in that it doesn't always give identical results to the brute force search, so I want to do an analysis of the differences on a test data-set. 
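(A rough sketch of that route, assuming scipy.cluster.vq; the array sizes and the number of clusters below are made up, and the comparison against the exhaustive search is left out.)

import numpy as np
from scipy.cluster.vq import kmeans2, vq

targets = np.random.uniform(0, 8, (5000, 7))        # stand-in for the training vectors
elements = np.random.uniform(0, 8, (128 * 512, 7))  # stand-in for the image vectors

centroids, target_labels = kmeans2(targets, 256, minit='points')  # 256 clusters, arbitrary
element_labels, dists = vq(elements, centroids)     # nearest centroid for every element

How a centroid assignment is then turned back into a best-match training vector is the part that determines how far the result drifts from the brute-force answer.
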
Catherine From Catherine.M.Moroney at jpl.nasa.gov Thu May 3 13:33:05 2012 From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (388D)) Date: Thu, 3 May 2012 17:33:05 +0000 Subject: [Numpy-discussion] timing results (was: record arrays initialization) Message-ID: A quick recap of the problem: a 128x512 array of 7-element vectors (element), and a 5000-vector training dataset (targets). For each vector in element, I want to find the best-match in targets, defined as minimizing the Euclidean distance. I coded it up three ways: (a) looping through each vector in element individually, (b) vectorizing the function in the previous step, and coding it up in Fortran. The heart of the "find-best-match" code in Python looks like so I'm not doing an individual loop through all 5000 vectors in targets: nlen = xelement.shape[0] nvec = targets.data.shape[0] x = xelement.reshape(1, nlen).repeat(nvec, axis=0) diffs = ((x - targets.data)**2).sum(axis=1) diffs = numpy.sqrt(diffs) return int(numpy.argmin(diffs, axis=0)) Here are the results: (a) looping through each vector: 68 seconds (b) vectorizing this: 58 seconds (c) raw Fortran with loops: 26 seconds I was surprised to see that vectorizing didn't gain me that much time, and that the Fortran was so much faster than both python alternatives. So, there's a lot that I don't know about how the internals of numpy and python work. Why does the loop through 128x512 elements in python only take an additional 10 seconds? What is the main purpose of vectorizing - is it optimization by taking the looping step out of the Python and into the C-base or something different? And, why is the Fortran so much faster (even without optimization)? It looks like I'll be switching to Fortran after all. Catherine From wesmckinn at gmail.com Thu May 3 13:36:23 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 3 May 2012 13:36:23 -0400 Subject: [Numpy-discussion] Status of np.bincount In-Reply-To: References: <4FA28715.8060403@re-factory.de> <4FA28D26.1050406@re-factory.de> Message-ID: On Thu, May 3, 2012 at 12:51 PM, Tony Yu wrote: > > > On Thu, May 3, 2012 at 9:57 AM, Robert Kern wrote: >> >> On Thu, May 3, 2012 at 2:50 PM, Robert Elsner wrote: >> > >> > Am 03.05.2012 15:45, schrieb Robert Kern: >> >> On Thu, May 3, 2012 at 2:24 PM, Robert Elsner >> >> wrote: >> >>> Hello Everybody, >> >>> >> >>> is there any news on the status of np.bincount with respect to "big" >> >>> numbers? It seems I have just been bitten by #225. Is there an >> >>> efficient >> >>> way around? I found the np.histogram function painfully slow. >> >>> >> >>> Below a simple script, that demonstrates bincount failing with a >> >>> memory >> >>> error on big numbers >> >>> >> >>> import numpy as np >> >>> >> >>> x = np.array((30e9,)).astype(int) >> >>> np.bincount(x) >> >>> >> >>> >> >>> Any good idea how to work around it. My arrays contain somewhat 50M >> >>> entries in the range from 0 to 30e9. And I would like to have them >> >>> bincounted... >> >> >> >> You need a sparse data structure, then. Are you sure you even have >> >> duplicates? >> >> >> >> Anyways, I won't work out all of the details, but let me sketch >> >> something that might get you your answers. First, sort your array. >> >> Then use np.not_equal(x[:-1], x[1:]) as a mask on np.arange(1,len(x)) >> >> to find the indices where each sorted value changes over to the next. >> >> The np.diff() of that should give you the size of each. Use np.unique >> >> to get the sorted unique values to match up with those sizes. 
>> >> >> >> Fixing all of the off-by-one errors and dealing with the boundary >> >> conditions correctly is left as an exercise for the reader. >> >> >> > >> > ?? I suspect that this mail was meant to end up in the thread about >> > sparse array data? >> >> No, I am responding to you. >> > > Hi Robert (Elsner), > > Just to expand a bit on Robert Kern's explanation: Your problem is only > partly related to Ticket #225. Even if that is fixed, you won't be able to > call `bincount` with an array containing `30e9` unless you implement > something using sparse arrays because `bincount` wants return an array > that's `30e9 + 1` in length, which isn't going to happen. > > -Tony > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > hi Robert, I suggest you try the value_counts instance method on pandas.Series: In [9]: ints = np.random.randint(0, 30e9, size=100000) In [10]: all_ints = Series(ints.repeat(500)) In [11]: all_ints.value_counts() Out[11]: 16420382874 500 7147863689 500 4019588415 500 17462388002 500 11956087699 500 14888898988 500 3811318398 500 6333517765 500 16077665866 500 17559759721 500 5898309082 500 25213150655 500 17877388690 500 3122117900 500 6242860212 500 ... 6344689036 500 16817048573 500 16361777055 500 4376828961 500 15910505187 500 12051499627 500 23857610954 500 24557975709 500 28135006018 500 1661624653 500 6747702840 500 24601775145 500 7290769930 500 9417075109 500 12071596222 500 Length: 100000 This method uses a C hash table and takes about 1 second to compute the bin counts for 50mm entries and 100k unique values. - Wes From Catherine.M.Moroney at jpl.nasa.gov Thu May 3 13:38:48 2012 From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (388D)) Date: Thu, 3 May 2012 17:38:48 +0000 Subject: [Numpy-discussion] timing results (was: record arrays initialization) In-Reply-To: References: Message-ID: <96973B6D-CE73-40A0-95CA-03E9EB6CB27C@jpl.nasa.gov> On May 3, 2012, at 10:33 AM, Moroney, Catherine M (388D) wrote: > A quick recap of the problem: a 128x512 array of 7-element vectors (element), and a 5000-vector > training dataset (targets). For each vector in element, I want to find the best-match in targets, > defined as minimizing the Euclidean distance. > > I coded it up three ways: (a) looping through each vector in element individually, (b) vectorizing > the function in the previous step, and coding it up in Fortran. The heart of the "find-best-match" > code in Python looks like so I'm not doing an individual loop through all 5000 vectors in targets: > > nlen = xelement.shape[0] > nvec = targets.data.shape[0] > x = xelement.reshape(1, nlen).repeat(nvec, axis=0) > > diffs = ((x - targets.data)**2).sum(axis=1) > diffs = numpy.sqrt(diffs) > return int(numpy.argmin(diffs, axis=0)) > > Here are the results: > > (a) looping through each vector: 68 seconds > (b) vectorizing this: 58 seconds > (c) raw Fortran with loops: 26 seconds > > I was surprised to see that vectorizing didn't gain me that much time, and that the Fortran > was so much faster than both python alternatives. So, there's a lot that I don't know about > how the internals of numpy and python work. > > Why does the loop through 128x512 elements in python only take an additional 10 seconds? What > is the main purpose of vectorizing - is it optimization by taking the looping step out of the > Python and into the C-base or something different? 
> > And, why is the Fortran so much faster (even without optimization)? > > It looks like I'll be switching to Fortran after all. > > Catherine > Actually Fortran with correct array ordering - 13 seconds! What horrible python/numpy mistake am I making to cause such a slowdown? Catherine From kwgoodman at gmail.com Thu May 3 13:49:15 2012 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 3 May 2012 10:49:15 -0700 Subject: [Numpy-discussion] timing results (was: record arrays initialization) In-Reply-To: <96973B6D-CE73-40A0-95CA-03E9EB6CB27C@jpl.nasa.gov> References: <96973B6D-CE73-40A0-95CA-03E9EB6CB27C@jpl.nasa.gov> Message-ID: On Thu, May 3, 2012 at 10:38 AM, Moroney, Catherine M (388D) wrote: > Actually Fortran with correct array ordering - 13 seconds! ?What horrible python/numpy > mistake am I making to cause such a slowdown? For the type of problem you are working on, I'd flip it around and ask what you are doing with Fortran to only get 5x ;) Well, I'd ask that if I knew Fortran. I'm seeing 7.5x with cython/numpy. From perry at stsci.edu Thu May 3 14:28:27 2012 From: perry at stsci.edu (Perry Greenfield) Date: Thu, 3 May 2012 14:28:27 -0400 Subject: [Numpy-discussion] timing results (was: record arrays initialization) In-Reply-To: <96973B6D-CE73-40A0-95CA-03E9EB6CB27C@jpl.nasa.gov> References: <96973B6D-CE73-40A0-95CA-03E9EB6CB27C@jpl.nasa.gov> Message-ID: <4C40524B-9235-4A45-9F8A-2958C5E40EE5@stsci.edu> On May 3, 2012, at 1:38 PM, Moroney, Catherine M (388D) wrote: > > On May 3, 2012, at 10:33 AM, Moroney, Catherine M (388D) wrote: > >> A quick recap of the problem: a 128x512 array of 7-element vectors >> (element), and a 5000-vector >> training dataset (targets). For each vector in element, I want to >> find the best-match in targets, >> defined as minimizing the Euclidean distance. >> >> I coded it up three ways: (a) looping through each vector in >> element individually, (b) vectorizing >> the function in the previous step, and coding it up in Fortran. >> The heart of the "find-best-match" >> code in Python looks like so I'm not doing an individual loop >> through all 5000 vectors in targets: >> >> nlen = xelement.shape[0] >> nvec = targets.data.shape[0] >> x = xelement.reshape(1, nlen).repeat(nvec, axis=0) >> >> diffs = ((x - targets.data)**2).sum(axis=1) >> diffs = numpy.sqrt(diffs) >> return int(numpy.argmin(diffs, axis=0)) >> >> Here are the results: >> >> (a) looping through each vector: 68 seconds >> (b) vectorizing this: 58 seconds >> (c) raw Fortran with loops: 26 seconds >> >> I was surprised to see that vectorizing didn't gain me that much >> time, and that the Fortran >> was so much faster than both python alternatives. So, there's a >> lot that I don't know about >> how the internals of numpy and python work. >> >> Why does the loop through 128x512 elements in python only take an >> additional 10 seconds? What >> is the main purpose of vectorizing - is it optimization by taking >> the looping step out of the >> Python and into the C-base or something different? Because for the size of the arrays being manipulated inside the loop, the python/numpy loop overhead isn't all that big. If you were only doing 100 vectors in target, you would see a big difference. Perry From paul.anton.letnes at gmail.com Thu May 3 15:46:45 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Thu, 3 May 2012 21:46:45 +0200 Subject: [Numpy-discussion] timing results (was: record arrays initialization) In-Reply-To: References: Message-ID: On 3. 
mai 2012, at 19:33, Moroney, Catherine M (388D) wrote: > A quick recap of the problem: a 128x512 array of 7-element vectors (element), and a 5000-vector > training dataset (targets). For each vector in element, I want to find the best-match in targets, > defined as minimizing the Euclidean distance. > > I coded it up three ways: (a) looping through each vector in element individually, (b) vectorizing > the function in the previous step, and coding it up in Fortran. The heart of the "find-best-match" > code in Python looks like so I'm not doing an individual loop through all 5000 vectors in targets: > > nlen = xelement.shape[0] > nvec = targets.data.shape[0] > x = xelement.reshape(1, nlen).repeat(nvec, axis=0) > > diffs = ((x - targets.data)**2).sum(axis=1) > diffs = numpy.sqrt(diffs) > return int(numpy.argmin(diffs, axis=0)) > > Here are the results: > > (a) looping through each vector: 68 seconds > (b) vectorizing this: 58 seconds > (c) raw Fortran with loops: 26 seconds > > I was surprised to see that vectorizing didn't gain me that much time, and that the Fortran > was so much faster than both python alternatives. So, there's a lot that I don't know about > how the internals of numpy and python work. > > Why does the loop through 128x512 elements in python only take an additional 10 seconds? What > is the main purpose of vectorizing - is it optimization by taking the looping step out of the > Python and into the C-base or something different? If you're doing loops in python, python does all sort of magic for you. Type checking is one thing, and one of the simplest things to optimize away if you use cython. If you're writing an expression similar to this > ((x - targets.data)**2) where x and targets.data are vectors, the elements are subtracted and squared elementwise in C instead of in python. So yes, you've got the idea. > And, why is the Fortran so much faster (even without optimization)? Could you show us the code? It's hard to tell otherwise. As Keith Goodman pointed out, if he gets 7.5x with cython, it could be that the Fortran code could be improved as well. Fortran has a reputation of being the gold standard for performance in numerical computation, although one can often be surprised. Picking good algorithms is always more important than the language. Paul From stefan at sun.ac.za Thu May 3 16:03:55 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 3 May 2012 13:03:55 -0700 Subject: [Numpy-discussion] copying array to itself In-Reply-To: <1336035094.2430.0.camel@farnsworth> References: <1335974584.8654.44.camel@farnsworth> <1336035094.2430.0.camel@farnsworth> Message-ID: On Thu, May 3, 2012 at 1:51 AM, Henry Gomersall wrote: > Right, so this is expected behaviour then. Is this documented somewhere? > It strikes me that this is pretty unexpected behaviour. Imagine the way you would code this in a for-loop. You want a = np.arange(10) a[2:] = a[:-2] Now you write: for i in range(2, len(a)): a[i] = a[i - 2] which yields [0, 1, 0, 1, 0, 1, 0, 1, 0, 1] One of the great things about NumPy is that it only copies data when it really has to, and in this case it would need to be very clever to figure out what you are trying to do. 
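(If the shifted result is what is actually wanted, forcing a copy of the right-hand side sidesteps the overlap; a one-line sketch:)

import numpy as np

a = np.arange(10)
a[2:] = a[:-2].copy()   # copy the source first, so the in-place overlap cannot bite
# a is now [0, 1, 0, 1, 2, 3, 4, 5, 6, 7]
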
St?fan From kwgoodman at gmail.com Thu May 3 16:08:23 2012 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 3 May 2012 13:08:23 -0700 Subject: [Numpy-discussion] timing results (was: record arrays initialization) In-Reply-To: References: Message-ID: On Thu, May 3, 2012 at 12:46 PM, Paul Anton Letnes wrote: > > Could you show us the code? It's hard to tell otherwise. As Keith Goodman pointed out, if he gets 7.5x with cython, it could be that the Fortran code could be improved as well. Fortran has a reputation of being the gold standard for performance in numerical computation, although one can often be surprised. Picking good algorithms is always more important than the language. > Doing array operations with Fortran might slow things down. I got the speed up by calculating the distance one point at a time and not storing the results (only keeping the min value and index). That gets rid of the sorting (argmin) at the end too. From Catherine.M.Moroney at jpl.nasa.gov Thu May 3 18:12:25 2012 From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (388D)) Date: Thu, 3 May 2012 22:12:25 +0000 Subject: [Numpy-discussion] timing results (was: record arrays initialization) In-Reply-To: References: Message-ID: <95D07FA7-FEBE-4E0F-86A6-CBFC79DF135C@jpl.nasa.gov> On May 3, 2012, at 1:00 PM, wrote: >> A quick recap of the problem: a 128x512 array of 7-element vectors (element), and a 5000-vector >> training dataset (targets). For each vector in element, I want to find the best-match in targets, >> defined as minimizing the Euclidean distance. >> >> I coded it up three ways: (a) looping through each vector in element individually, (b) vectorizing >> the function in the previous step, and coding it up in Fortran. The heart of the "find-best-match" >> code in Python looks like so I'm not doing an individual loop through all 5000 vectors in targets: >> >> nlen = xelement.shape[0] >> nvec = targets.data.shape[0] >> x = xelement.reshape(1, nlen).repeat(nvec, axis=0) >> >> diffs = ((x - targets.data)**2).sum(axis=1) >> diffs = numpy.sqrt(diffs) >> return int(numpy.argmin(diffs, axis=0)) >> >> Here are the results: >> >> (a) looping through each vector: 68 seconds >> (b) vectorizing this: 58 seconds >> (c) raw Fortran with loops: 26 seconds >> >> I was surprised to see that vectorizing didn't gain me that much time, and that the Fortran >> was so much faster than both python alternatives. So, there's a lot that I don't know about >> how the internals of numpy and python work. >> >> Why does the loop through 128x512 elements in python only take an additional 10 seconds? What >> is the main purpose of vectorizing - is it optimization by taking the looping step out of the >> Python and into the C-base or something different? > > If you're doing loops in python, python does all sort of magic for you. Type checking is one thing, and one of the simplest things to optimize away if you use cython. If you're writing an expression similar to this >> ((x - targets.data)**2) > where x and targets.data are vectors, the elements are subtracted and squared elementwise in C instead of in python. So yes, you've got the idea. > >> And, why is the Fortran so much faster (even without optimization)? > > Could you show us the code? It's hard to tell otherwise. As Keith Goodman pointed out, if he gets 7.5x with cython, it could be that the Fortran code could be improved as well. Fortran has a reputation of being the gold standard for performance in numerical computation, although one can often be surprised. 
> Picking good algorithms is always more important than the language.
>
> Paul
>

I'd be very curious to know if the Fortran can be improved on a bit further.  The full scale of the problem dwarfs what I describe here, so any additional speed I can wring out of this would be very welcome.

Can somebody give me the "numpy for dummies" explanation as to what the main source of the timing differences is?  What portion of the calculations is being done in "slow" python that's the determining factor for the slowdown?

Here is the python code:

def single(element, targets):

    if (isinstance(element, tuple)):
        xelement = element[0]
    elif (isinstance(element, numpy.ndarray)):
        xelement = element
    else:
        return FILL

    nlen = xelement.shape[0]
    nvec = targets.data.shape[0]
    x = xelement.reshape(1, nlen).repeat(nvec, axis=0)

    diffs = ((x - targets.data)**2).sum(axis=1)
    diffs = numpy.sqrt(diffs)
    return int(numpy.argmin(diffs, axis=0))

multiple = numpy.vectorize(single)
python_results = multiple(vectors, Target)

where vectors is a 65536*7 array and Target is 5000x7.  So I'm evaluating 5000 potential choices for each of 65536 vectors.

And here is the Fortran:

matches(:) = -9999
do iv = 1, nvectors
    min_dist = 9999999999999.0
    min_idx = -9999
    do it = 1, ntargets
        dvector = targets(:,it) - vectors(:,iv)
        dist = sqrt(sum(dvector*dvector))
        if (dist < min_dist) then
            min_dist = dist
            min_idx = it
        endif
    end do
    matches(iv) = min_idx
end do

Catherine

From kwgoodman at gmail.com  Thu May  3 18:49:22 2012
From: kwgoodman at gmail.com (Keith Goodman)
Date: Thu, 3 May 2012 15:49:22 -0700
Subject: [Numpy-discussion] timing results (was: record arrays initialization)
In-Reply-To: <95D07FA7-FEBE-4E0F-86A6-CBFC79DF135C@jpl.nasa.gov>
References: <95D07FA7-FEBE-4E0F-86A6-CBFC79DF135C@jpl.nasa.gov>
Message-ID:

On Thu, May 3, 2012 at 3:12 PM, Moroney, Catherine M (388D) wrote:

> Here is the python code:
>
> def single(element, targets):
>
>     if (isinstance(element, tuple)):
>         xelement = element[0]
>     elif (isinstance(element, numpy.ndarray)):
>         xelement = element
>     else:
>         return FILL
>
>     nlen = xelement.shape[0]
>     nvec = targets.data.shape[0]
>     x = xelement.reshape(1, nlen).repeat(nvec, axis=0)

repeat is slow. I don't think you need it since broadcasting should take care of things. (But maybe I misunderstand the data.)

>     diffs = ((x - targets.data)**2).sum(axis=1)

You could try np.dot instead of sum: one = np.ones(7); diff = np.dot(diff, one). You could even pass one in.

>     diffs = numpy.sqrt(diffs)

Since you don't return the distance, no need for the sqrt. If you do need the sqrt only take the sqrt of one element instead of all elements.

>     return int(numpy.argmin(diffs, axis=0))
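To make that concrete, here is a rough, untested sketch of the broadcasting version. I'm assuming vectors is the (65536, 7) array and targets is the plain (5000, 7) array (what your code calls targets.data); the chunking is only there to keep the temporary (chunk, 5000, 7) array at a manageable size:

import numpy

def best_matches(vectors, targets, chunk=256):
    matches = numpy.empty(vectors.shape[0], dtype=int)
    for i in range(0, vectors.shape[0], chunk):
        v = vectors[i:i + chunk]                            # (chunk, 7)
        # broadcasting replaces the reshape/repeat:
        # (chunk, 1, 7) - (5000, 7) -> (chunk, 5000, 7)
        d2 = ((v[:, numpy.newaxis, :] - targets) ** 2).sum(axis=2)
        # sqrt is skipped on purpose: it doesn't change which index wins
        matches[i:i + chunk] = d2.argmin(axis=1)
    return matches

If the 3-d temporary is still too large, the expansion |v - t|^2 = |v|^2 - 2 v.t + |t|^2 lets numpy.dot(v, targets.T) do most of the work with only a 2-d intermediate.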
From paul.anton.letnes at gmail.com  Fri May  4 02:19:23 2012
From: paul.anton.letnes at gmail.com (Paul Anton Letnes)
Date: Fri, 4 May 2012 08:19:23 +0200
Subject: [Numpy-discussion] timing results (was: record arrays initialization)
In-Reply-To:
References: <95D07FA7-FEBE-4E0F-86A6-CBFC79DF135C@jpl.nasa.gov>
Message-ID:

On Fri, May 4, 2012 at 12:49 AM, Keith Goodman wrote:
> On Thu, May 3, 2012 at 3:12 PM, Moroney, Catherine M (388D) wrote:
>
>> Here is the python code:
>>
>> def single(element, targets):
>>
>>     if (isinstance(element, tuple)):
>>         xelement = element[0]
>>     elif (isinstance(element, numpy.ndarray)):
>>         xelement = element
>>     else:
>>         return FILL
>>
>>     nlen = xelement.shape[0]
>>     nvec = targets.data.shape[0]
>>     x = xelement.reshape(1, nlen).repeat(nvec, axis=0)
>
> repeat is slow. I don't think you need it since broadcasting should take care of things. (But maybe I misunderstand the data.)
>
>>     diffs = ((x - targets.data)**2).sum(axis=1)
>
> You could try np.dot instead of sum: one = np.ones(7); diff = np.dot(diff, one). You could even pass one in.
>
>>     diffs = numpy.sqrt(diffs)
>
> Since you don't return the distance, no need for the sqrt. If you do need the sqrt only take the sqrt of one element instead of all elements.
>
>>     return int(numpy.argmin(diffs, axis=0))

As Keith says, you don't have to do the sqrt of everything. I'd probably set the min_dist to be a negative value initially. Perhaps you'll encounter large distances at some point, and negative values are more obviously incorrect when you're debugging. If you're 100% sure you're setting every single line of the array matches, then you can skip this line:

matches(:) = -9999

It won't buy you much, but if it's for free...

Two suggestions. Either:

dvector = targets(:,it) - vectors(:,iv)
dist = sqrt(sum(dvector*dvector))

If you rewrite it in a loop as

dist = 0
do itemp = 1, size(targets, 1)
    dist = dist + (targets(itemp, it) - vectors(itemp, iv))**2
end do

(I am skipping the sqrt as it is not needed.) That way you're not looping over the array three times. You will be doing the sum, the difference, and the exponentiation all in one loop. Moving memory around is typically more expensive than computing. A good Fortran compiler should optimize **2 into multiplication by itself instead of calling a library routine, too.

Alternatively:

dist = sqrt(sum(dvector*dvector))

Can be written as (skipping the sqrt again):

dist = dot_product(dvector, dvector)

or:

dist = DDOT(size(dvector), dvector, 1, dvector, 1)

In fortran there's the builtin dot_product function. If this is in fact one of your bottlenecks, you should also try calling DDOT (= dot product) from an optimized BLAS library. This is what numpy.dot does, too, but in Fortran you avoid calling overhead.

Hope this helps - it's kind of hard to know exactly what works without playing with the code yourself :) Also, sometimes your compiler has some of these things already and you don't get anything by hand-optimizing.

Paul

From cmueller_dev at yahoo.com  Thu May  3 12:44:15 2012
From: cmueller_dev at yahoo.com (Chris Mueller)
Date: Thu, 3 May 2012 09:44:15 -0700 (PDT)
Subject: [Numpy-discussion] 2012 SciPy Bioinformatics Workshop
Message-ID: <1336063455.23270.YahooMailNeo@web111204.mail.gq1.yahoo.com>

We are pleased to announce the 2012 SciPy Bioinformatics Workshop held in conjunction with SciPy 2012 this July in Austin, TX.

Python in biology is not dead yet... in fact, it's alive and well!

Remember just a few short years ago when BioPerl ruled the world? Just one minor paradigm shift* later and Python now has a commanding presence in bioinformatics. From Python bindings to common tools all the way to entire Python-based informatics platforms, Python is used everywhere** in modern bioinformatics.

If you use Python for bioinformatics or just want to learn more about how it's being used, join us at the 2012 SciPy Bioinformatics Workshop. We will have speakers from both academia and industry showcasing how Python is enabling biologists to effectively work with large, complex data sets.

The workshop will be held the evening of July 19 from 5-6:30. More information about SciPy is available on the conference site: http://conference.scipy.org/scipy2012/

!! Participate !!
Are you using Python in bioinformatics? ?We'd love to have you share your story. ?We are looking for 3-4 speakers to share their experiences using Python for bioinformatics. ? Please contact Chris Mueller at chris.mueller [at] lab7.io and Ray Roberts at rroberts [at] enthought.com to volunteer. Please include a brief description or link to a paper/topic which you would like to discuss. Presentations will last for 15 minutes each and will be followed by a panel Q&A. -- * That would be next generation sequencing ** Yes, we aRe awaRe of that otheR language used eveRywhere, but let's celebRate Python Right now. From ischnell at enthought.com Fri May 4 17:38:34 2012 From: ischnell at enthought.com (Ilan Schnell) Date: Fri, 4 May 2012 16:38:34 -0500 Subject: [Numpy-discussion] Quaternion data type Message-ID: Hello everyone, what is the plan for Quaternion data types in numpy? I saw that during last years SciPy spring https://github.com/martinling/numpy_quaternion was started, but not updated or released since then. Thanks Ilan From charlesr.harris at gmail.com Fri May 4 18:36:08 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 4 May 2012 16:36:08 -0600 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: Hi Ilan On Fri, May 4, 2012 at 3:38 PM, Ilan Schnell wrote: > Hello everyone, > > what is the plan for Quaternion data types in numpy? > I saw that during last years SciPy spring > https://github.com/martinling/numpy_quaternion > was started, but not updated or released since then. > > That was Martin Ling, link and thread here . I'm not sure what happened with this but I suspect we are waiting for extension types to be fixed up in master. Mark had some thoughts along those lines. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri May 4 19:57:33 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 05 May 2012 01:57:33 +0200 Subject: [Numpy-discussion] What is consensus anyway In-Reply-To: References: Message-ID: 26.04.2012 03:11, Travis Oliphant kirjoitti: [clip] > It would be nice if every pull request created a message to this list. > Is that even possible? Unidirectional forwarding is possible, for instance using Github's API, https://github.com/pv/github-pull-request-fwd Github itself doesn't offer tools to do this, so an external server for sending the mails is needed, and the mailing list admins need to allow the corresponding mails to pass through. Pauli From ischnell at enthought.com Fri May 4 23:44:49 2012 From: ischnell at enthought.com (Ilan Schnell) Date: Fri, 4 May 2012 22:44:49 -0500 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: Hi Chuck, thanks for the prompt reply. I as curious because because someone was interested in adding http://pypi.python.org/pypi/Quaternion to EPD, but Martin and Mark's implementation of quaternions looks much better. - Ilan On Fri, May 4, 2012 at 5:36 PM, Charles R Harris wrote: > Hi Ilan > > On Fri, May 4, 2012 at 3:38 PM, Ilan Schnell wrote: >> >> Hello everyone, >> >> what is the plan for Quaternion data types in numpy? >> I saw that during last years SciPy spring >> https://github.com/martinling/numpy_quaternion >> was started, but not updated or released since then. >> > > That was Martin Ling, link and thread here . I'm not sure what happened with > this but I suspect we are waiting for extension types to be fixed up in > master. Mark had some thoughts along those lines. 
> > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From aldcroft at head.cfa.harvard.edu Sat May 5 07:27:59 2012 From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft) Date: Sat, 5 May 2012 07:27:59 -0400 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell wrote: > Hi Chuck, > > thanks for the prompt reply. ?I as curious because because > someone was interested in adding http://pypi.python.org/pypi/Quaternion > to EPD, but Martin and Mark's implementation of quaternions > looks much better. Hi - I'm a co-author of the above mentioned Quaternion package. I agree the numpy_quaternion version would be better, but if there is no expectation that it will move forward I can offer to improve our Quaternion. A few months ago I played around with making it accept arbitrary array inputs (with similar shape of course) to essentially vectorize the transformations. We never got around to putting this in a release because of a perceived lack of interest / priorities... If this would be useful then let me know. Best, Tom > - Ilan > > > On Fri, May 4, 2012 at 5:36 PM, Charles R Harris > wrote: >> Hi Ilan >> >> On Fri, May 4, 2012 at 3:38 PM, Ilan Schnell wrote: >>> >>> Hello everyone, >>> >>> what is the plan for Quaternion data types in numpy? >>> I saw that during last years SciPy spring >>> https://github.com/martinling/numpy_quaternion >>> was started, but not updated or released since then. >>> >> >> That was Martin Ling, link and thread here . I'm not sure what happened with >> this but I suspect we are waiting for extension types to be fixed up in >> master. Mark had some thoughts along those lines. >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sat May 5 12:55:00 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 5 May 2012 10:55:00 -0600 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 5:27 AM, Tom Aldcroft wrote: > On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell > wrote: > > Hi Chuck, > > > > thanks for the prompt reply. I as curious because because > > someone was interested in adding http://pypi.python.org/pypi/Quaternion > > to EPD, but Martin and Mark's implementation of quaternions > > looks much better. > > Hi - > > I'm a co-author of the above mentioned Quaternion package. I agree > the numpy_quaternion version would be better, but if there is no > expectation that it will move forward I can offer to improve our > Quaternion. A few months ago I played around with making it accept > arbitrary array inputs (with similar shape of course) to essentially > vectorize the transformations. We never got around to putting this in > a release because of a perceived lack of interest / priorities... If > this would be useful then let me know. > > Would you be interested in carrying Martin's package forward? I'm not opposed to having quaternions in numpy/scipy but there needs to be someone to push it and deal with problems if they come up. 
Martin's package disappeared in large part because Martin disappeared. I'd also like to hear from Mark about other aspects, as there was also a simple rational user type proposed that we were looking to put in as an extension 'test' type. IIRC, there were some needed fixes to Numpy, some of which were postponed in favor of larger changes. User types is one of the things we want ot get fixed up. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat May 5 13:19:17 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 5 May 2012 12:19:17 -0500 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 11:55 AM, Charles R Harris wrote: > > > On Sat, May 5, 2012 at 5:27 AM, Tom Aldcroft < > aldcroft at head.cfa.harvard.edu> wrote: > >> On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell >> wrote: >> > Hi Chuck, >> > >> > thanks for the prompt reply. I as curious because because >> > someone was interested in adding http://pypi.python.org/pypi/Quaternion >> > to EPD, but Martin and Mark's implementation of quaternions >> > looks much better. >> >> Hi - >> >> I'm a co-author of the above mentioned Quaternion package. I agree >> the numpy_quaternion version would be better, but if there is no >> expectation that it will move forward I can offer to improve our >> Quaternion. A few months ago I played around with making it accept >> arbitrary array inputs (with similar shape of course) to essentially >> vectorize the transformations. We never got around to putting this in >> a release because of a perceived lack of interest / priorities... If >> this would be useful then let me know. >> >> > Would you be interested in carrying Martin's package forward? I'm not > opposed to having quaternions in numpy/scipy but there needs to be someone > to push it and deal with problems if they come up. Martin's package > disappeared in large part because Martin disappeared. I'd also like to hear > from Mark about other aspects, as there was also a simple rational user > type proposed that we were looking to put in as an extension 'test' type. > IIRC, there were some needed fixes to Numpy, some of which were postponed > in favor of larger changes. User types is one of the things we want ot get > fixed up. > I kind of like the idea of there being a package, separate from numpy, which collects these dtypes together. To start, the quaternion and the rational type could go in it, and eventually I think it would be nice to move datetime64 there as well. Maybe it could be called numpy-dtypes, or would a more creative name be better? -Mark > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat May 5 14:06:12 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 5 May 2012 12:06:12 -0600 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 11:19 AM, Mark Wiebe wrote: > On Sat, May 5, 2012 at 11:55 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, May 5, 2012 at 5:27 AM, Tom Aldcroft < >> aldcroft at head.cfa.harvard.edu> wrote: >> >>> On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell >>> wrote: >>> > Hi Chuck, >>> > >>> > thanks for the prompt reply. 
I as curious because because >>> > someone was interested in adding >>> http://pypi.python.org/pypi/Quaternion >>> > to EPD, but Martin and Mark's implementation of quaternions >>> > looks much better. >>> >>> Hi - >>> >>> I'm a co-author of the above mentioned Quaternion package. I agree >>> the numpy_quaternion version would be better, but if there is no >>> expectation that it will move forward I can offer to improve our >>> Quaternion. A few months ago I played around with making it accept >>> arbitrary array inputs (with similar shape of course) to essentially >>> vectorize the transformations. We never got around to putting this in >>> a release because of a perceived lack of interest / priorities... If >>> this would be useful then let me know. >>> >>> >> Would you be interested in carrying Martin's package forward? I'm not >> opposed to having quaternions in numpy/scipy but there needs to be someone >> to push it and deal with problems if they come up. Martin's package >> disappeared in large part because Martin disappeared. I'd also like to hear >> from Mark about other aspects, as there was also a simple rational user >> type proposed that we were looking to put in as an extension 'test' type. >> IIRC, there were some needed fixes to Numpy, some of which were postponed >> in favor of larger changes. User types is one of the things we want ot get >> fixed up. >> > > I kind of like the idea of there being a package, separate from numpy, > which collects these dtypes together. To start, the quaternion and the > rational type could go in it, and eventually I think it would be nice to > move datetime64 there as well. Maybe it could be called numpy-dtypes, or > would a more creative name be better? > I'm trying to think about how that would be organized. We could create a new repository, numpy-user-types (numpy-extension-types), under the numpy umbrella. It would need documents and such as well as someone interested in maintaining it and making releases. A branch in the numpy repository wouldn't work since we would want to rebase it regularly. It could maybe go in scipy but a new package would need to be created there and it feels too distant from numpy for such basic types as datetime. Do you have thoughts about the details? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat May 5 14:15:47 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 5 May 2012 20:15:47 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 Message-ID: Hi, I'm pleased to announce the availability of the first release candidate of NumPy 1.6.2. This is a maintenance release. Due to the delay of the NumPy 1.7.0, this release contains far more fixes than a regular NumPy bugfix release. It also includes a number of documentation and build improvements. Sources and binary installers can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ Please test this release and report any issues on the numpy-discussion mailing list. 
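A quick way to check the candidate on your setup is to run the full test suite against the installed package (this needs nose), for example:

import numpy
numpy.test('full')

or the equivalent one-liner from the command line, python -c "import numpy; numpy.test('full')".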
Cheers, Ralf ``numpy.core`` issues fixed --------------------------- #2063 make unique() return consistent index #1138 allow creating arrays from empty buffers or empty slices #1446 correct note about correspondence vstack and concatenate #1149 make argmin() work for datetime #1672 fix allclose() to work for scalar inf #1747 make np.median() work for 0-D arrays #1776 make complex division by zero to yield inf properly #1675 add scalar support for the format() function #1905 explicitly check for NaNs in allclose() #1952 allow floating ddof in std() and var() #1948 fix regression for indexing chararrays with empty list #2017 fix type hashing #2046 deleting array attributes causes segfault #2033 a**2.0 has incorrect type #2045 make attribute/iterator_element deletions not segfault #2021 fix segfault in searchsorted() #2073 fix float16 __array_interface__ bug ``numpy.lib`` issues fixed -------------------------- #2048 break reference cycle in NpzFile #1573 savetxt() now handles complex arrays #1387 allow bincount() to accept empty arrays #1899 fixed histogramdd() bug with empty inputs #1793 fix failing npyio test under py3k #1936 fix extra nesting for subarray dtypes #1848 make tril/triu return the same dtype as the original array #1918 use Py_TYPE to access ob_type, so it works also on Py3 ``numpy.f2py`` changes ---------------------- ENH: Introduce new options extra_f77_compiler_args and extra_f90_compiler_args BLD: Improve reporting of fcompiler value BUG: Fix f2py test_kind.py test ``numpy.poly`` changes ---------------------- ENH: Add some tests for polynomial printing ENH: Add companion matrix functions DOC: Rearrange the polynomial documents BUG: Fix up links to classes DOC: Add version added to some of the polynomial package modules DOC: Document xxxfit functions in the polynomial package modules BUG: The polynomial convenience classes let different types interact DOC: Document the use of the polynomial convenience classes DOC: Improve numpy reference documentation of polynomial classes ENH: Improve the computation of polynomials from roots STY: Code cleanup in polynomial [*]fromroots functions DOC: Remove references to cast and NA, which were added in 1.7 ``numpy.distutils`` issues fixed ------------------------------- #1261 change compile flag on AIX from -O5 to -O3 #1377 update HP compiler flags #1383 provide better support for C++ code on HPUX #1857 fix build for py3k + pip BLD: raise a clearer warning in case of building without cleaning up first BLD: follow build_ext coding convention in build_clib BLD: fix up detection of Intel CPU on OS X in system_info.py BLD: add support for the new X11 directory structure on Ubuntu & co. BLD: add ufsparse to the libraries search path. BLD: add 'pgfortran' as a valid compiler in the Portland Group BLD: update version match regexp for IBM AIX Fortran compilers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat May 5 14:45:57 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 5 May 2012 12:45:57 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 12:15 PM, Ralf Gommers wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate of > NumPy 1.6.2. This is a maintenance release. Due to the delay of the NumPy > 1.7.0, this release contains far more fixes than a regular NumPy bugfix > release. 
It also includes a number of documentation and build improvements. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ > > Please test this release and report any issues on the numpy-discussion > mailing list. > > Cheers, > Ralf > > > > ``numpy.core`` issues fixed > --------------------------- > > #2063 make unique() return consistent index > #1138 allow creating arrays from empty buffers or empty slices > #1446 correct note about correspondence vstack and concatenate > #1149 make argmin() work for datetime > #1672 fix allclose() to work for scalar inf > #1747 make np.median() work for 0-D arrays > #1776 make complex division by zero to yield inf properly > #1675 add scalar support for the format() function > #1905 explicitly check for NaNs in allclose() > #1952 allow floating ddof in std() and var() > #1948 fix regression for indexing chararrays with empty list > #2017 fix type hashing > #2046 deleting array attributes causes segfault > #2033 a**2.0 has incorrect type > #2045 make attribute/iterator_element deletions not segfault > #2021 fix segfault in searchsorted() > #2073 fix float16 __array_interface__ bug > > > ``numpy.lib`` issues fixed > -------------------------- > > #2048 break reference cycle in NpzFile > #1573 savetxt() now handles complex arrays > #1387 allow bincount() to accept empty arrays > #1899 fixed histogramdd() bug with empty inputs > #1793 fix failing npyio test under py3k > #1936 fix extra nesting for subarray dtypes > #1848 make tril/triu return the same dtype as the original array > #1918 use Py_TYPE to access ob_type, so it works also on Py3 > > > ``numpy.f2py`` changes > ---------------------- > > ENH: Introduce new options extra_f77_compiler_args and > extra_f90_compiler_args > BLD: Improve reporting of fcompiler value > BUG: Fix f2py test_kind.py test > > > ``numpy.poly`` changes > ---------------------- > > ENH: Add some tests for polynomial printing > ENH: Add companion matrix functions > DOC: Rearrange the polynomial documents > BUG: Fix up links to classes > DOC: Add version added to some of the polynomial package modules > DOC: Document xxxfit functions in the polynomial package modules > BUG: The polynomial convenience classes let different types interact > DOC: Document the use of the polynomial convenience classes > DOC: Improve numpy reference documentation of polynomial classes > ENH: Improve the computation of polynomials from roots > STY: Code cleanup in polynomial [*]fromroots functions > DOC: Remove references to cast and NA, which were added in 1.7 > > > ``numpy.distutils`` issues fixed > ------------------------------- > > #1261 change compile flag on AIX from -O5 to -O3 > #1377 update HP compiler flags > #1383 provide better support for C++ code on HPUX > #1857 fix build for py3k + pip > BLD: raise a clearer warning in case of building without cleaning up > first > BLD: follow build_ext coding convention in build_clib > BLD: fix up detection of Intel CPU on OS X in system_info.py > BLD: add support for the new X11 directory structure on Ubuntu & co. > BLD: add ufsparse to the libraries search path. > BLD: add 'pgfortran' as a valid compiler in the Portland Group > BLD: update version match regexp for IBM AIX Fortran compilers. > > > Darn, I missed the omission of the random fix from the notes. I'm no good for spelling either. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Sat May 5 15:11:14 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 5 May 2012 21:11:14 +0200 Subject: [Numpy-discussion] numpy doc for percentile function In-Reply-To: References: <4F7C1E1B.3020403@crans.org> Message-ID: On Wed, Apr 4, 2012 at 3:53 PM, Skipper Seabold wrote: > On Wed, Apr 4, 2012 at 6:10 AM, Pierre Haessig wrote: > >> Hi, >> >> I'm looking for the entry point in Numpy doc for the percentile function. >> I'm assuming it should sit in routines.statistics but do not see it : >> http://docs.scipy.org/doc/numpy/reference/routines.statistics.html >> >> > I don't see it in the docs either. > > >> Am I missing something ? If indeed the percentile entry should be added, >> do you agree it could be added to the "Histogram" section ? (and >> "Histogram" would become "Histograms and percentiles") >> >> > IMO it should go under the extremal values header and this should be > changed to order statistics. > > Seems reasonable. I'm merging some edits from the doc wiki, I'll include this unless someone has a better suggestion. > Also, as Fr?d?ric Bastien pointed out, I feel that the current doc build >> is broken (especially the links :-( ) >> >> > Indeed. It does look that way. It is the same on my local build. Perhaps > this deserves a separate message. They show up here. > It's quite a painful process to fix all doc build errors. Any help would be appreciated here. In principle it's simple: build the docs with latest Sphinx by typing "make latex" in the doc directory (gives O(700) warnings), then go through the build log to locate issues. Takes a long time though. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From kalatsky at gmail.com Sat May 5 15:14:47 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Sat, 5 May 2012 14:14:47 -0500 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: Hi Tod, Would you consider bundling the quaternion dtype with your package. I think everybody wins: your package would become stronger and Martin's dtype would become easily available. Thanks Val On Sat, May 5, 2012 at 6:27 AM, Tom Aldcroft wrote: > On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell > wrote: > > Hi Chuck, > > > > thanks for the prompt reply. I as curious because because > > someone was interested in adding http://pypi.python.org/pypi/Quaternion > > to EPD, but Martin and Mark's implementation of quaternions > > looks much better. > > Hi - > > I'm a co-author of the above mentioned Quaternion package. I agree > the numpy_quaternion version would be better, but if there is no > expectation that it will move forward I can offer to improve our > Quaternion. A few months ago I played around with making it accept > arbitrary array inputs (with similar shape of course) to essentially > vectorize the transformations. We never got around to putting this in > a release because of a perceived lack of interest / priorities... If > this would be useful then let me know. > > Best, > Tom > > > - Ilan > > > > > > On Fri, May 4, 2012 at 5:36 PM, Charles R Harris > > wrote: > >> Hi Ilan > >> > >> On Fri, May 4, 2012 at 3:38 PM, Ilan Schnell > wrote: > >>> > >>> Hello everyone, > >>> > >>> what is the plan for Quaternion data types in numpy? > >>> I saw that during last years SciPy spring > >>> https://github.com/martinling/numpy_quaternion > >>> was started, but not updated or released since then. > >>> > >> > >> That was Martin Ling, link and thread here . 
I'm not sure what happened > with > >> this but I suspect we are waiting for extension types to be fixed up in > >> master. Mark had some thoughts along those lines. > >> > >> Chuck > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat May 5 15:28:15 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 5 May 2012 21:28:15 +0200 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Wed, May 2, 2012 at 11:25 PM, Wes McKinney wrote: > On Wed, May 2, 2012 at 9:48 AM, Charles R Harris > wrote: > > > > > > On Tue, May 1, 2012 at 11:47 PM, Ralf Gommers < > ralf.gommers at googlemail.com> > > wrote: > >> > >> > >> > >> On Wed, May 2, 2012 at 1:48 AM, Pauli Virtanen wrote: > >>> > >>> 01.05.2012 21:34, Ralf Gommers kirjoitti: > >>> [clip] > >>> > At this point it's probably good to look again at the problems we > want > >>> > to solve: > >>> > 1. responsive user interface (must absolutely have) > >>> > >>> Now that it comes too late: with some luck, I've possibly hit on what > >>> was ailing the Tracs (max_diff_bytes configured too large). Let's see > if > >>> things work better from now on... > >> > >> > >> That's amazing - not only does it not give errors anymore, it's also an > >> order of magnitude faster. > >> > > > > So maybe we could just stick with trac. Performance was really the > sticking > > point. > > > > Chuck > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > FWIW I'm pretty strongly in favor of GHI for NumPy/SciPy (I am going > to get involved in NumPy dev eventually, promise). While warty in some > of the places already mentioned, I have found it to be very > low-friction and low-annoyance in my own dev process (nearing 1000 > issues closed in the last year in pandas). But there are fewer cooks > in the kitchen with pandas so perhaps this experience wouldn't be > identical with NumPy. The biggest benefit I've seen is community > involvement that you really wouldn't see if I were using a Trac or > something else hosted elsewhere. Users are on GitHub and it for some > reason gives people a feeling of engagement in the open source process > that I don't see anywhere else. Feels like it's time to make a decision on this. I see no blocking objections against Github, so perhaps we should give it a go. The attachment issue for data files can be solved by relocating those to a server we still administer. Trac is currently annoying me also, because I need to change the milestone of ~50 tickets and have no good way of doing it. So nothing's perfect. 
Github's hosting service, possibly more user involvement and centralizing all our tools there may be enough to outweigh the limitations of GHI. Proposal: move NumPy tickets to Github. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat May 5 15:33:54 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 5 May 2012 21:33:54 +0200 Subject: [Numpy-discussion] Is NumpyDotNet (aka numpy-refactor) likely to be merged into the mainline? In-Reply-To: References: Message-ID: On Tue, May 1, 2012 at 9:24 PM, Seth Nickell wrote: > With a little work, I think numpy/scipy could be very useful to those > of us who have to program on .NET for one reason or another, but > 64-bit is currently not supported (at least, not as released). > > I'm considering working out 64-bit support, but it appears to me like > the numpy-refactor repository isn't on a path to merging with the > mainline, and is likely to bit-rot (if it hasn't already). Is anyone > working on this, or is NumpyDotNet 'resting' at the moment, so to > speak? Its sort of pointless to work on a dead-end branch ;-) > It's my impression that that would indeed be pointless - at least for the near future. If there are still plans to work on the refactor branch, someone please correct me. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sat May 5 15:43:32 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 5 May 2012 12:43:32 -0700 Subject: [Numpy-discussion] Fwd: [SIAM-CSE] pOSKI - Autotuner for parallel sparse-matrix-vector Message-ID: Interesting, given the recent discussion on sparsity... ---------- Forwarded message ---------- From: James Demmel Date: Fri, May 4, 2012 at 12:03 PM Subject: [SIAM-CSE] pOSKI - Autotuner for parallel sparse-matrix-vector multiplication To: siam-cse at siam.org We are pleased to announce the first public release of pOSKI, the Parallel Optimized Sparse Kernel Interface. pOSKI automatically produces high performance implementations of Sparse-Matrix-Vector multiplication (SpMV) and related operations. pOSKI targets current multicore CPU platforms, and is an extension of prior work on OSKI, which targeted single core processors. Both pOSKI and OSKI use autotuning, which means searching a design space of sparse matrix data structures and algorithms that depend on the matrix properties (eg sparsity pattern and symmetry), the operation to be performed (A*x, transpose(A)*x, etc.), and the target platform. For example, on an Intel Jaketown, with 6 cores and 12 threads, pOSKI gets a speedup of 4.8x over 1 core, and is 1.6x faster than Intel MKL's dcsrmv, both on the matrix wikipedia-2007 from the University of Florida collection. pOSKI is freely available under a BSD license at bebop.cs.berkeley.edu/poski which also includes all the documentation, and a mailing list for asking questions. pOSKI is based on contributions from many people, and was produced by BEBOP, the Berkeley Benchmarking and OPtimization Group (bebop.cs.berkeley.edu). 
Regards, Jim Demmel and Jong-Ho Byun UC Berkeley _______________________________________________ SIAM-CSE mailing list To post messages to the list please send them to: SIAM-CSE at siam.org http://lists.siam.org/mailman/listinfo/siam-cse From fperez.net at gmail.com Sat May 5 15:44:10 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 5 May 2012 12:44:10 -0700 Subject: [Numpy-discussion] Fwd: [SIAM-CSE] pOSKI - Autotuner for parallel sparse-matrix-vector multiplication Message-ID: Interesting, given the recent discussion on sparsity... ---------- Forwarded message ---------- From: James Demmel Date: Fri, May 4, 2012 at 12:03 PM Subject: [SIAM-CSE] pOSKI - Autotuner for parallel sparse-matrix-vector multiplication To: siam-cse at siam.org We are pleased to announce the first public release of pOSKI, the Parallel Optimized Sparse Kernel Interface. pOSKI automatically produces high performance implementations of Sparse-Matrix-Vector multiplication (SpMV) and related operations. pOSKI targets current multicore CPU platforms, and is an extension of prior work on OSKI, which targeted single core processors. Both pOSKI and OSKI use autotuning, which means searching a design space of sparse matrix data structures and algorithms that depend on the matrix properties (eg sparsity pattern and symmetry), the operation to be performed (A*x, transpose(A)*x, etc.), and the target platform. For example, on an Intel Jaketown, with 6 cores and 12 threads, pOSKI gets a speedup of 4.8x over 1 core, and is 1.6x faster than Intel MKL's dcsrmv, both on the matrix wikipedia-2007 from the University of Florida collection. pOSKI is freely available under a BSD license at bebop.cs.berkeley.edu/poski which also includes all the documentation, and a mailing list for asking questions. pOSKI is based on contributions from many people, and was produced by BEBOP, the Berkeley Benchmarking and OPtimization Group (bebop.cs.berkeley.edu). Regards, Jim Demmel and Jong-Ho Byun UC Berkeley _______________________________________________ SIAM-CSE mailing list To post messages to the list please send them to: SIAM-CSE at siam.org http://lists.siam.org/mailman/listinfo/siam-cse From charlesr.harris at gmail.com Sat May 5 15:54:01 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 5 May 2012 13:54:01 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: On Sat, May 5, 2012 at 1:28 PM, Ralf Gommers wrote: > > > On Wed, May 2, 2012 at 11:25 PM, Wes McKinney wrote: > >> On Wed, May 2, 2012 at 9:48 AM, Charles R Harris >> wrote: >> > >> > >> > On Tue, May 1, 2012 at 11:47 PM, Ralf Gommers < >> ralf.gommers at googlemail.com> >> > wrote: >> >> >> >> >> >> >> >> On Wed, May 2, 2012 at 1:48 AM, Pauli Virtanen wrote: >> >>> >> >>> 01.05.2012 21:34, Ralf Gommers kirjoitti: >> >>> [clip] >> >>> > At this point it's probably good to look again at the problems we >> want >> >>> > to solve: >> >>> > 1. responsive user interface (must absolutely have) >> >>> >> >>> Now that it comes too late: with some luck, I've possibly hit on what >> >>> was ailing the Tracs (max_diff_bytes configured too large). Let's see >> if >> >>> things work better from now on... >> >> >> >> >> >> That's amazing - not only does it not give errors anymore, it's also an >> >> order of magnitude faster. >> >> >> > >> > So maybe we could just stick with trac. 
Performance was really the >> sticking >> > point. >> > >> > Chuck >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> >> FWIW I'm pretty strongly in favor of GHI for NumPy/SciPy (I am going >> to get involved in NumPy dev eventually, promise). While warty in some >> of the places already mentioned, I have found it to be very >> low-friction and low-annoyance in my own dev process (nearing 1000 >> issues closed in the last year in pandas). But there are fewer cooks >> in the kitchen with pandas so perhaps this experience wouldn't be >> identical with NumPy. The biggest benefit I've seen is community >> involvement that you really wouldn't see if I were using a Trac or >> something else hosted elsewhere. Users are on GitHub and it for some >> reason gives people a feeling of engagement in the open source process >> that I don't see anywhere else. > > > Feels like it's time to make a decision on this. > > I see no blocking objections against Github, so perhaps we should give it > a go. The attachment issue for data files can be solved by relocating those > to a server we still administer. Trac is currently annoying me also, > because I need to change the milestone of ~50 tickets and have no good way > of doing it. So nothing's perfect. Github's hosting service, possibly more > user involvement and centralizing all our tools there may be enough to > outweigh the limitations of GHI. > > Proposal: move NumPy tickets to Github. > > +1. The move needs some planning. 1. Document workflow. 2. Change link and put explanation on the numpy bug report page. 3. Notify current registered trac users. 4. Import current tickets. The last is going to take significant effort if we need to label the issues and go through the attachments. We also need a 'moved' resolution to help with that. Some work on a script automating the process would pay off there. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sat May 5 16:19:44 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 5 May 2012 15:19:44 -0500 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> Message-ID: <21065CEC-A457-406D-96F9-BC68D523DA6A@continuum.io> On May 5, 2012, at 2:28 PM, Ralf Gommers wrote: > > > On Wed, May 2, 2012 at 11:25 PM, Wes McKinney wrote: > On Wed, May 2, 2012 at 9:48 AM, Charles R Harris > wrote: > > > > > > On Tue, May 1, 2012 at 11:47 PM, Ralf Gommers > > wrote: > >> > >> > >> > >> On Wed, May 2, 2012 at 1:48 AM, Pauli Virtanen wrote: > >>> > >>> 01.05.2012 21:34, Ralf Gommers kirjoitti: > >>> [clip] > >>> > At this point it's probably good to look again at the problems we want > >>> > to solve: > >>> > 1. responsive user interface (must absolutely have) > >>> > >>> Now that it comes too late: with some luck, I've possibly hit on what > >>> was ailing the Tracs (max_diff_bytes configured too large). Let's see if > >>> things work better from now on... > >> > >> > >> That's amazing - not only does it not give errors anymore, it's also an > >> order of magnitude faster. > >> > > > > So maybe we could just stick with trac. Performance was really the sticking > > point. 
> > > > Chuck > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > FWIW I'm pretty strongly in favor of GHI for NumPy/SciPy (I am going > to get involved in NumPy dev eventually, promise). While warty in some > of the places already mentioned, I have found it to be very > low-friction and low-annoyance in my own dev process (nearing 1000 > issues closed in the last year in pandas). But there are fewer cooks > in the kitchen with pandas so perhaps this experience wouldn't be > identical with NumPy. The biggest benefit I've seen is community > involvement that you really wouldn't see if I were using a Trac or > something else hosted elsewhere. Users are on GitHub and it for some > reason gives people a feeling of engagement in the open source process > that I don't see anywhere else. > > Feels like it's time to make a decision on this. > > I see no blocking objections against Github, so perhaps we should give it a go. The attachment issue for data files can be solved by relocating those to a server we still administer. Trac is currently annoying me also, because I need to change the milestone of ~50 tickets and have no good way of doing it. So nothing's perfect. Github's hosting service, possibly more user involvement and centralizing all our tools there may be enough to outweigh the limitations of GHI. > > Proposal: move NumPy tickets to Github. +1 The process does need planning. We don't need to rush, but it would be great to get it done by end of June. To Charles' list and Ralf's suggestions, I would add setting up a server that can relay pull requests to the mailing list. NumFocus can setup that server and provide login permissions to those needing to administer it. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat May 5 16:21:14 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 5 May 2012 14:21:14 -0600 Subject: [Numpy-discussion] Is NumpyDotNet (aka numpy-refactor) likely to be merged into the mainline? In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 1:33 PM, Ralf Gommers wrote: > > > On Tue, May 1, 2012 at 9:24 PM, Seth Nickell wrote: > >> With a little work, I think numpy/scipy could be very useful to those >> of us who have to program on .NET for one reason or another, but >> 64-bit is currently not supported (at least, not as released). >> >> I'm considering working out 64-bit support, but it appears to me like >> the numpy-refactor repository isn't on a path to merging with the >> mainline, and is likely to bit-rot (if it hasn't already). Is anyone >> working on this, or is NumpyDotNet 'resting' at the moment, so to >> speak? Its sort of pointless to work on a dead-end branch ;-) >> > > It's my impression that that would indeed be pointless - at least for the > near future. If there are still plans to work on the refactor branch, > someone please correct me. > > I think it is pining for the fjords. It was developed in parallel with numpy and fell way, way behind. We need a good incremental path forward where the two branches are kept much closer together if we do this again. If you are motivated to work over the long term, we eventually want to have a 'base' numpy that can be more easily ported to IronPython and something like the .NET interface is going to be needed if that port is made. 
There is probably a lot of useful work in there. I'm pretty unclear on the details of these future developments. Revisiting the documentation of the fundamental changes would be a good thing to do so we can discuss what parts to pull into numpy or reimplement. Is IronPython development still active? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sat May 5 16:29:42 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 5 May 2012 15:29:42 -0500 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: <8E720DB8-D978-473A-ADD5-44F490213526@continuum.io> > > Would you be interested in carrying Martin's package forward? I'm not opposed to having quaternions in numpy/scipy but there needs to be someone to push it and deal with problems if they come up. Martin's package disappeared in large part because Martin disappeared. I'd also like to hear from Mark about other aspects, as there was also a simple rational user type proposed that we were looking to put in as an extension 'test' type. IIRC, there were some needed fixes to Numpy, some of which were postponed in favor of larger changes. User types is one of the things we want ot get fixed up. > > I kind of like the idea of there being a package, separate from numpy, which collects these dtypes together. To start, the quaternion and the rational type could go in it, and eventually I think it would be nice to move datetime64 there as well. Maybe it could be called numpy-dtypes, or would a more creative name be better? A extended dtype package would be a very good idea, and a great place for quaternions. But, I agree with Chuck that datetime64 is too fundamental to be pushed to a separate package. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat May 5 16:43:13 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 5 May 2012 15:43:13 -0500 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 1:06 PM, Charles R Harris wrote: > > On Sat, May 5, 2012 at 11:19 AM, Mark Wiebe wrote: > >> On Sat, May 5, 2012 at 11:55 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >>> >>> On Sat, May 5, 2012 at 5:27 AM, Tom Aldcroft < >>> aldcroft at head.cfa.harvard.edu> wrote: >>> >>>> On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell >>>> wrote: >>>> > Hi Chuck, >>>> > >>>> > thanks for the prompt reply. I as curious because because >>>> > someone was interested in adding >>>> http://pypi.python.org/pypi/Quaternion >>>> > to EPD, but Martin and Mark's implementation of quaternions >>>> > looks much better. >>>> >>>> Hi - >>>> >>>> I'm a co-author of the above mentioned Quaternion package. I agree >>>> the numpy_quaternion version would be better, but if there is no >>>> expectation that it will move forward I can offer to improve our >>>> Quaternion. A few months ago I played around with making it accept >>>> arbitrary array inputs (with similar shape of course) to essentially >>>> vectorize the transformations. We never got around to putting this in >>>> a release because of a perceived lack of interest / priorities... If >>>> this would be useful then let me know. >>>> >>>> >>> Would you be interested in carrying Martin's package forward? I'm not >>> opposed to having quaternions in numpy/scipy but there needs to be someone >>> to push it and deal with problems if they come up. 
Martin's package >>> disappeared in large part because Martin disappeared. I'd also like to hear >>> from Mark about other aspects, as there was also a simple rational user >>> type proposed that we were looking to put in as an extension 'test' type. >>> IIRC, there were some needed fixes to Numpy, some of which were postponed >>> in favor of larger changes. User types is one of the things we want ot get >>> fixed up. >>> >> >> I kind of like the idea of there being a package, separate from numpy, >> which collects these dtypes together. To start, the quaternion and the >> rational type could go in it, and eventually I think it would be nice to >> move datetime64 there as well. Maybe it could be called numpy-dtypes, or >> would a more creative name be better? >> > > I'm trying to think about how that would be organized. We could create a > new repository, numpy-user-types (numpy-extension-types), under the numpy > umbrella. It would need documents and such as well as someone interested in > maintaining it and making releases. A branch in the numpy repository > wouldn't work since we would want to rebase it regularly. It could maybe go > in scipy but a new package would need to be created there and it feels too > distant from numpy for such basic types as datetime. > > Do you have thoughts about the details? > Another repository under the numpy umbrella would best fit what I'm imagining, yes. I would imagine it as a package of additional types that aren't the core ones, but that many people would probably want to install. It would also be a way to continually exercise the type extension system, to make sure it doesn't break. It couldn't be a branch of numpy, rather a collection of additional dtypes and associated useful functions. -Mark > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat May 5 16:53:26 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 5 May 2012 22:53:26 +0200 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: <21065CEC-A457-406D-96F9-BC68D523DA6A@continuum.io> References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> <21065CEC-A457-406D-96F9-BC68D523DA6A@continuum.io> Message-ID: On Sat, May 5, 2012 at 10:19 PM, Travis Oliphant wrote: > > On May 5, 2012, at 2:28 PM, Ralf Gommers wrote: > > > > On Wed, May 2, 2012 at 11:25 PM, Wes McKinney wrote: > >> On Wed, May 2, 2012 at 9:48 AM, Charles R Harris >> wrote: >> > >> > >> > On Tue, May 1, 2012 at 11:47 PM, Ralf Gommers < >> ralf.gommers at googlemail.com> >> > wrote: >> >> >> >> >> >> >> >> On Wed, May 2, 2012 at 1:48 AM, Pauli Virtanen wrote: >> >>> >> >>> 01.05.2012 21:34, Ralf Gommers kirjoitti: >> >>> [clip] >> >>> > At this point it's probably good to look again at the problems we >> want >> >>> > to solve: >> >>> > 1. responsive user interface (must absolutely have) >> >>> >> >>> Now that it comes too late: with some luck, I've possibly hit on what >> >>> was ailing the Tracs (max_diff_bytes configured too large). Let's see >> if >> >>> things work better from now on... >> >> >> >> >> >> That's amazing - not only does it not give errors anymore, it's also an >> >> order of magnitude faster. >> >> >> > >> > So maybe we could just stick with trac. 
Performance was really the >> sticking >> > point. >> > >> > Chuck >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> >> FWIW I'm pretty strongly in favor of GHI for NumPy/SciPy (I am going >> to get involved in NumPy dev eventually, promise). While warty in some >> of the places already mentioned, I have found it to be very >> low-friction and low-annoyance in my own dev process (nearing 1000 >> issues closed in the last year in pandas). But there are fewer cooks >> in the kitchen with pandas so perhaps this experience wouldn't be >> identical with NumPy. The biggest benefit I've seen is community >> involvement that you really wouldn't see if I were using a Trac or >> something else hosted elsewhere. Users are on GitHub and it for some >> reason gives people a feeling of engagement in the open source process >> that I don't see anywhere else. > > > Feels like it's time to make a decision on this. > > I see no blocking objections against Github, so perhaps we should give it > a go. The attachment issue for data files can be solved by relocating those > to a server we still administer. Trac is currently annoying me also, > because I need to change the milestone of ~50 tickets and have no good way > of doing it. So nothing's perfect. Github's hosting service, possibly more > user involvement and centralizing all our tools there may be enough to > outweigh the limitations of GHI. > > > Proposal: move NumPy tickets to Github. > > > +1 > > The process does need planning. We don't need to rush, but it would be > great to get it done by end of June. To Charles' list and Ralf's > suggestions, I would add setting up a server that can relay pull requests > to the mailing list. > > Don't know if you saw this, but it looks like Pauli is pretty far along in fixing this problem: http://thread.gmane.org/gmane.comp.python.numeric.general/49551/focus=49744 Ralf > NumFocus can setup that server and provide login permissions to those > needing to administer it. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.anton.letnes at gmail.com Sat May 5 16:56:27 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sat, 5 May 2012 22:56:27 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: Hi, I'm getting a couple of errors when testing. System: Arch Linux (updated today) Python 3.2.3 gcc 4.7.0 (Anything else?) I think that this error: AssertionError: selectedrealkind(19): expected -1 but got 16 is due to the fact that newer versions of gfortran actually supports precision this high (quad precision). 
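A quick illustration of the precision gap involved (not the f2py test itself; the values below are what numpy reports on a typical x86 Linux build):

import numpy as np

# Double precision carries ~15 significant decimal digits and the x86
# extended longdouble ~18, both short of the 19 digits requested by
# selectedrealkind(19).  A gfortran that offers a real(16) quad kind can
# satisfy the request, so it returns 16 where older compilers returned -1.
print(np.finfo(np.float64).precision)     # 15
print(np.finfo(np.longdouble).precision)  # 18 on x86 Linux (80-bit extended)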
Cheers Paul python -c 'import numpy;numpy.test("full")' Running unit tests for numpy NumPy version 1.6.1 NumPy is installed in /usr/lib/python3.2/site-packages/numpy Python version 3.2.3 (default, Apr 23 2012, 23:35:30) [GCC 4.7.0 20120414 (prerelease)] nose version 1.1.2 ....S.................................................................................................................................................................................................................................................................................S......................................................................................................................................................................................................................................................................................................................................................................................................................SSS...........................................................................................K............................................................................K.............................................................................................................K.................................................................................................K......................K..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................../usr/lib/python3.2/site-packages/numpy/lib/format.py:575: ResourceWarning: unclosed file <_io.BufferedReader name='/tmp/tmpmkxhkq'> mode=mode, offset=offset) ............................................................................................................................................................................................................................................................................................................................................................................................................................................................................/usr/lib/python3.2/subprocess.py:471: ResourceWarning: unclosed file <_io.FileIO name=3 mode='rb'> return Popen(*popenargs, **kwargs).wait() /usr/lib/python3.2/subprocess.py:471: ResourceWarning: unclosed file <_io.FileIO name=7 mode='rb'> return Popen(*popenargs, **kwargs).wait() 
..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................................................................................................... ====================================================================== FAIL: test_kind.TestKind.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.2/site-packages/nose/case.py", line 198, in runTest self.test(*self.arg) File "/usr/lib/python3.2/site-packages/numpy/f2py/tests/test_kind.py", line 30, in test_all 'selectedrealkind(%s): expected %r but got %r' % (i, selected_real_kind(i), selectedrealkind(i))) File "/usr/lib/python3.2/site-packages/numpy/testing/utils.py", line 34, in assert_ raise AssertionError(msg) AssertionError: selectedrealkind(19): expected -1 but got 16 ====================================================================== FAIL: test_pareto (test_random.TestRandomDist) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.2/site-packages/numpy/random/tests/test_random.py", line 313, in test_pareto np.testing.assert_array_almost_equal(actual, desired, decimal=15) File "/usr/lib/python3.2/site-packages/numpy/testing/utils.py", line 800, in assert_array_almost_equal header=('Arrays are not almost equal to %d decimals' % decimal)) File "/usr/lib/python3.2/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not almost equal to 15 decimals (mismatch 16.66666666666667%) x: array([[ 2.46852460e+03, 1.41286881e+03], [ 5.28287797e+07, 6.57720981e+07], [ 1.40840323e+02, 1.98390255e+05]]) y: array([[ 2.46852460e+03, 1.41286881e+03], [ 5.28287797e+07, 6.57720981e+07], [ 1.40840323e+02, 1.98390255e+05]]) ---------------------------------------------------------------------- Ran 3559 tests in 63.970s FAILED (KNOWNFAIL=5, SKIP=5, failures=2) On Sat, May 5, 2012 at 8:45 PM, Charles R Harris wrote: > > > On Sat, May 5, 2012 at 12:15 PM, Ralf Gommers > wrote: >> >> Hi, >> >> I'm pleased to announce the availability of the first release candidate of >> NumPy 1.6.2.? This is a maintenance release. Due to the delay of the NumPy >> 1.7.0, this release contains far more fixes than a regular NumPy bugfix >> release.? It also includes a number of documentation and build improvements. >> >> Sources and binary installers can be found at >> https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ >> >> Please test this release and report any issues on the numpy-discussion >> mailing list. 
>> >> Cheers, >> Ralf >> From charlesr.harris at gmail.com Sat May 5 17:06:31 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 5 May 2012 15:06:31 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 2:56 PM, Paul Anton Letnes < paul.anton.letnes at gmail.com> wrote: > Hi, > > I'm getting a couple of errors when testing. System: > Arch Linux (updated today) > Python 3.2.3 > gcc 4.7.0 > (Anything else?) > > I think that this error: > AssertionError: selectedrealkind(19): expected -1 but got 16 > is due to the fact that newer versions of gfortran actually supports > precision this high (quad precision). > > Cheers > Paul > > > python -c 'import numpy;numpy.test("full")' > Running unit tests for numpy > NumPy version 1.6.1 > NumPy is installed in /usr/lib/python3.2/site-packages/numpy > Python version 3.2.3 (default, Apr 23 2012, 23:35:30) [GCC 4.7.0 > 20120414 (prerelease)] > nose version 1.1.2 > > ....S.................................................................................................................................................................................................................................................................................S......................................................................................................................................................................................................................................................................................................................................................................................................................SSS...........................................................................................K............................................................................K.............................................................................................................K.................................................................................................K......................K..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................../usr/lib/python3.2/site-packages/numpy/lib/format.py:575: > ResourceWarning: unclosed file <_io.BufferedReader > name='/tmp/tmpmkxhkq'> > mode=mode, offset=offset) > > 
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................/usr/lib/python3.2/subprocess.py:471: > ResourceWarning: unclosed file <_io.FileIO name=3 mode='rb'> > return Popen(*popenargs, **kwargs).wait() > /usr/lib/python3.2/subprocess.py:471: ResourceWarning: unclosed file > <_io.FileIO name=7 mode='rb'> > return Popen(*popenargs, **kwargs).wait() > > ..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................................................................................................... > ====================================================================== > FAIL: test_kind.TestKind.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/python3.2/site-packages/nose/case.py", line 198, in runTest > self.test(*self.arg) > File "/usr/lib/python3.2/site-packages/numpy/f2py/tests/test_kind.py", > line 30, in test_all > 'selectedrealkind(%s): expected %r but got %r' % (i, > selected_real_kind(i), selectedrealkind(i))) > File "/usr/lib/python3.2/site-packages/numpy/testing/utils.py", line > 34, in assert_ > raise AssertionError(msg) > AssertionError: selectedrealkind(19): expected -1 but got 16 > This should have been fixed. Hmm... > > ====================================================================== > FAIL: test_pareto (test_random.TestRandomDist) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/python3.2/site-packages/numpy/random/tests/test_random.py", > line 313, in test_pareto > np.testing.assert_array_almost_equal(actual, desired, decimal=15) > File "/usr/lib/python3.2/site-packages/numpy/testing/utils.py", line > 800, in assert_array_almost_equal > header=('Arrays are not almost equal to %d decimals' % decimal)) > File "/usr/lib/python3.2/site-packages/numpy/testing/utils.py", line > 636, in assert_array_compare > raise AssertionError(msg) > AssertionError: > Arrays are not almost equal to 15 decimals > > (mismatch 16.66666666666667%) > x: array([[ 2.46852460e+03, 1.41286881e+03], > [ 5.28287797e+07, 6.57720981e+07], > [ 1.40840323e+02, 1.98390255e+05]]) > y: array([[ 2.46852460e+03, 1.41286881e+03], > [ 5.28287797e+07, 6.57720981e+07], > [ 1.40840323e+02, 1.98390255e+05]]) > > I can't think of anything that would affect this apart from the compiler version. 
Perhaps the precision needs to be backed off a bit. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgohlke at uci.edu Sat May 5 17:51:56 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Sat, 05 May 2012 14:51:56 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: <4FA5A0FC.9000907@uci.edu> On 5/5/2012 11:15 AM, Ralf Gommers wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate > of NumPy 1.6.2. This is a maintenance release. Due to the delay of the > NumPy 1.7.0, this release contains far more fixes than a regular NumPy > bugfix release. It also includes a number of documentation and build > improvements. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ > > Please test this release and report any issues on the numpy-discussion > mailing list. > > Cheers, > Ralf > Thank you and Chuck! The rc builds and tests OK on Windows with msvc9/MKL. There are also no apparent problems with scipy, matplotlib and a number of other packages. One pygame 1.9.2pre test fails; I'll look into it. Christoph From charlesr.harris at gmail.com Sat May 5 18:12:56 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 5 May 2012 16:12:56 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 2:56 PM, Paul Anton Letnes < paul.anton.letnes at gmail.com> wrote: > Hi, > > I'm getting a couple of errors when testing. System: > Arch Linux (updated today) > Python 3.2.3 > gcc 4.7.0 > (Anything else?) > > I think that this error: > AssertionError: selectedrealkind(19): expected -1 but got 16 > is due to the fact that newer versions of gfortran actually supports > precision this high (quad precision). > > Yes, but it should be fixed. I can't duplicate this here with a fresh checkout of the branch. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgohlke at uci.edu Sat May 5 20:21:01 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Sat, 05 May 2012 17:21:01 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: <4FA5A0FC.9000907@uci.edu> References: <4FA5A0FC.9000907@uci.edu> Message-ID: <4FA5C3ED.8050704@uci.edu> On 5/5/2012 2:51 PM, Christoph Gohlke wrote: > > > On 5/5/2012 11:15 AM, Ralf Gommers wrote: >> Hi, >> >> I'm pleased to announce the availability of the first release candidate >> of NumPy 1.6.2. This is a maintenance release. Due to the delay of the >> NumPy 1.7.0, this release contains far more fixes than a regular NumPy >> bugfix release. It also includes a number of documentation and build >> improvements. >> >> Sources and binary installers can be found at >> https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ >> >> Please test this release and report any issues on the numpy-discussion >> mailing list. >> >> Cheers, >> Ralf >> > > Thank you and Chuck! The rc builds and tests OK on Windows with > msvc9/MKL. There are also no apparent problems with scipy, matplotlib > and a number of other packages. One pygame 1.9.2pre test fails; I'll > look into it. > Just to follow up on the pygame test error: it is due to pygame doing incomplete exception handling when checking for invalid dtypes. Easy to fix. 
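A minimal sketch of the kind of fix being described (the actual pygame code differs; is_valid_dtype is just a hypothetical helper used for illustration):

import numpy as np

# numpy can raise either TypeError or ValueError for a bad dtype
# specification depending on the input, so a validity probe should
# catch both rather than a single exception type.
def is_valid_dtype(spec):
    try:
        np.dtype(spec)
    except (TypeError, ValueError):
        return False
    return True

print(is_valid_dtype('float64'))      # True
print(is_valid_dtype('not-a-dtype'))  # False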
Christoph From pav at iki.fi Sat May 5 21:13:30 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 06 May 2012 03:13:30 +0200 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> <21065CEC-A457-406D-96F9-BC68D523DA6A@continuum.io> Message-ID: 05.05.2012 22:53, Ralf Gommers kirjoitti: [clip] > would be great to get it done by end of June. To Charles' list > and Ralf's suggestions, I would add setting up a server that can > relay pull requests to the mailing list. > > Don't know if you saw this, but it looks like Pauli is pretty far along > in fixing this problem: > http://thread.gmane.org/gmane.comp.python.numeric.general/49551/focus=49744 The only thing missing is really the server configuration --- new.scipy.org could in principle do that, but its mail system seems to be configured so that it cannot send mail to the MLs. So, someone with roots on the machine needs to step up. Pauli From aldcroft at head.cfa.harvard.edu Sat May 5 21:15:34 2012 From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft) Date: Sat, 5 May 2012 21:15:34 -0400 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 12:55 PM, Charles R Harris wrote: > > > On Sat, May 5, 2012 at 5:27 AM, Tom Aldcroft > wrote: >> >> On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell >> wrote: >> > Hi Chuck, >> > >> > thanks for the prompt reply. I as curious because because >> > someone was interested in adding http://pypi.python.org/pypi/Quaternion >> > to EPD, but Martin and Mark's implementation of quaternions >> > looks much better. >> >> Hi - >> >> I'm a co-author of the above mentioned Quaternion package. I agree >> the numpy_quaternion version would be better, but if there is no >> expectation that it will move forward I can offer to improve our >> Quaternion. A few months ago I played around with making it accept >> arbitrary array inputs (with similar shape of course) to essentially >> vectorize the transformations. We never got around to putting this in >> a release because of a perceived lack of interest / priorities... If >> this would be useful then let me know. >> > > Would you be interested in carrying Martin's package forward? I'm not > opposed to having quaternions in numpy/scipy but there needs to be someone > to push it and deal with problems if they come up. Martin's package > disappeared in large part because Martin disappeared. I'd also like to hear > from Mark about other aspects, as there was also a simple rational user type > proposed that we were looking to put in as an extension 'test' type. IIRC, > there were some needed fixes to Numpy, some of which were postponed in favor > of larger changes. User types is one of the things we want ot get fixed up. It would be great to have a quaternion dtype available in numpy, so I would be interested in carrying this package if nobody else steps forward. I don't have any experience with numpy internals, but it looks like most of the heavy lifting is done already. On a related note the AstroPy project has been discussing a time class suitable for astronomy (with different conversions, time systems, an option to use 128-bit precision, etc). We have recently talked about a numpy dtype analogous to datetime64. This might be an opportunity to understand a bit the mechanics of making a new dtype.
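For reference, a small sketch of the array-level behaviour such a time dtype would presumably mirror, using the existing datetime64/timedelta64 machinery (the time-system conversions and 128-bit precision mentioned above are exactly what these lack; the dates are arbitrary examples):

import numpy as np

# Vectorized datetime arithmetic as datetime64 provides it today.
t = np.array(['2012-05-06T00:00:00', '2012-05-06T12:00:00'],
             dtype='datetime64[s]')
dt = t - t[0]                         # timedelta64[s] array
print(dt.astype('timedelta64[m]'))    # [  0 720]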
Cheers, Tom > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ben.root at ou.edu Sat May 5 22:50:37 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 5 May 2012 22:50:37 -0400 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> <21065CEC-A457-406D-96F9-BC68D523DA6A@continuum.io> Message-ID: On Saturday, May 5, 2012, Pauli Virtanen wrote: > 05.05.2012 22:53, Ralf Gommers kirjoitti: > [clip] > > would be great to get it done by end of June. To Charles' list > > and Ralf's suggestions, I would add setting up a server that can > > relay pull requests to the mailing list. > > > > Don't know if you saw this, but it looks like Pauli is pretty far along > > in fixing this problem: > > > http://thread.gmane.org/gmane.comp.python.numeric.general/49551/focus=49744 > > The only thing missing is really only the server configuration --- > new.scipy.org could in principle do that, but its mail system seems to > be configured so that it cannot send mail to the MLs. So, someone with > roots on the machine needs to step up. > > Pauli > > Just a quick lesson from matplotlib's migration of sourceforge bugs to github issues. Darren Dale did an excellent job with only a few hitches. The key one is that *every* issue migrated spawns a new email. This got old very fast. Second, because Darren did the migration, he became author for every single issue. He then got every single status change of every issue that we triaged the following few weeks. We don't hear much from Darren these days... I suspect the men in the white coats took him away... Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat May 5 23:19:21 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 5 May 2012 21:19:21 -0600 Subject: [Numpy-discussion] Extension types repository Message-ID: All, Tom Aldcroft volunteered to bring quaternions into numpy. The proposal is to set up a separate repository under the numpy name on github, npydtypes or some such, and bring in Martin Ling's quaternion extension dtype as a start. Other extension types that would reside in the repository would be the simple rational type, and perhaps some specialized astronomical time types. So here is the proposal. 1. Make Tom a member of the numpy organization on github. 2. Set up an extension dtypes repository in github.com/numpy Other proposals for the name are welcome. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From wkerzendorf at gmail.com Sat May 5 23:23:59 2012 From: wkerzendorf at gmail.com (Wolfgang Kerzendorf) Date: Sat, 5 May 2012 23:23:59 -0400 Subject: [Numpy-discussion] sparse array data In-Reply-To: References: <2A5D98CB-5E44-432C-95D3-2FF9CB9B1A0B@gmail.com> <4FA19EDB.5040702@continuum.io> Message-ID: <264EABFB-2FE3-405B-B9BC-A12AEFCF5175@gmail.com> Hey guys, Thanks for all the suggestions. They seem already relatively involved, but I'll have a look at the table implementation. That seems to be the easiest of them all. 
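A minimal sketch (not from the thread) of the coordinate format and the "corner turn" re-sort discussed in the quoted messages below, written with plain numpy arrays and made-up sample data:

import numpy as np

# Nonzero entries of a sparse 3-d array in coordinate (COO) form:
# parallel index arrays plus a value array.
i   = np.array([2, 0, 1, 0])
j   = np.array([1, 3, 0, 1])
k   = np.array([0, 2, 2, 1])
val = np.array([4.0, 1.5, -2.0, 3.0])

# A "corner turn" is a stable multi-key re-sort of the coordinates so a
# different axis varies slowest; np.lexsort keys on the last array first,
# so this orders entries by i, then j, then k.
order = np.lexsort((k, j, i))
i, j, k, val = i[order], j[order], k[order], val[order]
print(i, j, k, val)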
Cheers Wolfgang On 2012-05-03, at 9:39 AM, Charles R Harris wrote: > > > On Thu, May 3, 2012 at 3:41 AM, Nathaniel Smith wrote: > On Thu, May 3, 2012 at 4:44 AM, Charles R Harris > wrote: > > > > > > On Wed, May 2, 2012 at 3:20 PM, Nathaniel Smith wrote: > >> This coordinate format is also what's used by the MATLAB Tensor > >> Toolbox. They have a paper justifying this choice and describing some > >> tricks for how to work with them: > >> http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1 > >> (Spoiler: you use a lot of sort operations. Conveniently, timsort > >> appears to be perfectly adapted for their algorithmic requirements.) > >> > > > > Timsort probably isn't the best choice here, it is optimized for python > > lists of python objects where there is at least one level of indirection and > > compares are expensive, even more expensive for compound objects. If the > > coordinates are stored in numpy structured arrays an ordinary sort is likely > > to be faster. Lexsort might even be faster as it could access aligned > > integer data and not have to move lists of indexes around. > > To be clear, I don't mean Timsort-the-implementation, I mean > Timsort-the-algorithm (which is now also the default sorting algorithm > in Java). That said, it may well be optimized for expensive compares > and something like a radix sort would be even better. > > > Java uses Timsort for sorting object arrays (pointers) and dual pivot quicksort for sorting arrays of native types, ints and such. Timsort is very good for almost sorted arrays and the mixed algorithms are becoming popular, i.e., introsort and recent updates to the dual pivot sort that also look for runs. One of the reasons compares can be expensive for arrays of pointers to objects is that the objects can be located all over memory, which blows the cache. > > There are also a few mods to the numpy quicksort that might speed things up a bit more for common cases where there are a lot of repeated elements. > > In these sparse tensor algorithms, we often need to sort by one > coordinate axis, and then sort by another (a "corner turn"). The > reason Timsort seems appealing is that (1) it goes faster than O(n log > n) when there is existing structure in the data being sorted, (2) > because it's a stable sort, sorting on one axis then sorting on > another will leave lots of that structure behind to be exploited > later. So one can expect it to hit its happy case relatively often. > > > Yes, that's why we have mergesort. An optimistic version making some use of runs might make it faster. We do have object arrays and no type specialized sort for them, so bringing Timsort in for those could be useful. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sat May 5 23:28:18 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 5 May 2012 21:28:18 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> <21065CEC-A457-406D-96F9-BC68D523DA6A@continuum.io> Message-ID: On Sat, May 5, 2012 at 8:50 PM, Benjamin Root wrote: > > > On Saturday, May 5, 2012, Pauli Virtanen wrote: > >> 05.05.2012 22:53, Ralf Gommers kirjoitti: >> [clip] >> > would be great to get it done by end of June. To Charles' list >> > and Ralf's suggestions, I would add setting up a server that can >> > relay pull requests to the mailing list. >> > >> > Don't know if you saw this, but it looks like Pauli is pretty far along >> > in fixing this problem: >> > >> http://thread.gmane.org/gmane.comp.python.numeric.general/49551/focus=49744 >> >> The only thing missing is really only the server configuration --- >> new.scipy.org could in principle do that, but its mail system seems to >> be configured so that it cannot send mail to the MLs. So, someone with >> roots on the machine needs to step up. >> >> Pauli >> >> > Just a quick lesson from matplotlib's migration of sourceforge bugs to > github issues. Darren Dale did an excellent job with only a few hitches. > The key one is that *every* issue migrated spawns a new email. This got old > very fast. Second, because Darren did the migration, he became author for > every single issue. He then got every single status change of every issue > that we triaged the following few weeks. > > We don't hear much from Darren these days... I suspect the men in the > white coats took him away... > > Uh oh. We are short on developers as is... Which brings up a question, do people need a github account to open an issue? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sat May 5 23:44:33 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 5 May 2012 22:44:33 -0500 Subject: [Numpy-discussion] Extension types repository In-Reply-To: References: Message-ID: +1 Travis -- Travis Oliphant (on a mobile) 512-826-7480 On May 5, 2012, at 10:19 PM, Charles R Harris wrote: > All, > > Tom Aldcroft volunteered to bring quaternions into numpy. The proposal is to set up a separate repository under the numpy name on github, npydtypes or some such, and bring in Martin Ling's quaternion extension dtype as a start. Other extension types that would reside in the repository would be the simple rational type, and perhaps some specialized astronomical time types. So here is the proposal. > > Make Tom a member of the numpy organization on github. > Set up an extension dtypes repository in github.com/numpy > Other proposals for the name are welcome. > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ben.root at ou.edu Sun May 6 00:00:16 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 6 May 2012 00:00:16 -0400 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> <21065CEC-A457-406D-96F9-BC68D523DA6A@continuum.io> Message-ID: On Saturday, May 5, 2012, Charles R Harris wrote: > > > On Sat, May 5, 2012 at 8:50 PM, Benjamin Root > > wrote: > >> >> >> On Saturday, May 5, 2012, Pauli Virtanen wrote: >> >>> 05.05.2012 22:53, Ralf Gommers kirjoitti: >>> [clip] >>> > would be great to get it done by end of June. To Charles' list >>> > and Ralf's suggestions, I would add setting up a server that can >>> > relay pull requests to the mailing list. >>> > >>> > Don't know if you saw this, but it looks like Pauli is pretty far along >>> > in fixing this problem: >>> > >>> http://thread.gmane.org/gmane.comp.python.numeric.general/49551/focus=49744 >>> >>> The only thing missing is really only the server configuration --- >>> new.scipy.org could in principle do that, but its mail system seems to >>> be configured so that it cannot send mail to the MLs. So, someone with >>> roots on the machine needs to step up. >>> >>> Pauli >>> >>> >> Just a quick lesson from matplotlib's migration of sourceforge bugs to >> github issues. Darren Dale did an excellent job with only a few hitches. >> The key one is that *every* issue migrated spawns a new email. This got old >> very fast. Second, because Darren did the migration, he became author for >> every single issue. He then got every single status change of every issue >> that we triaged the following few weeks. >> >> We don't hear much from Darren these days... I suspect the men in the >> white coats took him away... >> >> > Uh oh. We are short on developers as is... Which brings up a question, do > people need a github account to open an issue? > > Chuck > Last time I checked, yes. But this hasn't seemed to slow things down for us. Ben Root P.S. - there are probably ways around the issues I described. I just mentioning them so that whoever prepares the migration could look out for those problems. -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.anton.letnes at gmail.com Sun May 6 02:16:38 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 6 May 2012 08:16:38 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: <53503912-3DD7-4DF1-B679-8CCA385674EE@gmail.com> All tests for 1.6.2rc1 pass on Mac OS X 10.7.3 python 2.7.2 gcc 4.2 (Apple) Great! Paul On 6. mai 2012, at 00:12, Charles R Harris wrote: > > > On Sat, May 5, 2012 at 2:56 PM, Paul Anton Letnes wrote: > Hi, > > I'm getting a couple of errors when testing. System: > Arch Linux (updated today) > Python 3.2.3 > gcc 4.7.0 > (Anything else?) > > I think that this error: > AssertionError: selectedrealkind(19): expected -1 but got 16 > is due to the fact that newer versions of gfortran actually supports > precision this high (quad precision). > > > Yes, but it should be fixed. I can't duplicate this here with a fresh checkout of the branch. 
> > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Sun May 6 03:56:20 2012 From: cournape at gmail.com (David Cournapeau) Date: Sun, 6 May 2012 08:56:20 +0100 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 9:43 PM, Mark Wiebe wrote: > On Sat, May 5, 2012 at 1:06 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: >> >> On Sat, May 5, 2012 at 11:19 AM, Mark Wiebe wrote: >> >>> On Sat, May 5, 2012 at 11:55 AM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>>> >>>> On Sat, May 5, 2012 at 5:27 AM, Tom Aldcroft < >>>> aldcroft at head.cfa.harvard.edu> wrote: >>>> >>>>> On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell >>>>> wrote: >>>>> > Hi Chuck, >>>>> > >>>>> > thanks for the prompt reply. I as curious because because >>>>> > someone was interested in adding >>>>> http://pypi.python.org/pypi/Quaternion >>>>> > to EPD, but Martin and Mark's implementation of quaternions >>>>> > looks much better. >>>>> >>>>> Hi - >>>>> >>>>> I'm a co-author of the above mentioned Quaternion package. I agree >>>>> the numpy_quaternion version would be better, but if there is no >>>>> expectation that it will move forward I can offer to improve our >>>>> Quaternion. A few months ago I played around with making it accept >>>>> arbitrary array inputs (with similar shape of course) to essentially >>>>> vectorize the transformations. We never got around to putting this in >>>>> a release because of a perceived lack of interest / priorities... If >>>>> this would be useful then let me know. >>>>> >>>>> >>>> Would you be interested in carrying Martin's package forward? I'm not >>>> opposed to having quaternions in numpy/scipy but there needs to be someone >>>> to push it and deal with problems if they come up. Martin's package >>>> disappeared in large part because Martin disappeared. I'd also like to hear >>>> from Mark about other aspects, as there was also a simple rational user >>>> type proposed that we were looking to put in as an extension 'test' type. >>>> IIRC, there were some needed fixes to Numpy, some of which were postponed >>>> in favor of larger changes. User types is one of the things we want ot get >>>> fixed up. >>>> >>> >>> I kind of like the idea of there being a package, separate from numpy, >>> which collects these dtypes together. To start, the quaternion and the >>> rational type could go in it, and eventually I think it would be nice to >>> move datetime64 there as well. Maybe it could be called numpy-dtypes, or >>> would a more creative name be better? >>> >> >> I'm trying to think about how that would be organized. We could create a >> new repository, numpy-user-types (numpy-extension-types), under the numpy >> umbrella. It would need documents and such as well as someone interested in >> maintaining it and making releases. A branch in the numpy repository >> wouldn't work since we would want to rebase it regularly. It could maybe go >> in scipy but a new package would need to be created there and it feels too >> distant from numpy for such basic types as datetime. >> >> Do you have thoughts about the details? >> > > Another repository under the numpy umbrella would best fit what I'm > imagining, yes. I would imagine it as a package of additional types that > aren't the core ones, but that many people would probably want to install. 
> It would also be a way to continually exercise the type extension system, > to make sure it doesn't break. It couldn't be a branch of numpy, rather a > collection of additional dtypes and associated useful functions. > I would be in favor of this as well. We could start the repository by having one "trivial" dtype that would serve as an example. That's something I have been interested in, I can lock a couple of hours / week to help this with. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun May 6 04:38:17 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 6 May 2012 10:38:17 +0200 Subject: [Numpy-discussion] Extension types repository In-Reply-To: References: Message-ID: On Sun, May 6, 2012 at 5:44 AM, Travis Oliphant wrote: > +1 > > Travis > > -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > > On May 5, 2012, at 10:19 PM, Charles R Harris > wrote: > > All, > > Tom Aldcroft volunteered to bring quaternions into numpy. The proposal is > to set up a separate repository under the numpy name on github, npydtypes > or some such, and bring in Martin Ling's quaternion extension dtype as a > start. Other extension types that would reside in the repository would be > the simple rational type, and perhaps some specialized astronomical time > types. So here is the proposal. > > +1 > > 1. Make Tom a member of the numpy organization on github. > > Would need a new team to be set up too. Travis is the only one who can do that. Ralf > 1. Set up an extension dtypes repository in github.com/numpy > > Other proposals for the name are welcome. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldcroft at head.cfa.harvard.edu Sun May 6 08:02:40 2012 From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft) Date: Sun, 6 May 2012 08:02:40 -0400 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: On Sun, May 6, 2012 at 3:56 AM, David Cournapeau wrote: > > > On Sat, May 5, 2012 at 9:43 PM, Mark Wiebe wrote: >> >> On Sat, May 5, 2012 at 1:06 PM, Charles R Harris >> wrote: >>> >>> On Sat, May 5, 2012 at 11:19 AM, Mark Wiebe wrote: >>>> >>>> On Sat, May 5, 2012 at 11:55 AM, Charles R Harris >>>> wrote: >>>>> >>>>> On Sat, May 5, 2012 at 5:27 AM, Tom Aldcroft >>>>> wrote: >>>>>> >>>>>> On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell >>>>>> wrote: >>>>>> > Hi Chuck, >>>>>> > >>>>>> > thanks for the prompt reply. ?I as curious because because >>>>>> > someone was interested in adding >>>>>> > http://pypi.python.org/pypi/Quaternion >>>>>> > to EPD, but Martin and Mark's implementation of quaternions >>>>>> > looks much better. >>>>>> >>>>>> Hi - >>>>>> >>>>>> I'm a co-author of the above mentioned Quaternion package. ?I agree >>>>>> the numpy_quaternion version would be better, but if there is no >>>>>> expectation that it will move forward I can offer to improve our >>>>>> Quaternion. ?A few months ago I played around with making it accept >>>>>> arbitrary array inputs (with similar shape of course) to essentially >>>>>> vectorize the transformations. ?We never got around to putting this in >>>>>> a release because of a perceived lack of interest / priorities... If >>>>>> this would be useful then let me know. >>>>>> >>>>> >>>>> Would you be interested in carrying Martin's package forward? I'm not >>>>> opposed to having quaternions in numpy/scipy but there needs to be someone >>>>> to push it and deal with problems if they come up. 
Martin's package >>>>> disappeared in large part because Martin disappeared. I'd also like to hear >>>>> from Mark about other aspects, as there was also a simple rational user type >>>>> proposed that we were looking to put in as an extension 'test' type. IIRC, >>>>> there were some needed fixes to Numpy, some of which were postponed in favor >>>>> of larger changes. User types is one of the things we want ot get fixed up. >>>> >>>> >>>> I kind of like the idea of there being a package, separate from numpy, >>>> which collects these dtypes together. To start, the quaternion and the >>>> rational type could go in it, and eventually I think it would be nice to >>>> move datetime64 there as well. Maybe it could be called numpy-dtypes, or >>>> would a more creative name be better? >>> >>> >>> I'm trying to think about how that would be organized. We could create a >>> new repository, numpy-user-types (numpy-extension-types), under the numpy >>> umbrella. It would need documents and such as well as someone interested in >>> maintaining it and making releases. A branch in the numpy repository >>> wouldn't work since we would want to rebase it regularly. It could maybe go >>> in scipy but a new package would need to be created there and it feels too >>> distant from numpy for such basic types as datetime. >>> >>> Do you have thoughts about the details? >> >> >> Another repository under the numpy umbrella would best fit what I'm >> imagining, yes. I would imagine it as a package of additional types that >> aren't the core ones, but that many people would probably want to install. >> It would also be a way to continually exercise the type extension system, to >> make sure it doesn't break. It couldn't be a branch of numpy, rather a >> collection of additional dtypes and associated useful functions. > > > I would be in favor of this as well. We could start the repository by having > one "trivial" dtype that would serve as an example. That's something I have > been interested in, I can lock a couple of hours / week to help this with. > How about if I start by working on adding tests within numpy_quaternion, then this can be migrated into an extended dtypes package when it is set up. A nice "trivial" dtype example would be very useful, as I mentioned just last week our group was wondering how to make a new dtype. - Tom From charlesr.harris at gmail.com Sun May 6 12:05:53 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 6 May 2012 10:05:53 -0600 Subject: [Numpy-discussion] Extension types repository In-Reply-To: References: Message-ID: On Sun, May 6, 2012 at 2:38 AM, Ralf Gommers wrote: > > > On Sun, May 6, 2012 at 5:44 AM, Travis Oliphant wrote: > >> +1 >> >> Travis >> >> -- >> Travis Oliphant >> (on a mobile) >> 512-826-7480 >> >> >> On May 5, 2012, at 10:19 PM, Charles R Harris >> wrote: >> >> All, >> >> Tom Aldcroft volunteered to bring quaternions into numpy. The proposal is >> to set up a separate repository under the numpy name on github, npydtypes >> or some such, and bring in Martin Ling's quaternion extension dtype as a >> start. Other extension types that would reside in the repository would be >> the simple rational type, and perhaps some specialized astronomical time >> types. So here is the proposal. >> >> +1 > >> >> 1. Make Tom a member of the numpy organization on github. >> >> Would need a new team to be set up too. Travis is the only one who can do > that. 
> > Yes, looks like Travis needs to create the new repository and add at least one core team member, who can then add others. I'd suggest numpy-extension-dtypes for the repository name, Tom is on github as taldcroft. Travis, it might be a good idea to add one more person with ownership permissions as a backup if that is possible. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun May 6 13:16:09 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 6 May 2012 11:16:09 -0600 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: On Sun, May 6, 2012 at 6:02 AM, Tom Aldcroft wrote: > On Sun, May 6, 2012 at 3:56 AM, David Cournapeau > wrote: > > > > > > On Sat, May 5, 2012 at 9:43 PM, Mark Wiebe wrote: > >> > >> On Sat, May 5, 2012 at 1:06 PM, Charles R Harris > >> wrote: > >>> > >>> On Sat, May 5, 2012 at 11:19 AM, Mark Wiebe wrote: > >>>> > >>>> On Sat, May 5, 2012 at 11:55 AM, Charles R Harris > >>>> wrote: > >>>>> > >>>>> On Sat, May 5, 2012 at 5:27 AM, Tom Aldcroft > >>>>> wrote: > >>>>>> > >>>>>> On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell < > ischnell at enthought.com> > >>>>>> wrote: > >>>>>> > Hi Chuck, > >>>>>> > > >>>>>> > thanks for the prompt reply. I as curious because because > >>>>>> > someone was interested in adding > >>>>>> > http://pypi.python.org/pypi/Quaternion > >>>>>> > to EPD, but Martin and Mark's implementation of quaternions > >>>>>> > looks much better. > >>>>>> > >>>>>> Hi - > >>>>>> > >>>>>> I'm a co-author of the above mentioned Quaternion package. I agree > >>>>>> the numpy_quaternion version would be better, but if there is no > >>>>>> expectation that it will move forward I can offer to improve our > >>>>>> Quaternion. A few months ago I played around with making it accept > >>>>>> arbitrary array inputs (with similar shape of course) to essentially > >>>>>> vectorize the transformations. We never got around to putting this > in > >>>>>> a release because of a perceived lack of interest / priorities... If > >>>>>> this would be useful then let me know. > >>>>>> > >>>>> > >>>>> Would you be interested in carrying Martin's package forward? I'm not > >>>>> opposed to having quaternions in numpy/scipy but there needs to be > someone > >>>>> to push it and deal with problems if they come up. Martin's package > >>>>> disappeared in large part because Martin disappeared. I'd also like > to hear > >>>>> from Mark about other aspects, as there was also a simple rational > user type > >>>>> proposed that we were looking to put in as an extension 'test' type. > IIRC, > >>>>> there were some needed fixes to Numpy, some of which were postponed > in favor > >>>>> of larger changes. User types is one of the things we want ot get > fixed up. > >>>> > >>>> > >>>> I kind of like the idea of there being a package, separate from numpy, > >>>> which collects these dtypes together. To start, the quaternion and the > >>>> rational type could go in it, and eventually I think it would be nice > to > >>>> move datetime64 there as well. Maybe it could be called numpy-dtypes, > or > >>>> would a more creative name be better? > >>> > >>> > >>> I'm trying to think about how that would be organized. We could create > a > >>> new repository, numpy-user-types (numpy-extension-types), under the > numpy > >>> umbrella. It would need documents and such as well as someone > interested in > >>> maintaining it and making releases. 
A branch in the numpy repository > >>> wouldn't work since we would want to rebase it regularly. It could > maybe go > >>> in scipy but a new package would need to be created there and it feels > too > >>> distant from numpy for such basic types as datetime. > >>> > >>> Do you have thoughts about the details? > >> > >> > >> Another repository under the numpy umbrella would best fit what I'm > >> imagining, yes. I would imagine it as a package of additional types that > >> aren't the core ones, but that many people would probably want to > install. > >> It would also be a way to continually exercise the type extension > system, to > >> make sure it doesn't break. It couldn't be a branch of numpy, rather a > >> collection of additional dtypes and associated useful functions. > > > > > > I would be in favor of this as well. We could start the repository by > having > > one "trivial" dtype that would serve as an example. That's something I > have > > been interested in, I can lock a couple of hours / week to help this > with. > > > > How about if I start by working on adding tests within > numpy_quaternion, then this can be migrated into an extended dtypes > package when it is set up. > Sounds like a good start. You might want to ping Martin too. > > A nice "trivial" dtype example would be very useful, as I mentioned > just last week our group was wondering how to make a new dtype. > > There is the rational dtype. I expect there will be some interaction between numpy and the extension types as the bugs are worked out. Extension types for numpy haven't been much used. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrixhasu at gmail.com Sun May 6 13:40:54 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Sun, 6 May 2012 19:40:54 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers wrote: > I'm pleased to announce the availability of the first release candidate of > NumPy 1.6.2. This is a maintenance release. Due to the delay of the NumPy > 1.7.0, this release contains far more fixes than a regular NumPy bugfix > release. It also includes a number of documentation and build improvements. Great! > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ > > Please test this release and report any issues on the numpy-discussion > mailing list. I've just tested the debian package and it builds fine! The tests print some ResourceWarning with python3.2 but they all pass! Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From gael.varoquaux at normalesup.org Sun May 6 15:08:38 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 6 May 2012 21:08:38 +0200 Subject: [Numpy-discussion] ANN Scikit-learn 0.11-beta Message-ID: <20120506190838.GA9889@phare.normalesup.org> On behalf of our release manager, Andreas Mueller, and all the scikit-learn contributors, I am happy to announce the 0.11 beta. We are doing a quick beta and will hopefully be releasing the final version tomorrow. The purpose of this beta is to get feedback on any release-critical bugs such as build issues.
You can download the zip files of the beta on: https://github.com/scikit-learn/scikit-learn/zipball/0.11-beta You can also retrieve the latest code on https://github.com/scikit-learn/scikit-learn/zipball/master or using 'git clone git at github.com:scikit-learn/scikit-learn.git' Any feedback is more than welcome, Cheers, Ga?l From ceball at gmail.com Sun May 6 15:08:36 2012 From: ceball at gmail.com (Chris Ball) Date: Sun, 6 May 2012 19:08:36 +0000 (UTC) Subject: [Numpy-discussion] How to run NumPy's tests with coverage? Message-ID: Hi, I'm trying to figure out how to run NumPy's tests with coverage enabled (i.e. numpy.test(coverage=True) ). I can run the tests successfully like this: $ git clone git://github.com/numpy/numpy.git [...] $ cd numpy/ $ python setup.py build_ext -i [...] $ cd .. # (avoid running from source directory) $ export PYTHONPATH=numpy/ $ python Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. python> import numpy python> numpy.test() Running unit tests for numpy NumPy version 1.7.0.dev-259fff8 NumPy is installed in [...]/numpy Python version 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] nose version 0.11.1 [...] Ran 3710 tests in 27.654s OK (KNOWNFAIL=3, SKIP=6) However, if I try to run the tests with coverage, I get lots of errors (and seven more tests are run than without coverage): python> numpy.test(coverage=True) Running unit tests for numpy NumPy version 1.7.0.dev-259fff8 NumPy is installed in [...]/numpy Python version 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] nose version 0.11.1 Could not locate executable icc Could not locate executable ecc [...]/numpy/numarray/alter_code2.py:12: UserWarning: numpy.numarray.alter_code2 is not working yet. warnings.warn("numpy.numarray.alter_code2 is not working yet.") [...]/numpy/oldnumeric/alter_code2.py:26: UserWarning: numpy.oldnumeric.alter_code2 is not working yet. warnings.warn("numpy.oldnumeric.alter_code2 is not working yet.") [...] 
====================================================================== ERROR: Failure: ImportError (No module named waflib.Configure) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "[...]/numpy/build_utils/waf.py", line 4, in import waflib.Configure ImportError: No module named waflib.Configure ====================================================================== ERROR: Failure: ImportError (No module named numscons.numdist) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "[...]/numpy/core/scons_support.py", line 21, in from numscons.numdist import process_c_str as process_str ImportError: No module named numscons.numdist ====================================================================== ERROR: Failure: ImportError (No module named numscons) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "[...]/numpy/core/setupscons.py", line 8, in from numscons import get_scons_build_dir ImportError: No module named numscons ====================================================================== ERROR: test_multiarray.TestNewBufferProtocol.test_roundtrip ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/case.py", line 183, in runTest self.test(*self.arg) File "[...]/numpy/core/tests/test_multiarray.py", line 2233, in test_roundtrip assert_raises(ValueError, self._check_roundtrip, x) File "[...]/numpy/testing/utils.py", line 1053, in assert_raises return nose.tools.assert_raises(*args,**kwargs) File "/usr/lib/python2.6/unittest.py", line 336, in failUnlessRaises callableObj(*args, **kwargs) File "[...]/numpy/core/tests/test_multiarray.py", line 2167, in _check_roundtrip y = np.asarray(x) File "[...]/numpy/core/tests/test_multiarray.py", line 2167, in _check_roundtrip y = np.asarray(x) File "/usr/lib/python2.6/dist-packages/coverage.py", line 322, in t self.c[(f.f_code.co_filename, f.f_lineno)] = 1 RuntimeWarning: tp_compare didn't return -1 or -2 for exception ====================================================================== ERROR: Failure: ImportError (No module named np.core.fromnumeric) ---------------------------------------------------------------------- Traceback (most recent call last): File 
"/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "[...]/numpy/ma/timer_comparison.py", line 6, in import np.core.fromnumeric as fromnumeric ImportError: No module named np.core.fromnumeric ====================================================================== ERROR: Failure: AttributeError ('module' object has no attribute '__revision__') ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "[...]/numpy/ma/version.py", line 9, in revision = [core.__revision__.split(':')[-1][:-1].strip(), AttributeError: 'module' object has no attribute '__revision__' ====================================================================== ERROR: Failure: ImportError (The convolve package is not installed. It can be downloaded by checking out the latest source from http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and installing all of SciPy from http://www.scipy.org. ) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "[...]/numpy/numarray/convolve.py", line 14, in raise ImportError(msg) ImportError: The convolve package is not installed. It can be downloaded by checking out the latest source from http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and installing all of SciPy from http://www.scipy.org. ====================================================================== ERROR: Failure: ImportError (The image package is not installed It can be downloaded by checking out the latest source from http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and installing all of SciPy from http://www.scipy.org. ) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "[...]/numpy/numarray/image.py", line 14, in raise ImportError(msg) ImportError: The image package is not installed It can be downloaded by checking out the latest source from http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and installing all of SciPy from http://www.scipy.org. 
====================================================================== FAIL: test_blasdot.test_dot_3args ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/pymodules/python2.6/nose/case.py", line 183, in runTest self.test(*self.arg) File "[...]/numpy/core/tests/test_blasdot.py", line 52, in test_dot_3args assert_equal(sys.getrefcount(r), 2) File "[...]/numpy/testing/utils.py", line 313, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: 3 DESIRED: 2 ====================================================================== FAIL: test_dot_3args (test_multiarray.TestDot) ---------------------------------------------------------------------- Traceback (most recent call last): File "[...]/numpy/core/tests/test_multiarray.py", line 1698, in test_dot_3args assert_equal(sys.getrefcount(r), 2) File "[...]/numpy/testing/utils.py", line 313, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: 3 DESIRED: 2 [...] Ran 3717 tests in 31.447s FAILED (KNOWNFAIL=3, SKIP=6, errors=8, failures=2) Anyone know what I'm doing wrong? I'm using Ubuntu 10.04 LTS in case that matters. Thanks, Chris From aldcroft at head.cfa.harvard.edu Sun May 6 15:35:45 2012 From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft) Date: Sun, 6 May 2012 15:35:45 -0400 Subject: [Numpy-discussion] numpy_quaternion and gcc 4.1.2 Message-ID: I ran into a problem trying to build and import the numpy_quaternion extension on CentOS-5 x86_64: $ python setup.py build C compiler: gcc -pthread -fno-strict-aliasing -fPIC -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-I/data/cosmos2/ska/arch/x86_64-linux_CentOS-5/lib/python2.7/site-packages/numpy/core/include -I/data/cosmos2/ska/arch/x86_64-linux_CentOS-5/include/python2.7 -c' gcc: quaternion.c quaternion.c: In function \u2018quaternion_isfinite\u2019: quaternion.c:55: warning: implicit declaration of function \u2018isfinite\u2019 gcc: numpy_quaternion.c gcc -pthread -shared build/temp.linux-x86_64-2.7/quaternion.o build/temp.linux-x86_64-2.7/numpy_quaternion.o -o build/lib.linux-x86_64-2.7/quaternion/numpy_quaternion.so running scons There was a subsequent import error with "numpy_quaternion.so: undefined symbol: isfinite". This problem does not occur for Ubuntu 11.10 and I presume it is due to CentOS-5 gcc (4.1.2) defaulting to -c89. I fixed this in setup.py by adding "extra_compile_args['-std=c99']" to the add_extension() call. Is there a more general way in numpy to deal with issues like this? Thanks, Tom From ralf.gommers at googlemail.com Sun May 6 16:39:05 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 6 May 2012 22:39:05 +0200 Subject: [Numpy-discussion] How to run NumPy's tests with coverage? In-Reply-To: References: Message-ID: On Sun, May 6, 2012 at 9:08 PM, Chris Ball wrote: > Hi, > > I'm trying to figure out how to run NumPy's tests with coverage enabled > (i.e. > numpy.test(coverage=True) ). I can run the tests successfully like this: > This seems to have been broken somewhere along the way. If you remove the argument "--cover-inclusive" from line 242 in numpy/testing/nosetester.py, that should fix all errors except TestNewBufferProtocol.test_roundtrip. Not sure what's going on with that one. Ralf > $ git clone git://github.com/numpy/numpy.git > [...] > $ cd numpy/ > $ python setup.py build_ext -i > [...] > $ cd .. 
# (avoid running from source directory) > $ export PYTHONPATH=numpy/ > $ python > Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) > [GCC 4.4.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > python> import numpy > python> numpy.test() > Running unit tests for numpy > NumPy version 1.7.0.dev-259fff8 > NumPy is installed in [...]/numpy > Python version 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] > nose version 0.11.1 > [...] > Ran 3710 tests in 27.654s > > OK (KNOWNFAIL=3, SKIP=6) > > > > However, if I try to run the tests with coverage, I get lots of errors (and > seven more tests are run than without coverage): > > python> numpy.test(coverage=True) > Running unit tests for numpy > NumPy version 1.7.0.dev-259fff8 > NumPy is installed in [...]/numpy > Python version 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] > nose version 0.11.1 > Could not locate executable icc > Could not locate executable ecc > [...]/numpy/numarray/alter_code2.py:12: UserWarning: > numpy.numarray.alter_code2 > is not working yet. > warnings.warn("numpy.numarray.alter_code2 is not working yet.") > [...]/numpy/oldnumeric/alter_code2.py:26: UserWarning: > numpy.oldnumeric.alter_code2 is not working yet. > warnings.warn("numpy.oldnumeric.alter_code2 is not working yet.") > [...] > ====================================================================== > ERROR: Failure: ImportError (No module named waflib.Configure) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in > loadTestsFromName > addr.filename, addr.module) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in > importFromPath > return self.importFromDir(dir_path, fqname) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in > importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File "[...]/numpy/build_utils/waf.py", line 4, in > import waflib.Configure > ImportError: No module named waflib.Configure > > ====================================================================== > ERROR: Failure: ImportError (No module named numscons.numdist) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in > loadTestsFromName > addr.filename, addr.module) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in > importFromPath > return self.importFromDir(dir_path, fqname) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in > importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File "[...]/numpy/core/scons_support.py", line 21, in > from numscons.numdist import process_c_str as process_str > ImportError: No module named numscons.numdist > > ====================================================================== > ERROR: Failure: ImportError (No module named numscons) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in > loadTestsFromName > addr.filename, addr.module) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in > importFromPath > return self.importFromDir(dir_path, fqname) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in > importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File "[...]/numpy/core/setupscons.py", line 
8, in > from numscons import get_scons_build_dir > ImportError: No module named numscons > > ====================================================================== > ERROR: test_multiarray.TestNewBufferProtocol.test_roundtrip > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.6/nose/case.py", line 183, in runTest > self.test(*self.arg) > File "[...]/numpy/core/tests/test_multiarray.py", line 2233, in > test_roundtrip > assert_raises(ValueError, self._check_roundtrip, x) > File "[...]/numpy/testing/utils.py", line 1053, in assert_raises > return nose.tools.assert_raises(*args,**kwargs) > File "/usr/lib/python2.6/unittest.py", line 336, in failUnlessRaises > callableObj(*args, **kwargs) > File "[...]/numpy/core/tests/test_multiarray.py", line 2167, in > _check_roundtrip > y = np.asarray(x) > File "[...]/numpy/core/tests/test_multiarray.py", line 2167, in > _check_roundtrip > y = np.asarray(x) > File "/usr/lib/python2.6/dist-packages/coverage.py", line 322, in t > self.c[(f.f_code.co_filename, f.f_lineno)] = 1 > RuntimeWarning: tp_compare didn't return -1 or -2 for exception > > ====================================================================== > ERROR: Failure: ImportError (No module named np.core.fromnumeric) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in > loadTestsFromName > addr.filename, addr.module) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in > importFromPath > return self.importFromDir(dir_path, fqname) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in > importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File "[...]/numpy/ma/timer_comparison.py", line 6, in > import np.core.fromnumeric as fromnumeric > ImportError: No module named np.core.fromnumeric > > ====================================================================== > ERROR: Failure: AttributeError ('module' object has no attribute > '__revision__') > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in > loadTestsFromName > addr.filename, addr.module) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in > importFromPath > return self.importFromDir(dir_path, fqname) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in > importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File "[...]/numpy/ma/version.py", line 9, in > revision = [core.__revision__.split(':')[-1][:-1].strip(), > AttributeError: 'module' object has no attribute '__revision__' > > ====================================================================== > ERROR: Failure: ImportError (The convolve package is not installed. > > It can be downloaded by checking out the latest source from > http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and > installing all of SciPy from http://www.scipy.org. 
> ) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in > loadTestsFromName > addr.filename, addr.module) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in > importFromPath > return self.importFromDir(dir_path, fqname) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in > importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File "[...]/numpy/numarray/convolve.py", line 14, in > raise ImportError(msg) > ImportError: The convolve package is not installed. > > It can be downloaded by checking out the latest source from > http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and > installing all of SciPy from http://www.scipy.org. > > > ====================================================================== > ERROR: Failure: ImportError (The image package is not installed > > It can be downloaded by checking out the latest source from > http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and > installing all of SciPy from http://www.scipy.org. > ) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in > loadTestsFromName > addr.filename, addr.module) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in > importFromPath > return self.importFromDir(dir_path, fqname) > File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in > importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File "[...]/numpy/numarray/image.py", line 14, in > raise ImportError(msg) > ImportError: The image package is not installed > > It can be downloaded by checking out the latest source from > http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and > installing all of SciPy from http://www.scipy.org. > > > ====================================================================== > FAIL: test_blasdot.test_dot_3args > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/pymodules/python2.6/nose/case.py", line 183, in runTest > self.test(*self.arg) > File "[...]/numpy/core/tests/test_blasdot.py", line 52, in test_dot_3args > assert_equal(sys.getrefcount(r), 2) > File "[...]/numpy/testing/utils.py", line 313, in assert_equal > raise AssertionError(msg) > AssertionError: > Items are not equal: > ACTUAL: 3 > DESIRED: 2 > > ====================================================================== > FAIL: test_dot_3args (test_multiarray.TestDot) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "[...]/numpy/core/tests/test_multiarray.py", line 1698, in > test_dot_3args > assert_equal(sys.getrefcount(r), 2) > File "[...]/numpy/testing/utils.py", line 313, in assert_equal > raise AssertionError(msg) > AssertionError: > Items are not equal: > ACTUAL: 3 > DESIRED: 2 > > [...] > > Ran 3717 tests in 31.447s > > FAILED (KNOWNFAIL=3, SKIP=6, errors=8, failures=2) > > > > Anyone know what I'm doing wrong? > > I'm using Ubuntu 10.04 LTS in case that matters. > > Thanks, > Chris > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sun May 6 18:30:47 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 6 May 2012 16:30:47 -0600 Subject: [Numpy-discussion] numpy_quaternion and gcc 4.1.2 In-Reply-To: References: Message-ID: On Sun, May 6, 2012 at 1:35 PM, Tom Aldcroft wrote: > I ran into a problem trying to build and import the numpy_quaternion > extension on CentOS-5 x86_64: > > $ python setup.py build > > C compiler: gcc -pthread -fno-strict-aliasing -fPIC -g -O2 -DNDEBUG -g > -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC > > compile options: > > '-I/data/cosmos2/ska/arch/x86_64-linux_CentOS-5/lib/python2.7/site-packages/numpy/core/include > -I/data/cosmos2/ska/arch/x86_64-linux_CentOS-5/include/python2.7 -c' > gcc: quaternion.c > quaternion.c: In function \u2018quaternion_isfinite\u2019: > quaternion.c:55: warning: implicit declaration of function > \u2018isfinite\u2019 > gcc: numpy_quaternion.c > gcc -pthread -shared build/temp.linux-x86_64-2.7/quaternion.o > build/temp.linux-x86_64-2.7/numpy_quaternion.o -o > build/lib.linux-x86_64-2.7/quaternion/numpy_quaternion.so > running scons > > There was a subsequent import error with "numpy_quaternion.so: > undefined symbol: isfinite". This problem does not occur for Ubuntu > 11.10 and I presume it is due to CentOS-5 gcc (4.1.2) defaulting to > -c89. > > I fixed this in setup.py by adding "extra_compile_args['-std=c99']" to > the add_extension() call. Is there a more general way in numpy to > deal with issues like this? > > You might take a look at core/include/numpy/npy_math.h, which I suspect goes with core/lib/libnpymath.a. Running nm on the latter, it looks like there are some extra symbols exported, but that is a bit to the side. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun May 6 23:25:48 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 6 May 2012 23:25:48 -0400 Subject: [Numpy-discussion] How to run NumPy's tests with coverage? In-Reply-To: References: Message-ID: On Sun, May 6, 2012 at 4:39 PM, Ralf Gommers wrote: > > > On Sun, May 6, 2012 at 9:08 PM, Chris Ball wrote: >> >> Hi, >> >> I'm trying to figure out how to run NumPy's tests with coverage enabled >> (i.e. >> numpy.test(coverage=True) ). I can run the tests successfully like this: > > > This seems to have been broken somewhere along the way. If you remove the > argument "--cover-inclusive" from line 242 in numpy/testing/nosetester.py, > that should fix all errors except TestNewBufferProtocol.test_roundtrip. Not > sure what's going on with that one. removing "--cover-inclusive" helped me also with statsmodels, with it it ran all example scripts and got stuck several times, (permanently stuck in some multiprocessing example?) Now coverage=True worked for the first time. Is it possible to make this optional or remove it from numpy? from a beneficiary of the nice numpy testing support outside of numpy Thanks for the tip, Josef > > Ralf > > > >> >> $ git clone git://github.com/numpy/numpy.git >> [...] >> $ cd numpy/ >> $ python setup.py build_ext -i >> [...] >> $ cd .. ?# (avoid running from source directory) >> $ export PYTHONPATH=numpy/ >> $ python >> Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) >> [GCC 4.4.3] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. 
>> python> import numpy >> python> numpy.test() >> Running unit tests for numpy >> NumPy version 1.7.0.dev-259fff8 >> NumPy is installed in [...]/numpy >> Python version 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] >> nose version 0.11.1 >> [...] >> Ran 3710 tests in 27.654s >> >> OK (KNOWNFAIL=3, SKIP=6) >> >> >> >> However, if I try to run the tests with coverage, I get lots of errors >> (and >> seven more tests are run than without coverage): >> >> python> numpy.test(coverage=True) >> Running unit tests for numpy >> NumPy version 1.7.0.dev-259fff8 >> NumPy is installed in [...]/numpy >> Python version 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] >> nose version 0.11.1 >> Could not locate executable icc >> Could not locate executable ecc >> [...]/numpy/numarray/alter_code2.py:12: UserWarning: >> numpy.numarray.alter_code2 >> is not working yet. >> ?warnings.warn("numpy.numarray.alter_code2 is not working yet.") >> [...]/numpy/oldnumeric/alter_code2.py:26: UserWarning: >> numpy.oldnumeric.alter_code2 is not working yet. >> ?warnings.warn("numpy.oldnumeric.alter_code2 is not working yet.") >> [...] >> ====================================================================== >> ERROR: Failure: ImportError (No module named waflib.Configure) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ?File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in >> loadTestsFromName >> ? ?addr.filename, addr.module) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in >> importFromPath >> ? ?return self.importFromDir(dir_path, fqname) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in >> importFromDir >> ? ?mod = load_module(part_fqname, fh, filename, desc) >> ?File "[...]/numpy/build_utils/waf.py", line 4, in >> ? ?import waflib.Configure >> ImportError: No module named waflib.Configure >> >> ====================================================================== >> ERROR: Failure: ImportError (No module named numscons.numdist) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ?File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in >> loadTestsFromName >> ? ?addr.filename, addr.module) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in >> importFromPath >> ? ?return self.importFromDir(dir_path, fqname) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in >> importFromDir >> ? ?mod = load_module(part_fqname, fh, filename, desc) >> ?File "[...]/numpy/core/scons_support.py", line 21, in >> ? ?from numscons.numdist import process_c_str as process_str >> ImportError: No module named numscons.numdist >> >> ====================================================================== >> ERROR: Failure: ImportError (No module named numscons) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ?File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in >> loadTestsFromName >> ? ?addr.filename, addr.module) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in >> importFromPath >> ? ?return self.importFromDir(dir_path, fqname) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in >> importFromDir >> ? ?mod = load_module(part_fqname, fh, filename, desc) >> ?File "[...]/numpy/core/setupscons.py", line 8, in >> ? 
?from numscons import get_scons_build_dir >> ImportError: No module named numscons >> >> ====================================================================== >> ERROR: test_multiarray.TestNewBufferProtocol.test_roundtrip >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ?File "/usr/lib/pymodules/python2.6/nose/case.py", line 183, in runTest >> ? ?self.test(*self.arg) >> ?File "[...]/numpy/core/tests/test_multiarray.py", line 2233, in >> test_roundtrip >> ? ?assert_raises(ValueError, self._check_roundtrip, x) >> ?File "[...]/numpy/testing/utils.py", line 1053, in assert_raises >> ? ?return nose.tools.assert_raises(*args,**kwargs) >> ?File "/usr/lib/python2.6/unittest.py", line 336, in failUnlessRaises >> ? ?callableObj(*args, **kwargs) >> ?File "[...]/numpy/core/tests/test_multiarray.py", line 2167, in >> _check_roundtrip >> ? ?y = np.asarray(x) >> ?File "[...]/numpy/core/tests/test_multiarray.py", line 2167, in >> _check_roundtrip >> ? ?y = np.asarray(x) >> ?File "/usr/lib/python2.6/dist-packages/coverage.py", line 322, in t >> ? ?self.c[(f.f_code.co_filename, f.f_lineno)] = 1 >> RuntimeWarning: tp_compare didn't return -1 or -2 for exception >> >> ====================================================================== >> ERROR: Failure: ImportError (No module named np.core.fromnumeric) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ?File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in >> loadTestsFromName >> ? ?addr.filename, addr.module) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in >> importFromPath >> ? ?return self.importFromDir(dir_path, fqname) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in >> importFromDir >> ? ?mod = load_module(part_fqname, fh, filename, desc) >> ?File "[...]/numpy/ma/timer_comparison.py", line 6, in >> ? ?import np.core.fromnumeric as fromnumeric >> ImportError: No module named np.core.fromnumeric >> >> ====================================================================== >> ERROR: Failure: AttributeError ('module' object has no attribute >> '__revision__') >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ?File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in >> loadTestsFromName >> ? ?addr.filename, addr.module) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in >> importFromPath >> ? ?return self.importFromDir(dir_path, fqname) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in >> importFromDir >> ? ?mod = load_module(part_fqname, fh, filename, desc) >> ?File "[...]/numpy/ma/version.py", line 9, in >> ? ?revision = [core.__revision__.split(':')[-1][:-1].strip(), >> AttributeError: 'module' object has no attribute '__revision__' >> >> ====================================================================== >> ERROR: Failure: ImportError (The convolve package is not installed. >> >> It can be downloaded by checking out the latest source from >> http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and >> installing all of SciPy from http://www.scipy.org. >> ) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ?File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in >> loadTestsFromName >> ? 
?addr.filename, addr.module) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in >> importFromPath >> ? ?return self.importFromDir(dir_path, fqname) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in >> importFromDir >> ? ?mod = load_module(part_fqname, fh, filename, desc) >> ?File "[...]/numpy/numarray/convolve.py", line 14, in >> ? ?raise ImportError(msg) >> ImportError: The convolve package is not installed. >> >> It can be downloaded by checking out the latest source from >> http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and >> installing all of SciPy from http://www.scipy.org. >> >> >> ====================================================================== >> ERROR: Failure: ImportError (The image package is not installed >> >> It can be downloaded by checking out the latest source from >> http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and >> installing all of SciPy from http://www.scipy.org. >> ) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ?File "/usr/lib/pymodules/python2.6/nose/loader.py", line 379, in >> loadTestsFromName >> ? ?addr.filename, addr.module) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 39, in >> importFromPath >> ? ?return self.importFromDir(dir_path, fqname) >> ?File "/usr/lib/pymodules/python2.6/nose/importer.py", line 86, in >> importFromDir >> ? ?mod = load_module(part_fqname, fh, filename, desc) >> ?File "[...]/numpy/numarray/image.py", line 14, in >> ? ?raise ImportError(msg) >> ImportError: The image package is not installed >> >> It can be downloaded by checking out the latest source from >> http://svn.scipy.org/svn/scipy/trunk/Lib/stsci or by downloading and >> installing all of SciPy from http://www.scipy.org. >> >> >> ====================================================================== >> FAIL: test_blasdot.test_dot_3args >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ?File "/usr/lib/pymodules/python2.6/nose/case.py", line 183, in runTest >> ? ?self.test(*self.arg) >> ?File "[...]/numpy/core/tests/test_blasdot.py", line 52, in test_dot_3args >> ? ?assert_equal(sys.getrefcount(r), 2) >> ?File "[...]/numpy/testing/utils.py", line 313, in assert_equal >> ? ?raise AssertionError(msg) >> AssertionError: >> Items are not equal: >> ?ACTUAL: 3 >> ?DESIRED: 2 >> >> ====================================================================== >> FAIL: test_dot_3args (test_multiarray.TestDot) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ?File "[...]/numpy/core/tests/test_multiarray.py", line 1698, in >> test_dot_3args >> ? ?assert_equal(sys.getrefcount(r), 2) >> ?File "[...]/numpy/testing/utils.py", line 313, in assert_equal >> ? ?raise AssertionError(msg) >> AssertionError: >> Items are not equal: >> ?ACTUAL: 3 >> ?DESIRED: 2 >> >> [...] >> >> Ran 3717 tests in 31.447s >> >> FAILED (KNOWNFAIL=3, SKIP=6, errors=8, failures=2) >> >> >> >> Anyone know what I'm doing wrong? >> >> I'm using Ubuntu 10.04 LTS in case that matters. 
>> >> Thanks, >> Chris >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From travis at continuum.io Mon May 7 01:11:30 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 7 May 2012 00:11:30 -0500 Subject: [Numpy-discussion] Quaternion data type In-Reply-To: References: Message-ID: <16176926-0373-44FA-B928-B5797D19E5A6@continuum.io> On May 6, 2012, at 12:16 PM, Charles R Harris wrote: > > > On Sun, May 6, 2012 at 6:02 AM, Tom Aldcroft wrote: > On Sun, May 6, 2012 at 3:56 AM, David Cournapeau wrote: > > > > > > On Sat, May 5, 2012 at 9:43 PM, Mark Wiebe wrote: > >> > >> On Sat, May 5, 2012 at 1:06 PM, Charles R Harris > >> wrote: > >>> > >>> On Sat, May 5, 2012 at 11:19 AM, Mark Wiebe wrote: > >>>> > >>>> On Sat, May 5, 2012 at 11:55 AM, Charles R Harris > >>>> wrote: > >>>>> > >>>>> On Sat, May 5, 2012 at 5:27 AM, Tom Aldcroft > >>>>> wrote: > >>>>>> > >>>>>> On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell > >>>>>> wrote: > >>>>>> > Hi Chuck, > >>>>>> > > >>>>>> > thanks for the prompt reply. I as curious because because > >>>>>> > someone was interested in adding > >>>>>> > http://pypi.python.org/pypi/Quaternion > >>>>>> > to EPD, but Martin and Mark's implementation of quaternions > >>>>>> > looks much better. > >>>>>> > >>>>>> Hi - > >>>>>> > >>>>>> I'm a co-author of the above mentioned Quaternion package. I agree > >>>>>> the numpy_quaternion version would be better, but if there is no > >>>>>> expectation that it will move forward I can offer to improve our > >>>>>> Quaternion. A few months ago I played around with making it accept > >>>>>> arbitrary array inputs (with similar shape of course) to essentially > >>>>>> vectorize the transformations. We never got around to putting this in > >>>>>> a release because of a perceived lack of interest / priorities... If > >>>>>> this would be useful then let me know. > >>>>>> > >>>>> > >>>>> Would you be interested in carrying Martin's package forward? I'm not > >>>>> opposed to having quaternions in numpy/scipy but there needs to be someone > >>>>> to push it and deal with problems if they come up. Martin's package > >>>>> disappeared in large part because Martin disappeared. I'd also like to hear > >>>>> from Mark about other aspects, as there was also a simple rational user type > >>>>> proposed that we were looking to put in as an extension 'test' type. IIRC, > >>>>> there were some needed fixes to Numpy, some of which were postponed in favor > >>>>> of larger changes. User types is one of the things we want ot get fixed up. > >>>> > >>>> > >>>> I kind of like the idea of there being a package, separate from numpy, > >>>> which collects these dtypes together. To start, the quaternion and the > >>>> rational type could go in it, and eventually I think it would be nice to > >>>> move datetime64 there as well. Maybe it could be called numpy-dtypes, or > >>>> would a more creative name be better? > >>> > >>> > >>> I'm trying to think about how that would be organized. We could create a > >>> new repository, numpy-user-types (numpy-extension-types), under the numpy > >>> umbrella. It would need documents and such as well as someone interested in > >>> maintaining it and making releases. 
A branch in the numpy repository > >>> wouldn't work since we would want to rebase it regularly. It could maybe go > >>> in scipy but a new package would need to be created there and it feels too > >>> distant from numpy for such basic types as datetime. > >>> > >>> Do you have thoughts about the details? > >> > >> > >> Another repository under the numpy umbrella would best fit what I'm > >> imagining, yes. I would imagine it as a package of additional types that > >> aren't the core ones, but that many people would probably want to install. > >> It would also be a way to continually exercise the type extension system, to > >> make sure it doesn't break. It couldn't be a branch of numpy, rather a > >> collection of additional dtypes and associated useful functions. > > > > > > I would be in favor of this as well. We could start the repository by having > > one "trivial" dtype that would serve as an example. That's something I have > > been interested in, I can lock a couple of hours / week to help this with. > > > > How about if I start by working on adding tests within > numpy_quaternion, then this can be migrated into an extended dtypes > package when it is set up. > > Sounds like a good start. You might want to ping Martin too. > > > A nice "trivial" dtype example would be very useful, as I mentioned > just last week our group was wondering how to make a new dtype. > > > There is the rational dtype. I expect there will be some interaction between numpy and the extension types as the bugs are worked out. Extension types for numpy haven't been much used. Actually, they have been used fairly extensively in multiple projects that I am aware of. They have just not been discussed enough, nor is there a good open-source collection of extension dtypes. It is also harder than it really should be to create extension dtypes. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail.till at gmx.de Mon May 7 05:37:33 2012 From: mail.till at gmx.de (Till Stensitzki) Date: Mon, 7 May 2012 09:37:33 +0000 (UTC) Subject: [Numpy-discussion] Extension types repository References: Message-ID: Charles R Harris gmail.com> writes: > Make Tom a member of the numpy organization on github. > Set up an extension dtypes repository in github.com/numpy > > > Other proposals for the name are welcome. > Why not put them into scipy.dtypes? Till From ndbecker2 at gmail.com Mon May 7 08:58:14 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 07 May 2012 08:58:14 -0400 Subject: [Numpy-discussion] Quaternion data type References: <16176926-0373-44FA-B928-B5797D19E5A6@continuum.io> Message-ID: I am quite interested in a fixed point data type. I had produced a working model some time ago. Maybe I can use some of these new efforts to provide good examples as a guide. From aldcroft at head.cfa.harvard.edu Mon May 7 09:28:08 2012 From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft) Date: Mon, 7 May 2012 09:28:08 -0400 Subject: [Numpy-discussion] numpy_quaternion: OK on numpy 1.5.0, fails on 1.6.1 Message-ID: Sorry to bother again, but I am running into an issue with the numpy quaternion dtype on numpy 1.6.1 : $ python ActivePython 2.7.1.4 (ActiveState Software Inc.) based on Python 2.7.1 (r271:86832, Feb 7 2011, 11:30:54) [GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import numpy >>> import quaternion >>> q = numpy.quaternion(1,0,0,0) >>> q quaternion(1, 0, 0, 0) >>> q2 = numpy.quaternion(1,0,0,0) >>> q * q2 Traceback (most recent call last): File "", line 1, in TypeError: ufunc 'multiply' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe' >>> numpy.__version__ '1.6.1' Using numpy 1.5.0 on the same platform the multiplication works and I get the expected result. For my near-term goal of getting tests in place I can just use 1.5.0, but if anyone has an idea of the problem I would appreciate help. Thanks, Tom From charlesr.harris at gmail.com Mon May 7 09:57:33 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 7 May 2012 07:57:33 -0600 Subject: [Numpy-discussion] Extension types repository In-Reply-To: References: Message-ID: On Mon, May 7, 2012 at 3:37 AM, Till Stensitzki wrote: > Charles R Harris gmail.com> writes: > > > > Make Tom a member of the numpy organization on github. > > Set up an extension dtypes repository in github.com/numpy > > > > > > Other proposals for the name are welcome. > > > > Why not put them into scipy.dtypes? > > It was a possibility, but I thought it too distant from numpy. I think it will be good for the new repository to have its own release schedule also. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon May 7 09:59:01 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 7 May 2012 07:59:01 -0600 Subject: [Numpy-discussion] Extension types repository In-Reply-To: References: Message-ID: On Sun, May 6, 2012 at 10:05 AM, Charles R Harris wrote: > > > On Sun, May 6, 2012 at 2:38 AM, Ralf Gommers wrote: > >> >> >> On Sun, May 6, 2012 at 5:44 AM, Travis Oliphant wrote: >> >>> +1 >>> >>> Travis >>> >>> -- >>> Travis Oliphant >>> (on a mobile) >>> 512-826-7480 >>> >>> >>> On May 5, 2012, at 10:19 PM, Charles R Harris >>> wrote: >>> >>> All, >>> >>> Tom Aldcroft volunteered to bring quaternions into numpy. The proposal >>> is to set up a separate repository under the numpy name on github, >>> npydtypes or some such, and bring in Martin Ling's quaternion extension >>> dtype as a start. Other extension types that would reside in the repository >>> would be the simple rational type, and perhaps some specialized >>> astronomical time types. So here is the proposal. >>> >>> +1 >> >>> >>> 1. Make Tom a member of the numpy organization on github. >>> >>> Would need a new team to be set up too. Travis is the only one who can >> do that. >> >> > Yes, looks like Travis needs to create the new repository and add at least > one core team member, who can then add others. I'd suggest > numpy-extension-dtypes for the repository name, Tom is on github as > taldcroft. > > Travis, it might be a good idea to add one more person with ownership > permissions as a backup if that is possible. > > Looks like I can do this. I'll put up the repository. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon May 7 10:25:58 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 7 May 2012 08:25:58 -0600 Subject: [Numpy-discussion] Extension types repository In-Reply-To: References: Message-ID: On Mon, May 7, 2012 at 7:59 AM, Charles R Harris wrote: > > > On Sun, May 6, 2012 at 10:05 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sun, May 6, 2012 at 2:38 AM, Ralf Gommers > > wrote: >> >>> >>> >>> On Sun, May 6, 2012 at 5:44 AM, Travis Oliphant wrote: >>> >>>> +1 >>>> >>>> Travis >>>> >>>> -- >>>> Travis Oliphant >>>> (on a mobile) >>>> 512-826-7480 >>>> >>>> >>>> On May 5, 2012, at 10:19 PM, Charles R Harris < >>>> charlesr.harris at gmail.com> wrote: >>>> >>>> All, >>>> >>>> Tom Aldcroft volunteered to bring quaternions into numpy. The proposal >>>> is to set up a separate repository under the numpy name on github, >>>> npydtypes or some such, and bring in Martin Ling's quaternion extension >>>> dtype as a start. Other extension types that would reside in the repository >>>> would be the simple rational type, and perhaps some specialized >>>> astronomical time types. So here is the proposal. >>>> >>>> +1 >>> >>>> >>>> 1. Make Tom a member of the numpy organization on github. >>>> >>>> Would need a new team to be set up too. Travis is the only one who can >>> do that. >>> >>> >> Yes, looks like Travis needs to create the new repository and add at >> least one core team member, who can then add others. I'd suggest >> numpy-extension-dtypes for the repository name, Tom is on github as >> taldcroft. >> >> Travis, it might be a good idea to add one more person with ownership >> permissions as a backup if that is possible. >> >> > Looks like I can do this. I'll put up the repository. > I (re)named the repository numpy-dtypes and added Tom to the core team. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon May 7 12:31:44 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 7 May 2012 18:31:44 +0200 Subject: [Numpy-discussion] How to run NumPy's tests with coverage? In-Reply-To: References: Message-ID: On Mon, May 7, 2012 at 5:25 AM, wrote: > On Sun, May 6, 2012 at 4:39 PM, Ralf Gommers > wrote: > > > > > > On Sun, May 6, 2012 at 9:08 PM, Chris Ball wrote: > >> > >> Hi, > >> > >> I'm trying to figure out how to run NumPy's tests with coverage enabled > >> (i.e. > >> numpy.test(coverage=True) ). I can run the tests successfully like this: > > > > > > This seems to have been broken somewhere along the way. If you remove the > > argument "--cover-inclusive" from line 242 in > numpy/testing/nosetester.py, > > that should fix all errors except TestNewBufferProtocol.test_roundtrip. > Not > > sure what's going on with that one. > > removing "--cover-inclusive" helped me also with statsmodels, with it > it ran all example scripts and got stuck several times, > (permanently stuck in some multiprocessing example?) > > Now coverage=True worked for the first time. > > Is it possible to make this optional or remove it from numpy? > This should be completely be removed I think, not worth making a new keyword for. Can't imagine this being useful to a lot of people, and if it is you can still use it from the command line with "$ nosetests --with-coverage --cover-inclusive". 
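For example, with the in-place build from the original message still on the PYTHONPATH, something like the following should work (a rough sketch only, assuming nose's coverage plugin and the coverage package are installed; leave out --cover-inclusive to avoid the failures above):

$ export PYTHONPATH=numpy/
$ nosetests --with-coverage --cover-package=numpy numpy

The --cover-package=numpy option just restricts the coverage report to numpy's own modules.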
Ralf > from a beneficiary of the nice numpy testing support outside of numpy > > Thanks for the tip, > > Josef > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim at cerazone.net Mon May 7 13:37:10 2012 From: tim at cerazone.net (Tim Cera) Date: Mon, 7 May 2012 13:37:10 -0400 Subject: [Numpy-discussion] Documentation roles in the numpy/scipy documentation editor Message-ID: I think we should change the roles established for the Numpy/Scipy documentation editors because they do not work as intended. For reference they are described here: http://docs.scipy.org/numpy/Front%20Page/ Basically there aren't that many active people to support being split into the roles as described which has led to a backlog of 'Needs review' docstrings and only one 'Proofed' docstring. I think that many of these docstrings are good enough, just that not enough people have put themselves out front as so knowledgeable about a certain topic to label docstrings as 'Reviewed' or 'Proofed'. Here are the current statistics for numpy docstrings:

                             Current %    Count
    Needs editing                   17      279
    Being written / Changed          4       62
    Needs review                    76     1235
    Needs review (revised)           2       35
    Needs work (reviewed)            0        3
    Reviewed (needs proof)           0        0
    Proofed                          0        1
    Unimportant                      ?     1793

I have thought about some solutions in no particular order:
 * Get rid of the 'Reviewer' and 'Proofer' roles.
 * Assign all 'Editors', the 'Reviewer', and 'Proofer' privileges.
 * People start out as 'Editors', and then become 'Reviewers', and 'Proofers' based on some editing metric.
For full disclosure, I would be generous with a 'Reviewed' label if given the authority because philosophically I think there should be a point where the docstring is 'Good enough' and it should be expected to have a life of continually small improvements rather than a point when it is 'Done'. Regardless of what decision is made, the single 'Proofed' docstring should be available for editing. I can't even find what it is. I imagine that it should be on the docstring page at http://docs.scipy.org/numpy/docs/ Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon May 7 15:30:02 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 7 May 2012 13:30:02 -0600 Subject: [Numpy-discussion] numpy_quaternion: OK on numpy 1.5.0, fails on 1.6.1 In-Reply-To: References: Message-ID: On Mon, May 7, 2012 at 7:28 AM, Tom Aldcroft wrote: > Sorry to bother again, but I am running into an issue with the numpy > quaternion dtype on numpy 1.6.1 : > > $ python > ActivePython 2.7.1.4 (ActiveState Software Inc.) based on > Python 2.7.1 (r271:86832, Feb 7 2011, 11:30:54) > [GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy > >>> import quaternion > >>> q = numpy.quaternion(1,0,0,0) > >>> q > quaternion(1, 0, 0, 0) > >>> q2 = numpy.quaternion(1,0,0,0) > >>> q * q2 > Traceback (most recent call last): > File "", line 1, in > TypeError: ufunc 'multiply' not supported for the input types, and the > inputs could not be safely coerced to any supported types according to > the casting rule 'safe' > >>> numpy.__version__ > '1.6.1' > > Using numpy 1.5.0 on the same platform the multiplication works and I > get the expected result. > > For my near-term goal of getting tests in place I can just use 1.5.0, > but if anyone has an idea of the problem I would appreciate help.
> > I haven't looked at this yet, but I set up the repository, added your name to the numpy core developers, and put in a pull request merging the rational stuff. If you would like, I can also merge in the quaternion type so you can start working with the repository. The build will need to be fixed up so it can work with quaternions in a subdirectory. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon May 7 16:14:56 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 7 May 2012 22:14:56 +0200 Subject: [Numpy-discussion] Documentation roles in the numpy/scipy documentation editor In-Reply-To: References: Message-ID: On Mon, May 7, 2012 at 7:37 PM, Tim Cera wrote: > I think we should change the roles established for the Numpy/Scipy > documentation editors because they do not work as intended. > > For reference they are described here: > http://docs.scipy.org/numpy/Front%20Page/ > > Basically there aren't that many active people to support being split into > the roles as described which has led to a backlog of 'Needs review' > docstrings and only one 'Proofed' docstring. I think that many of these > docstrings are good enough, just that not enough people have put themselves > out front as so knowledgeable about a certain topic to label docstrings as > 'Reviewed' or 'Proofed'. > > You're right. I think at some point the goal shifted from getting everything to "proofed" to getting everything to "needs review". > Here are the current statistics for numpy docstrings: > Current %Count Needs editing17 279 Being written / Changed4 62 Needs > review76 1235 Needs review (revised)2 35 Needs work (reviewed)0 3Reviewed (needs proof) > 0 0 Proofed0 1 Unimportant? 1793 > > The "needs editing" category actually contains mostly docstrings that are quite good, but were recently created and never edited in the doc wiki. The % keeps on growing. Bumping all polynomial docstrings up to "needs review" would be a good start here to make the % reflect the actual status. > > I have thought about some solutions in no particular order: > > * Get rid of the 'Reviewer' and 'Proofer' roles. > * Assign all 'Editors', the 'Reviewer', and 'Proofer' privileges. > * People start out as 'Editors', and then become 'Reviewers', and > 'Proofers' based on some editing metric. > > For full disclosure, I would be generous with a 'Reviewed' label if given > the authority because philosophically I think there should be a point where > the docstring is 'Good enough' and it should be expected to have a life of > continually small improvements rather that a point when it is 'Done'. > This makes sense to me. > Regardless of what decision is made, the single 'Proofed' docstring should > be available for editing. I can't even find what it is. I imagine that it > should be on the docstring page at http://docs.scipy.org/numpy/docs/ > > It used to be there - maybe the stats got confused. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon May 7 16:27:55 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 7 May 2012 15:27:55 -0500 Subject: [Numpy-discussion] numpy_quaternion: OK on numpy 1.5.0, fails on 1.6.1 In-Reply-To: References: Message-ID: <9F9ED53B-CF19-46D1-96C8-C23251D4838A@continuum.io> I've created a NumPy dtype package team and added several people to that team. If others would like to participate on these extension types, let me know. 
-Travis On May 7, 2012, at 8:28 AM, Tom Aldcroft wrote: > Sorry to bother again, but I am running into an issue with the numpy > quaternion dtype on numpy 1.6.1 : > > $ python > ActivePython 2.7.1.4 (ActiveState Software Inc.) based on > Python 2.7.1 (r271:86832, Feb 7 2011, 11:30:54) > [GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import numpy >>>> import quaternion >>>> q = numpy.quaternion(1,0,0,0) >>>> q > quaternion(1, 0, 0, 0) >>>> q2 = numpy.quaternion(1,0,0,0) >>>> q * q2 > Traceback (most recent call last): > File "", line 1, in > TypeError: ufunc 'multiply' not supported for the input types, and the > inputs could not be safely coerced to any supported types according to > the casting rule 'safe' >>>> numpy.__version__ > '1.6.1' > > Using numpy 1.5.0 on the same platform the multiplication works and I > get the expected result. > > For my near-term goal of getting tests in place I can just use 1.5.0, > but if anyone has an idea of the problem I would appreciate help. > > Thanks, > Tom > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From aldcroft at head.cfa.harvard.edu Mon May 7 16:33:27 2012 From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft) Date: Mon, 7 May 2012 16:33:27 -0400 Subject: [Numpy-discussion] numpy_quaternion: OK on numpy 1.5.0, fails on 1.6.1 In-Reply-To: References: Message-ID: On Mon, May 7, 2012 at 3:30 PM, Charles R Harris wrote: > > > On Mon, May 7, 2012 at 7:28 AM, Tom Aldcroft > wrote: >> >> Sorry to bother again, but I am running into an issue with the numpy >> quaternion dtype on numpy 1.6.1 : >> >> $ python >> ActivePython 2.7.1.4 (ActiveState Software Inc.) based on >> Python 2.7.1 (r271:86832, Feb ?7 2011, 11:30:54) >> [GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> import numpy >> >>> import quaternion >> >>> q = numpy.quaternion(1,0,0,0) >> >>> q >> quaternion(1, 0, 0, 0) >> >>> q2 = numpy.quaternion(1,0,0,0) >> >>> q * q2 >> Traceback (most recent call last): >> ?File "", line 1, in >> TypeError: ufunc 'multiply' not supported for the input types, and the >> inputs could not be safely coerced to any supported types according to >> the casting rule 'safe' >> >>> numpy.__version__ >> '1.6.1' >> >> Using numpy 1.5.0 on the same platform the multiplication works and I >> get the expected result. >> >> For my near-term goal of getting tests in place I can just use 1.5.0, >> but if anyone has an idea of the problem I would appreciate help. >> > > I haven't looked at this yet, but I set up the repository, added your name > to the numpy core developers, and put in a pull request merging the rational > stuff. > > If you would like, I can also merge in the quaternion type so you can start > working with the repository. The build will need to be fixed up so it can > work with quaternions in a subdirectory. That would be good if you can merge in the quaternion type to get things rolling and set up in the right way. 
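As a starting point I was thinking of a minimal smoke test along these lines (only a sketch, using nothing beyond the constructor from the session above; it will need fleshing out once the code is in the new repository):

import numpy
import quaternion   # importing this makes numpy.quaternion available

def test_identity_multiply():
    # Regression check for the 1.6.1 failure above: on 1.5.0 this runs and
    # gives back quaternion(1, 0, 0, 0), while on 1.6.1 the 'multiply' ufunc
    # raises TypeError.
    q = numpy.quaternion(1, 0, 0, 0)
    q2 = numpy.quaternion(1, 0, 0, 0)
    q * q2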
Thanks, Tom From charlesr.harris at gmail.com Mon May 7 16:41:16 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 7 May 2012 14:41:16 -0600 Subject: [Numpy-discussion] numpy_quaternion: OK on numpy 1.5.0, fails on 1.6.1 In-Reply-To: References: Message-ID: On Mon, May 7, 2012 at 2:33 PM, Tom Aldcroft wrote: > On Mon, May 7, 2012 at 3:30 PM, Charles R Harris > wrote: > > > > > > On Mon, May 7, 2012 at 7:28 AM, Tom Aldcroft < > aldcroft at head.cfa.harvard.edu> > > wrote: > >> > >> Sorry to bother again, but I am running into an issue with the numpy > >> quaternion dtype on numpy 1.6.1 : > >> > >> $ python > >> ActivePython 2.7.1.4 (ActiveState Software Inc.) based on > >> Python 2.7.1 (r271:86832, Feb 7 2011, 11:30:54) > >> [GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2 > >> Type "help", "copyright", "credits" or "license" for more information. > >> >>> import numpy > >> >>> import quaternion > >> >>> q = numpy.quaternion(1,0,0,0) > >> >>> q > >> quaternion(1, 0, 0, 0) > >> >>> q2 = numpy.quaternion(1,0,0,0) > >> >>> q * q2 > >> Traceback (most recent call last): > >> File "", line 1, in > >> TypeError: ufunc 'multiply' not supported for the input types, and the > >> inputs could not be safely coerced to any supported types according to > >> the casting rule 'safe' > >> >>> numpy.__version__ > >> '1.6.1' > >> > >> Using numpy 1.5.0 on the same platform the multiplication works and I > >> get the expected result. > >> > >> For my near-term goal of getting tests in place I can just use 1.5.0, > >> but if anyone has an idea of the problem I would appreciate help. > >> > > > > I haven't looked at this yet, but I set up the repository, added your > name > > to the numpy core developers, and put in a pull request merging the > rational > > stuff. > > > > If you would like, I can also merge in the quaternion type so you can > start > > working with the repository. The build will need to be fixed up so it can > > work with quaternions in a subdirectory. > > That would be good if you can merge in the quaternion type to get > things rolling and set up in the right way. > OK. You should commit the first pull request just to get in the swing of things. I've got the quaternions ready to go but would like to rebase on top of the first commit. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pivanov314 at gmail.com Mon May 7 17:36:53 2012 From: pivanov314 at gmail.com (Paul Ivanov) Date: Mon, 7 May 2012 14:36:53 -0700 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: <54923DC2-AF39-4457-B894-FBD6CFAFB8A2@continuum.io> <4F9F552F.4060605@creativetrax.com> <5E41F0C1-9449-489C-8287-01C04060B02D@continuum.io> <21065CEC-A457-406D-96F9-BC68D523DA6A@continuum.io> Message-ID: +1 on migrating issues to GitHub, I'm so glad this discussion is happening. On Sat, May 5, 2012 at 8:28 PM, Charles R Harris wrote: > Uh oh. We are short on developers as is... Which brings up a question, do > people need a github account to open an issue? Creating an account on GH is currently required, but it's automated and self-evident how to do that. This little anecdote may say more about me than it does about the Trac instance, but I think it makes a point anyway: I came across a minor numpy or scipy issue last week while running the test suite, which was in a part of the project I don't use, and wanted to quickly report it, (or maybe just add some details to an existing ticket, I don't recall). 
I went to the GH page first, and saw that only Pull Requests were handled there, so I figured it must still be on the Trac. I went there to try to open a new ticket and then got something like this message: Notice: You are currently not logged in. You may want to do so now. Error: Forbidden TICKET_CREATE privileges are required to perform this operation TracGuide ? The Trac User and Administration Guide I tried to login and got an .htaccess login box in the browser - I tried to remember a username password combination that Jarrod set up for me back in 2008? each time I failed, I was then greeted with: Authorization Required This server could not verify that you are authorized to access the document requested. Either you supplied the wrong credentials (e.g., bad password), or your browser doesn't understand how to supply the credentials required. Apache/2.2.3 (CentOS) Server at projects.scipy.org Port 80 Of course, taking a look now, I should have either been more diligent about finding the "forgot your password?" link [1] or just created a new username [2], but at the time it seemed like there was no concrete way to proceed forward. With that, the error didn't seem important enough and I decided to get back to the matter at hand - so I gave up. :\ So it really is nice to have everything in one place. When matplotlib had its tickets on SourceForge, I rarely ventured over there to check them, but now that they are on GitHub, everyone with the commit bit gets an email when a new issue is opened, and it makes it a lot easier to pitch in and participate. 1. http://projects.scipy.org/numpy/reset_password 2. http://projects.scipy.org/numpy/register best, -- Paul Ivanov 314 address only used for lists, off-list direct email at: http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Mon May 7 19:11:34 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 8 May 2012 01:11:34 +0200 Subject: [Numpy-discussion] Announce: scikit-learn v0.11 Message-ID: <20120507231134.GL19857@phare.normalesup.org> On behalf of Andy Mueller, our release manager, I am happy to announce the 0.11 release of scikit-learn. This release includes some major new features such as randomized sparse models, gradient boosted regression trees, label propagation and many more. The release also has major improvements in the documentation and in stability. Details can be found on the [1]what's new page. We also have a new page with [2]video tutorials on machine learning with scikit-learn and different aspects of the package. Sources and windows binaries are available on sourceforge, through pypi (http://pypi.python.org/pypi/scikit-learn/0.11) or can be installed directly using pip: pip install -U scikit-learn Thanks again to all the contributors who made this release possible. Cheers, Ga?l 1. http://scikit-learn.org/stable/whats_new.html 2. http://scikit-learn.org/stable/presentations.html From gael.varoquaux at normalesup.org Mon May 7 19:13:17 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 8 May 2012 01:13:17 +0200 Subject: [Numpy-discussion] [SciPy-Dev] Announce: scikit-learn v0.11 Message-ID: <20120507231317.GO19857@phare.normalesup.org> On behalf of Andy Mueller, our release manager, I am happy to announce the 0.11 release of scikit-learn. 
This release includes some major new features such as randomized sparse models, gradient boosted regression trees, label propagation and many more. The release also has major improvements in the documentation and in stability. Details can be found on the [1]what's new page. We also have a new page with [2]video tutorials on machine learning with scikit-learn and different aspects of the package. Sources and windows binaries are available on sourceforge, through pypi (http://pypi.python.org/pypi/scikit-learn/0.11) or can be installed directly using pip: pip install -U scikit-learn Thanks again to all the contributors who made this release possible. Cheers, Ga?l 1. http://scikit-learn.org/stable/whats_new.html 2. http://scikit-learn.org/stable/presentations.html From lists at hilboll.de Tue May 8 02:49:19 2012 From: lists at hilboll.de (Andreas H.) Date: Tue, 08 May 2012 08:49:19 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: <4FA8C1EF.4060708@hilboll.de> Am 05.05.2012 20:15, schrieb Ralf Gommers: > Hi, > > I'm pleased to announce the availability of the first release candidate > of NumPy 1.6.2. This is a maintenance release. Due to the delay of the > NumPy 1.7.0, this release contains far more fixes than a regular NumPy > bugfix release. It also includes a number of documentation and build > improvements. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ > > Please test this release and report any issues on the numpy-discussion > mailing list. > > Cheers, > Ralf > > > > ``numpy.core`` issues fixed > --------------------------- > > #2063 make unique() return consistent index > #1138 allow creating arrays from empty buffers or empty slices > #1446 correct note about correspondence vstack and concatenate > #1149 make argmin() work for datetime > #1672 fix allclose() to work for scalar inf > #1747 make np.median() work for 0-D arrays > #1776 make complex division by zero to yield inf properly > #1675 add scalar support for the format() function > #1905 explicitly check for NaNs in allclose() > #1952 allow floating ddof in std() and var() > #1948 fix regression for indexing chararrays with empty list > #2017 fix type hashing > #2046 deleting array attributes causes segfault > #2033 a**2.0 has incorrect type > #2045 make attribute/iterator_element deletions not segfault > #2021 fix segfault in searchsorted() > #2073 fix float16 __array_interface__ bug > > > ``numpy.lib`` issues fixed > -------------------------- > > #2048 break reference cycle in NpzFile > #1573 savetxt() now handles complex arrays > #1387 allow bincount() to accept empty arrays > #1899 fixed histogramdd() bug with empty inputs > #1793 fix failing npyio test under py3k > #1936 fix extra nesting for subarray dtypes > #1848 make tril/triu return the same dtype as the original array > #1918 use Py_TYPE to access ob_type, so it works also on Py3 > > > ``numpy.f2py`` changes > ---------------------- > > ENH: Introduce new options extra_f77_compiler_args and > extra_f90_compiler_args > BLD: Improve reporting of fcompiler value > BUG: Fix f2py test_kind.py test > > > ``numpy.poly`` changes > ---------------------- > > ENH: Add some tests for polynomial printing > ENH: Add companion matrix functions > DOC: Rearrange the polynomial documents > BUG: Fix up links to classes > DOC: Add version added to some of the polynomial package modules > DOC: Document xxxfit functions in the polynomial package modules > BUG: The 
polynomial convenience classes let different types interact > DOC: Document the use of the polynomial convenience classes > DOC: Improve numpy reference documentation of polynomial classes > ENH: Improve the computation of polynomials from roots > STY: Code cleanup in polynomial [*]fromroots functions > DOC: Remove references to cast and NA, which were added in 1.7 > > > ``numpy.distutils`` issues fixed > ------------------------------- > > #1261 change compile flag on AIX from -O5 to -O3 > #1377 update HP compiler flags > #1383 provide better support for C++ code on HPUX > #1857 fix build for py3k + pip > BLD: raise a clearer warning in case of building without cleaning up first > BLD: follow build_ext coding convention in build_clib > BLD: fix up detection of Intel CPU on OS X in system_info.py > BLD: add support for the new X11 directory structure on Ubuntu & co. > BLD: add ufsparse to the libraries search path. > BLD: add 'pgfortran' as a valid compiler in the Portland Group > BLD: update version match regexp for IBM AIX Fortran compilers. > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion 1.6.2RC1 builds fine under the following configurations (all of them x86_64): * Ubuntu Lucid 10.4 / Python 2.6.5 / GCC 4.4.3: OK (KNOWNFAIL=3, SKIP=5) * Archlinux (as of today) / Python 3.2.3 / GCC 4.7.0: OK (KNOWNFAIL=5, SKIP=5) * Archlinux (as of today) / Python 2.7.3 / GCC: OK (KNOWNFAIL=3, SKIP=5) Great work! Andreas. From derek at astro.physik.uni-goettingen.de Tue May 8 09:23:09 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 8 May 2012 15:23:09 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: <53503912-3DD7-4DF1-B679-8CCA385674EE@gmail.com> References: <53503912-3DD7-4DF1-B679-8CCA385674EE@gmail.com> Message-ID: On 06.05.2012, at 8:16AM, Paul Anton Letnes wrote: > All tests for 1.6.2rc1 pass on > Mac OS X 10.7.3 > python 2.7.2 > gcc 4.2 (Apple) Passing as well on 10.6 x86_64 and on 10.5.8 ppc with python 2.5.6/2.6.6/2.7.2 Apple gcc 4.0.1, but I am getting one failure on Lion (same with Python 2.5.6+2.6.7): Python version 2.7.3 (default, May 6 2012, 15:05:35) [GCC 4.2.1 Compatible Apple Clang 3.0 (tags/Apple/clang-211.12)] nose version 1.1.2 ====================================================================== FAIL: Test basic arithmetic function errors ---------------------------------------------------------------------- Traceback (most recent call last): File "/sw/lib/python2.7/site-packages/numpy/testing/decorators.py", line 215, in knownfailer return f(*args, **kwargs) File "/sw/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", line 323, in test_floating_exceptions lambda a,b:a*b, ft_tiny, ft_tiny) File "/sw/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", line 271, in assert_raises_fpe "Type %s did not raise fpe error '%s'." % (ftype, fpeerr)) File "/sw/lib/python2.7/site-packages/numpy/testing/utils.py", line 34, in assert_ raise AssertionError(msg) AssertionError: Type did not raise fpe error ''. 
---------------------------------------------------------------------- Ran 3551 tests in 130.778s FAILED (KNOWNFAIL=3, SKIP=4, failures=1) Cheers, Derek From matrixhasu at gmail.com Wed May 9 12:36:02 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Wed, 9 May 2012 18:36:02 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate of > NumPy 1.6.2.? This is a maintenance release. Due to the delay of the NumPy > 1.7.0, this release contains far more fixes than a regular NumPy bugfix > release.? It also includes a number of documentation and build improvements. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ > > Please test this release and report any issues on the numpy-discussion > mailing list. Mh, I can't exactly understand this: $ diff -urNad numpy-1.6.1 numpy-1.6.2rc | diffstat | tail -1 2718 files changed, 390859 deletions(-) does it mean that the only thing the RC has done is to remove a lot of stuff? that's weird because the build process went all just fine and unit tests are passing ... /me confused? -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From travis at continuum.io Wed May 9 12:46:53 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 9 May 2012 11:46:53 -0500 Subject: [Numpy-discussion] Missing data wrap-up and request for comments Message-ID: Hey all, Nathaniel and Mark have worked very hard on a joint document to try and explain the current status of the missing-data debate. I think they've done an amazing job at providing some context, articulating their views and suggesting ways forward in a mutually respectful manner. This is an exemplary collaboration and is at the core of why open source is valuable. The document is available here: https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst After reading that document, it appears to me that there are some fundamentally different views on how things should move forward. I'm also reading the document incorporating my understanding of the history, of NumPy as well as all of the users I've met and interacted with which means I have my own perspective that is not necessarily incorporated into that document but informs my recommendations. I'm not sure we can reach full consensus on this. We are also well past time for moving forward with a resolution on this (perhaps we can all agree on that). I would like one more discussion thread where the technical discussion can take place. I will make a plea that we keep this discussion as free from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can. I can't guarantee that I personally will succeed at that, but I can tell you that I will try. That's all I'm asking of anyone else. I recognize that there are a lot of other issues at play here besides *just* the technical questions, but we are not going to resolve every community issue in this technical thread. We need concrete proposals and so I will start with three. Please feel free to comment on these proposals or add your own during the discussion. I will stop paying attention to this thread next Wednesday (May 16th) (or earlier if the thread dies) and hope that by that time we can agree on a way forward. 
If we don't have agreement, then I will move forward with what I think is the right approach. I will either write the code myself or convince someone else to write it. In all cases, we have agreement that bit-pattern dtypes should be added to NumPy. We should work on these (int32, float64, complex64, str, bool) to start. So, the three proposals are independent of this way forward. The proposals are all about the extra mask part: My three proposals: * do nothing and leave things as is * add a global flag that turns off masked array support by default but otherwise leaves things unchanged (I'm still unclear how this would work exactly) * move Mark's "masked ndarray objects" into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged. The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as their core. For the record, I'm currently in favor of the third proposal. Feel free to comment on these proposals (or provide your own). Best regards, -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed May 9 12:49:38 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 9 May 2012 10:49:38 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 10:36 AM, Sandro Tosi wrote: > On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers > wrote: > > Hi, > > > > I'm pleased to announce the availability of the first release candidate > of > > NumPy 1.6.2. This is a maintenance release. Due to the delay of the > NumPy > > 1.7.0, this release contains far more fixes than a regular NumPy bugfix > > release. It also includes a number of documentation and build > improvements. > > > > Sources and binary installers can be found at > > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ > > > > Please test this release and report any issues on the numpy-discussion > > mailing list. > > Mh, I can't exactly understand this: > > $ diff -urNad numpy-1.6.1 numpy-1.6.2rc | diffstat | tail -1 > 2718 files changed, 390859 deletions(-) > > does it mean that the only thing the RC has done is to remove a lot of > stuff? that's weird because the build process went all just fine and > unit tests are passing ... /me confused? > > No, only a few files were changed. Since there are about 1000 files in numpy I suspect you are also counting everything in the build and documentation build directories. If you built inplace, you are also going to pick up *.pyc files and such. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrixhasu at gmail.com Wed May 9 12:53:25 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Wed, 9 May 2012 18:53:25 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 6:49 PM, Charles R Harris wrote: > > > On Wed, May 9, 2012 at 10:36 AM, Sandro Tosi wrote: >> >> On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers >> wrote: >> > Hi, >> > >> > I'm pleased to announce the availability of the first release candidate >> > of >> > NumPy 1.6.2.? This is a maintenance release. Due to the delay of the >> > NumPy >> > 1.7.0, this release contains far more fixes than a regular NumPy bugfix >> > release.? It also includes a number of documentation and build >> > improvements. 
>> > >> > Sources and binary installers can be found at >> > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ >> > >> > Please test this release and report any issues on the numpy-discussion >> > mailing list. >> >> Mh, I can't exactly understand this: >> >> $ diff -urNad numpy-1.6.1 numpy-1.6.2rc | diffstat | tail -1 >> ?2718 files changed, 390859 deletions(-) >> >> does it mean that the only thing the RC has done is to remove a lot of >> stuff? that's weird because the build process went all just fine and >> unit tests are passing ... /me confused? >> > > No, only a few files were changed. Since there are about 1000 files in numpy > I suspect you are also counting everything in the build and documentation > build directories. If you built inplace, you are also going to pick up *.pyc > files and such. sorry i didn't say that: they are the tarballs just extracted. i'd have to recheck again downloading from SF -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From klonuo at gmail.com Wed May 9 12:55:12 2012 From: klonuo at gmail.com (klo uo) Date: Wed, 9 May 2012 18:55:12 +0200 Subject: [Numpy-discussion] [SciPy-Dev] Announce: scikit-learn v0.11 In-Reply-To: <20120507231317.GO19857@phare.normalesup.org> References: <20120507231317.GO19857@phare.normalesup.org> Message-ID: This news did not arrive at scikit-learn-general at lists.sourceforge.net Is above list deprecated? BTW thanks for supporting and working on this project ;) On Tue, May 8, 2012 at 1:13 AM, Gael Varoquaux wrote: > ? On behalf of Andy Mueller, our release manager, I am happy to announce > ? the 0.11 release of scikit-learn. > > ? This release includes some major new features such as randomized > ? sparse models, gradient boosted regression trees, label propagation > ? and many more. The release also has major improvements in the > ? documentation and in stability. > > ? Details can be found on the [1]what's new page. > > ? We also have a new page with [2]video tutorials on machine learning > ? with scikit-learn and different aspects of the package. > > ? Sources and windows binaries are available on sourceforge, > ? through pypi (http://pypi.python.org/pypi/scikit-learn/0.11) or > ? can be installed directly using pip: > > ? pip install -U scikit-learn > > ? Thanks again to all the contributors who made this release possible. > > ? Cheers, > > ? ?Ga?l > > ? 1. http://scikit-learn.org/stable/whats_new.html > ? 2. http://scikit-learn.org/stable/presentations.html > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matrixhasu at gmail.com Wed May 9 13:02:27 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Wed, 9 May 2012 19:02:27 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 6:53 PM, Sandro Tosi wrote: > On Wed, May 9, 2012 at 6:49 PM, Charles R Harris > wrote: >> >> >> On Wed, May 9, 2012 at 10:36 AM, Sandro Tosi wrote: >>> >>> On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers >>> wrote: >>> > Hi, >>> > >>> > I'm pleased to announce the availability of the first release candidate >>> > of >>> > NumPy 1.6.2.? This is a maintenance release. Due to the delay of the >>> > NumPy >>> > 1.7.0, this release contains far more fixes than a regular NumPy bugfix >>> > release.? 
It also includes a number of documentation and build >>> > improvements. >>> > >>> > Sources and binary installers can be found at >>> > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ >>> > >>> > Please test this release and report any issues on the numpy-discussion >>> > mailing list. >>> >>> Mh, I can't exactly understand this: >>> >>> $ diff -urNad numpy-1.6.1 numpy-1.6.2rc | diffstat | tail -1 >>> ?2718 files changed, 390859 deletions(-) >>> >>> does it mean that the only thing the RC has done is to remove a lot of >>> stuff? that's weird because the build process went all just fine and >>> unit tests are passing ... /me confused? >>> >> >> No, only a few files were changed. Since there are about 1000 files in numpy >> I suspect you are also counting everything in the build and documentation >> build directories. If you built inplace, you are also going to pick up *.pyc >> files and such. > > sorry i didn't say that: they are the tarballs just extracted. i'd > have to recheck again downloading from SF gaah sorry for the noise, there was some devils playing tricks on me.. now I got a more sane 608 files changed, 8121 insertions(+), 4300 deletions(-) Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From charlesr.harris at gmail.com Wed May 9 13:08:26 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 9 May 2012 11:08:26 -0600 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 10:46 AM, Travis Oliphant wrote: > Hey all, > > Nathaniel and Mark have worked very hard on a joint document to try and > explain the current status of the missing-data debate. I think they've > done an amazing job at providing some context, articulating their views and > suggesting ways forward in a mutually respectful manner. This is an > exemplary collaboration and is at the core of why open source is valuable. > > The document is available here: > https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst > > After reading that document, it appears to me that there are some > fundamentally different views on how things should move forward. I'm also > reading the document incorporating my understanding of the history, of > NumPy as well as all of the users I've met and interacted with which means > I have my own perspective that is not necessarily incorporated into that > document but informs my recommendations. I'm not sure we can reach full > consensus on this. We are also well past time for moving forward with a > resolution on this (perhaps we can all agree on that). > > I would like one more discussion thread where the technical discussion can > take place. I will make a plea that we keep this discussion as free from > logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can. > I can't guarantee that I personally will succeed at that, but I can tell > you that I will try. That's all I'm asking of anyone else. I recognize > that there are a lot of other issues at play here besides *just* the > technical questions, but we are not going to resolve every community issue > in this technical thread. > > We need concrete proposals and so I will start with three. Please feel > free to comment on these proposals or add your own during the discussion. 
> I will stop paying attention to this thread next Wednesday (May 16th) (or > earlier if the thread dies) and hope that by that time we can agree on a > way forward. If we don't have agreement, then I will move forward with > what I think is the right approach. I will either write the code myself > or convince someone else to write it. > > In all cases, we have agreement that bit-pattern dtypes should be added to > NumPy. We should work on these (int32, float64, complex64, str, bool) > to start. So, the three proposals are independent of this way forward. > The proposals are all about the extra mask part: > > My three proposals: > > * do nothing and leave things as is > > * add a global flag that turns off masked array support by default but > otherwise leaves things unchanged (I'm still unclear how this would work > exactly) > > * move Mark's "masked ndarray objects" into a new fundamental type > (ndmasked), leaving the actual ndarray type unchanged. The array_interface > keeps the masked array notions and the ufuncs keep the ability to handle > arrays like ndmasked. Ideally, numpy.ma would be changed to use > ndmasked objects as their core. > > The numpy.ma is unmaintained and I don't see that changing anytime soon. As you know, I would prefer 1), but 2) is a good compromise and the infra structure for such a flag could be useful for other things, although like yourself I'm not sure how it would be implemented. I don't understand your proposal for 3), but from the description I don't see that it buys anything. > For the record, I'm currently in favor of the third proposal. Feel free > to comment on these proposals (or provide your own). > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrixhasu at gmail.com Wed May 9 14:40:28 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Wed, 9 May 2012 20:40:28 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers wrote: > Please test this release and report any issues on the numpy-discussion > mailing list. I think it's probably nice not to ship pyc in the source tarball: $ find numpy-1.6.2rc1/ -name "*.pyc" numpy-1.6.2rc1/doc/sphinxext/docscrape.pyc numpy-1.6.2rc1/doc/sphinxext/docscrape_sphinx.pyc numpy-1.6.2rc1/doc/sphinxext/numpydoc.pyc numpy-1.6.2rc1/doc/sphinxext/plot_directive.pyc Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From mwwiebe at gmail.com Wed May 9 15:07:37 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 9 May 2012 14:07:37 -0500 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 11:46 AM, Travis Oliphant wrote: > Hey all, > > Nathaniel and Mark have worked very hard on a joint document to try and > explain the current status of the missing-data debate. I think they've > done an amazing job at providing some context, articulating their views and > suggesting ways forward in a mutually respectful manner. This is an > exemplary collaboration and is at the core of why open source is valuable. > > The document is available here: > https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst > > After reading that document, it appears to me that there are some > fundamentally different views on how things should move forward. 
I'm also > reading the document incorporating my understanding of the history, of > NumPy as well as all of the users I've met and interacted with which means > I have my own perspective that is not necessarily incorporated into that > document but informs my recommendations. I'm not sure we can reach full > consensus on this. We are also well past time for moving forward with a > resolution on this (perhaps we can all agree on that). > > I would like one more discussion thread where the technical discussion can > take place. I will make a plea that we keep this discussion as free from > logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can. > I can't guarantee that I personally will succeed at that, but I can tell > you that I will try. That's all I'm asking of anyone else. I recognize > that there are a lot of other issues at play here besides *just* the > technical questions, but we are not going to resolve every community issue > in this technical thread. > > We need concrete proposals and so I will start with three. Please feel > free to comment on these proposals or add your own during the discussion. > I will stop paying attention to this thread next Wednesday (May 16th) (or > earlier if the thread dies) and hope that by that time we can agree on a > way forward. If we don't have agreement, then I will move forward with > what I think is the right approach. I will either write the code myself > or convince someone else to write it. > > In all cases, we have agreement that bit-pattern dtypes should be added to > NumPy. We should work on these (int32, float64, complex64, str, bool) > to start. So, the three proposals are independent of this way forward. > The proposals are all about the extra mask part: > > My three proposals: > > * do nothing and leave things as is > > * add a global flag that turns off masked array support by default but > otherwise leaves things unchanged (I'm still unclear how this would work > exactly) > > * move Mark's "masked ndarray objects" into a new fundamental type > (ndmasked), leaving the actual ndarray type unchanged. The array_interface > keeps the masked array notions and the ufuncs keep the ability to handle > arrays like ndmasked. Ideally, numpy.ma would be changed to use > ndmasked objects as their core. > > For the record, I'm currently in favor of the third proposal. Feel free > to comment on these proposals (or provide your own). > I'm most in favour of the second proposal. It won't take very much effort, and more clearly marks off this code as experimental than just documentation notes. Thanks, -Mark > > Best regards, > > -Travis > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed May 9 15:15:40 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 9 May 2012 14:15:40 -0500 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: <1A346AD9-8567-4F3F-A03E-2E316A84B5C0@continuum.io> On May 9, 2012, at 2:07 PM, Mark Wiebe wrote: > On Wed, May 9, 2012 at 11:46 AM, Travis Oliphant wrote: > Hey all, > > Nathaniel and Mark have worked very hard on a joint document to try and explain the current status of the missing-data debate. 
I think they've done an amazing job at providing some context, articulating their views and suggesting ways forward in a mutually respectful manner. This is an exemplary collaboration and is at the core of why open source is valuable. > > The document is available here: > https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst > > After reading that document, it appears to me that there are some fundamentally different views on how things should move forward. I'm also reading the document incorporating my understanding of the history, of NumPy as well as all of the users I've met and interacted with which means I have my own perspective that is not necessarily incorporated into that document but informs my recommendations. I'm not sure we can reach full consensus on this. We are also well past time for moving forward with a resolution on this (perhaps we can all agree on that). > > I would like one more discussion thread where the technical discussion can take place. I will make a plea that we keep this discussion as free from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can. I can't guarantee that I personally will succeed at that, but I can tell you that I will try. That's all I'm asking of anyone else. I recognize that there are a lot of other issues at play here besides *just* the technical questions, but we are not going to resolve every community issue in this technical thread. > > We need concrete proposals and so I will start with three. Please feel free to comment on these proposals or add your own during the discussion. I will stop paying attention to this thread next Wednesday (May 16th) (or earlier if the thread dies) and hope that by that time we can agree on a way forward. If we don't have agreement, then I will move forward with what I think is the right approach. I will either write the code myself or convince someone else to write it. > > In all cases, we have agreement that bit-pattern dtypes should be added to NumPy. We should work on these (int32, float64, complex64, str, bool) to start. So, the three proposals are independent of this way forward. The proposals are all about the extra mask part: > > My three proposals: > > * do nothing and leave things as is > > * add a global flag that turns off masked array support by default but otherwise leaves things unchanged (I'm still unclear how this would work exactly) > > * move Mark's "masked ndarray objects" into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged. The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as their core. > > For the record, I'm currently in favor of the third proposal. Feel free to comment on these proposals (or provide your own). > > I'm most in favour of the second proposal. It won't take very much effort, and more clearly marks off this code as experimental than just documentation notes. > Mark will you give more details about this proposal? How would the flag work, what would it modify? The proposal to create a ndmasked object that is separate from ndarray objects also won't take much effort and also marks off the object so those who want to use it can and those who don't are not pushed into using it anyway. 
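To make the third proposal concrete, here is a purely hypothetical sketch of what user code might see. None of these names or signatures exist yet, and the spelling of the constructor is not part of the proposal:

import numpy as np

a = np.ndmasked(np.arange(4.0))   # hypothetical constructor for the new fundamental type
a[2] = np.NA                      # masking an element, as NA assignment works in current master
isinstance(a, np.ndarray)         # -> False in this sketch: code that only knows ndarray never sees a mask
np.add(a, 1.0)                    # ufuncs keep their ability to handle ndmasked arrays

The point is only that the masked functionality stays available while what an ndarray is, at the Python and C-API level, is left exactly as it was in 1.6.
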
-Travis > Thanks, > -Mark > > > Best regards, > > -Travis > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed May 9 15:35:26 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 9 May 2012 14:35:26 -0500 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: <1A346AD9-8567-4F3F-A03E-2E316A84B5C0@continuum.io> References: <1A346AD9-8567-4F3F-A03E-2E316A84B5C0@continuum.io> Message-ID: On Wed, May 9, 2012 at 2:15 PM, Travis Oliphant wrote: > > On May 9, 2012, at 2:07 PM, Mark Wiebe wrote: > > On Wed, May 9, 2012 at 11:46 AM, Travis Oliphant wrote: > >> Hey all, >> >> Nathaniel and Mark have worked very hard on a joint document to try and >> explain the current status of the missing-data debate. I think they've >> done an amazing job at providing some context, articulating their views and >> suggesting ways forward in a mutually respectful manner. This is an >> exemplary collaboration and is at the core of why open source is valuable. >> >> The document is available here: >> https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst >> >> After reading that document, it appears to me that there are some >> fundamentally different views on how things should move forward. I'm also >> reading the document incorporating my understanding of the history, of >> NumPy as well as all of the users I've met and interacted with which means >> I have my own perspective that is not necessarily incorporated into that >> document but informs my recommendations. I'm not sure we can reach full >> consensus on this. We are also well past time for moving forward with a >> resolution on this (perhaps we can all agree on that). >> >> I would like one more discussion thread where the technical discussion >> can take place. I will make a plea that we keep this discussion as free >> from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as >> we can. I can't guarantee that I personally will succeed at that, but I >> can tell you that I will try. That's all I'm asking of anyone else. I >> recognize that there are a lot of other issues at play here besides *just* >> the technical questions, but we are not going to resolve every community >> issue in this technical thread. >> >> We need concrete proposals and so I will start with three. Please feel >> free to comment on these proposals or add your own during the discussion. >> I will stop paying attention to this thread next Wednesday (May 16th) (or >> earlier if the thread dies) and hope that by that time we can agree on a >> way forward. If we don't have agreement, then I will move forward with >> what I think is the right approach. I will either write the code myself >> or convince someone else to write it. >> >> In all cases, we have agreement that bit-pattern dtypes should be added >> to NumPy. We should work on these (int32, float64, complex64, str, >> bool) to start. So, the three proposals are independent of this way >> forward. 
The proposals are all about the extra mask part: >> >> My three proposals: >> >> * do nothing and leave things as is >> >> * add a global flag that turns off masked array support by default but >> otherwise leaves things unchanged (I'm still unclear how this would work >> exactly) >> >> * move Mark's "masked ndarray objects" into a new fundamental type >> (ndmasked), leaving the actual ndarray type unchanged. The array_interface >> keeps the masked array notions and the ufuncs keep the ability to handle >> arrays like ndmasked. Ideally, numpy.ma would be changed to use >> ndmasked objects as their core. >> >> For the record, I'm currently in favor of the third proposal. Feel free >> to comment on these proposals (or provide your own). >> > > I'm most in favour of the second proposal. It won't take very much effort, > and more clearly marks off this code as experimental than just > documentation notes. > > > Mark will you give more details about this proposal? How would the flag > work, what would it modify? > The idea is inspired in part by the Chrome release cycle, which has a presentation here: https://docs.google.com/present/view?id=dg63dpc6_4d7vkk6ch&pli=1 Some quotes: Features should be engineered so that they can be disabled easily (1 patch) and Would large feature development still be possible? "Yes, engineers would have to work behind flags, however they can work for as many releases as they need to and can remove the flag when they are done." The current numpy codebase isn't designed for this kind of workflow, but I think we can productively emulate the idea for a big feature like NA support. One way to do this flag would be to have a "numpy.experimental" namespace which is not imported by default. To enable the NA-mask feature, you could do: >>> import numpy.experimental.maskna This would trigger an ExperimentalWarning to message that an experimental feature has been enabled, and would add any NA-specific symbols to the numpy namespace (NA, NAType, etc). Without this import, any operation which would create an NA or NA-masked array raises an ExperimentalError instead of succeeding. After this import, things would behave as they do now. Cheers, Mark The proposal to create a ndmasked object that is separate from ndarray > objects also won't take much effort and also marks off the object so those > who want to use it can and those who don't are not pushed into using it > anyway. > > -Travis > > > Thanks, > -Mark > > >> >> Best regards, >> >> -Travis >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Wed May 9 15:35:26 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 9 May 2012 14:35:26 -0500 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: > My three proposals: > > * do nothing and leave things as is > > * add a global flag that turns off masked array support by default but otherwise leaves things unchanged (I'm still unclear how this would work exactly) > > * move Mark's "masked ndarray objects" into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged. The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as their core. > > > The numpy.ma is unmaintained and I don't see that changing anytime soon. As you know, I would prefer 1), but 2) is a good compromise and the infra structure for such a flag could be useful for other things, although like yourself I'm not sure how it would be implemented. I don't understand your proposal for 3), but from the description I don't see that it buys anything. That is a bit strong to call numpy.ma unmaintained. I don't consider it that way. Are there a lot of tickets for it that are unaddressed? Is it broken? I know it gets a lot of use in the wild and so I don't think NumPy users would be happy to here it is considered unmaintained by NumPy developers. I'm looking forward to more details of Mark's proposal for #2. The proposal for #3 is quite simple and I think it is also a good compromise between removing the masked array entirely from the core NumPy object and leaving things as is in master. It keeps the functionality (but in a separate object) much like numpy.ma is a separate object. Basically it buys not forcing *all* NumPy users (on the C-API level) to now deal with a masked array. I know this push is a feature that is part of Mark's intention (as it pushes downstream libraries to think about missing data at a fundamental level). But, I think this is too big of a change to put in a 1.X release. The internal array-model used by NumPy is used quite extensively in downstream libraries as a *concept*. Many people have enhanced this model with a separate mask array for various reasons, and Mark's current use of mask does not satisfy all those use-cases. I don't see how we can justify changing the NumPy 1.X memory model under these circumstances. This is the sort of change that in my mind is a NumPy 2.0 kind of change where downstream users will be looking for possible array-model changes. -Travis > > For the record, I'm currently in favor of the third proposal. Feel free to comment on these proposals (or provide your own). > > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed May 9 15:37:37 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 9 May 2012 14:37:37 -0500 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: <1A346AD9-8567-4F3F-A03E-2E316A84B5C0@continuum.io> Message-ID: <750BBC1A-27B2-454E-883E-6E056E7E1C34@continuum.io> > Mark will you give more details about this proposal? How would the flag work, what would it modify? 
> > The idea is inspired in part by the Chrome release cycle, which has a presentation here: > > https://docs.google.com/present/view?id=dg63dpc6_4d7vkk6ch&pli=1 > > Some quotes: > Features should be engineered so that they can be disabled easily (1 patch) > and > Would large feature development still be possible? > > "Yes, engineers would have to work behind flags, however they can work for as many releases as they need to and can remove the flag when they are done." > > The current numpy codebase isn't designed for this kind of workflow, but I think we can productively emulate the idea for a big feature like NA support. > > One way to do this flag would be to have a "numpy.experimental" namespace which is not imported by default. To enable the NA-mask feature, you could do: > > >>> import numpy.experimental.maskna > > This would trigger an ExperimentalWarning to message that an experimental feature has been enabled, and would add any NA-specific symbols to the numpy namespace (NA, NAType, etc). Without this import, any operation which would create an NA or NA-masked array raises an ExperimentalError instead of succeeding. After this import, things would behave as they do now. How would this flag work at the C-API level? -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Wed May 9 15:44:24 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 09 May 2012 21:44:24 +0200 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: <4FAAC918.8050705@astro.uio.no> On 05/09/2012 06:46 PM, Travis Oliphant wrote: > Hey all, > > Nathaniel and Mark have worked very hard on a joint document to try and > explain the current status of the missing-data debate. I think they've > done an amazing job at providing some context, articulating their views > and suggesting ways forward in a mutually respectful manner. This is an > exemplary collaboration and is at the core of why open source is valuable. > > The document is available here: > https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst > > After reading that document, it appears to me that there are some > fundamentally different views on how things should move forward. I'm > also reading the document incorporating my understanding of the history, > of NumPy as well as all of the users I've met and interacted with which > means I have my own perspective that is not necessarily incorporated > into that document but informs my recommendations. I'm not sure we can > reach full consensus on this. We are also well past time for moving > forward with a resolution on this (perhaps we can all agree on that). > > I would like one more discussion thread where the technical discussion > can take place. I will make a plea that we keep this discussion as free > from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as > we can. I can't guarantee that I personally will succeed at that, but I > can tell you that I will try. That's all I'm asking of anyone else. I > recognize that there are a lot of other issues at play here besides > *just* the technical questions, but we are not going to resolve every > community issue in this technical thread. > > We need concrete proposals and so I will start with three. Please feel > free to comment on these proposals or add your own during the > discussion. 
I will stop paying attention to this thread next Wednesday > (May 16th) (or earlier if the thread dies) and hope that by that time we > can agree on a way forward. If we don't have agreement, then I will move > forward with what I think is the right approach. I will either write the > code myself or convince someone else to write it. > > In all cases, we have agreement that bit-pattern dtypes should be added > to NumPy. We should work on these (int32, float64, complex64, str, bool) > to start. So, the three proposals are independent of this way forward. > The proposals are all about the extra mask part: > > My three proposals: > > * do nothing and leave things as is > > * add a global flag that turns off masked array support by default but > otherwise leaves things unchanged (I'm still unclear how this would work > exactly) > > * move Mark's "masked ndarray objects" into a new fundamental type > (ndmasked), leaving the actual ndarray type unchanged. The > array_interface keeps the masked array notions and the ufuncs keep the > ability to handle arrays like ndmasked. Ideally, numpy.ma > would be changed to use ndmasked objects as their core. > > For the record, I'm currently in favor of the third proposal. Feel free > to comment on these proposals (or provide your own). > Bravo!, NA-overview.rst was an excellent read. Thanks Nathaniel and Mark! The third proposal is certainly the best one from Cython's perspective; and I imagine for those writing C extensions against the C API too. Having PyType_Check fail for ndmasked is a very good way of having code fail that is not written to take masks into account. If it is in ndarray we would also have some pressure to add support in Cython, with ndmasked we avoid that too. Likely outcome is we won't ever support it either way, but then we need some big warning in the docs, and it's better to avoid that. (I guess be +0 on Mark Florisson implementing it if it ends up in core ndarray; I'd almost certainly not do it myself.) That covers Cython. My view as a NumPy user follows. I'm a heavy user of masks, which are used to make data NA in the statistical sense. The setting is that we have to mask out the radiation coming from the Milky Way in full-sky images of the Cosmic Microwave Background. There's data, but we know we can't trust it, so we make it NA. But we also do play around with different masks. Today we keep the mask in a seperate array, and to zero-mask we do masked_data = data * mask or masked_data = data.copy() masked_data[mask == 0] = np.nan # soon np.NA depending on the circumstances. Honestly, API-wise, this is as good as its gets for us. Nice and transparent, no new semantics to learn in the special case of masks. Now, this has performance issues: Lots of memory use, extra transfers over the memory bus. BUT, NumPy has that problem all over the place, even for "x + y + z"! Solving it in the special case of masks, by making a new API, seems a bit myopic to me. IMO, that's much better solved at the fundamental level. As an *illustration*: with np.lazy: masked_data1 = data * mask1 masked_data2 = data * (mask1 | mask2) masked_data3 = (x + y + z) * (mask1 & mask3) This would create three "generator arrays" that would zero-mask the arrays (and perform the three-term addition...) upon request. You could slice the generator arrays as you wish, and by that slice the data and the mask in one operation. Obviously this could handle NA-masking too. 
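(To be concrete: for the simple products, numexpr can already fuse this into a single blockwise pass, which is the performance half of what I want. A rough sketch with stand-in arrays, not our real pipeline:

import numpy as np
import numexpr as ne

data = np.random.rand(10**6)                     # stand-in for a sky map
mask1 = (np.random.rand(data.size) > 0.2) * 1.0  # 0/1 float mask
mask3 = (np.random.rand(data.size) > 0.5) * 1.0

masked_data1 = ne.evaluate("data * mask1")            # fused into one pass over memory
masked_data3 = ne.evaluate("data * (mask1 * mask3)")  # combined masks without an intermediate mask array

What this does not give is the slicing-of-data-and-mask-together semantics, since the result is an ordinary materialized array.)
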
You can probably do this today with Theano and numexpr, and I think Travis mentioned that "generator arrays" are on his radar for core NumPy. Point is, as a user, I'm with Travis in having masks support go hide in ndmasked; they solve too much of a special case in a way that is too particular. Dag From charlesr.harris at gmail.com Wed May 9 16:06:27 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 9 May 2012 14:06:27 -0600 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 1:35 PM, Travis Oliphant wrote: > My three proposals: >> >> * do nothing and leave things as is >> >> * add a global flag that turns off masked array support by default but >> otherwise leaves things unchanged (I'm still unclear how this would work >> exactly) >> >> * move Mark's "masked ndarray objects" into a new fundamental type >> (ndmasked), leaving the actual ndarray type unchanged. The array_interface >> keeps the masked array notions and the ufuncs keep the ability to handle >> arrays like ndmasked. Ideally, numpy.ma would be changed to use >> ndmasked objects as their core. >> >> > The numpy.ma is unmaintained and I don't see that changing anytime soon. > As you know, I would prefer 1), but 2) is a good compromise and the infra > structure for such a flag could be useful for other things, although like > yourself I'm not sure how it would be implemented. I don't understand your > proposal for 3), but from the description I don't see that it buys anything. > > > That is a bit strong to call numpy.ma unmaintained. I don't consider > it that way. Are there a lot of tickets for it that are unaddressed? > Is it broken? I know it gets a lot of use in the wild and so I don't > think NumPy users would be happy to here it is considered unmaintained by > NumPy developers. > > I'm looking forward to more details of Mark's proposal for #2. > > The proposal for #3 is quite simple and I think it is also a good > compromise between removing the masked array entirely from the core NumPy > object and leaving things as is in master. It keeps the functionality (but > in a separate object) much like numpy.ma is a separate object. > Basically it buys not forcing *all* NumPy users (on the C-API level) to > now deal with a masked array. > To me, it looks like we will get stuck with a more complicated implementation without changing the API, something that 2) achieves more easily while providing a feature likely to be useful as we head towards 2.0. > I know this push is a feature that is part of Mark's intention (as it > pushes downstream libraries to think about missing data at a fundamental > level). But, I think this is too big of a change to put in a 1.X > release. The internal array-model used by NumPy is used quite extensively > in downstream libraries as a *concept*. Many people have enhanced this > model with a separate mask array for various reasons, and Mark's current > use of mask does not satisfy all those use-cases. I don't see how we can > justify changing the NumPy 1.X memory model under these circumstances. > > You keep referring to these ghostly people and their unspecified uses, no doubt to protect the guilty. You don't have to name names, but a little detail on what they have done and how they use things would be *very* helpful. > This is the sort of change that in my mind is a NumPy 2.0 kind of change > where downstream users will be looking for possible array-model changes. 
> > We tried the flag day approach to 2.0 already and it failed. I think it better to have a long term release and a series of releases thereafter moving step by step with incremental changes towards a 2.0. Mark's 2) would support that approach. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jh at physics.ucf.edu Wed May 9 17:36:17 2012 From: jh at physics.ucf.edu (Joe Harrington) Date: Wed, 09 May 2012 17:36:17 -0400 Subject: [Numpy-discussion] Documentation roles in the numpy/scipy documentation editor In-Reply-To: (numpy-discussion-request@scipy.org) Message-ID: We considered lowering the review standard near the end of my direct involvement in the doc project but decided not to. You didn't mention any benefit to the proposed changes, so while I'm not active in the doc project anymore, let me relate our decision. It's often the case that docstrings get written fast, and it's usually the case that they're written by a single person, who has a single perspective. We wanted to make docs that were professional, that could be placed next to the manuals for IDL, Matlab, etc. without embarrassment. So, we set up a system similar to academic publishing. Every docstring would be seen by two sets of critical eyes, and for major X.0 releases we'd pay a proofreader to spend a few days to polish off the English and get the style totally consistent. At the same time, we needed to get something decent in every docstring fast, so we made that the priority. About the time we achieved that, money ran out. So, lots of docstrings are in "needs review" or even "being edited" status. But that doesn't mean money will never come again. Indeed, there are now several companies basing their services around this software. If someone does want to make the docs professional, say for numpy 2.0 or 3.0 or whatever, or as part of a larger system for sale, then they have a system in place that can do it. The purpose of the review statuses is to identify how close a docstring is to publishable. However, there is no consequence to the statuses: a docstring gets included in the release no matter its status. But, you do know which docstrings need what kind of work. So, what's the benefit of changing what the statuses mean, or eliminating them? I think it may only be that the writers feel better. The users don't even see the statuses as they're not listed in the release. Tim felt that docs should be continually edited, not "finished". I agree, especially if the underlying routine or surrounding docs get changed. But the system is designed to encourage this! Here's how: Say most/all routines get genuine "proofed" status. That's great, but it's not the end of the line by any means. If someone comes along and edits a "proofed" docstring, that docstring then automatically "needs review" once again, to ensure that a mistake was not inserted. Now you know what to look at when checking things over before a release (since there can't be unit tests for docs). From the history, you also know it was once proofed, so reviewing and proofing it is very easy just by looking at the diffs. So, the system encourages and accounts for continual edits while allowing a professional product to be produced for a particular release. The way to move forward is to declare that the goal is to get all docs to some status, say "needs review" (that was our initial goal, and the only one we achieved, more or less). Then, go after the docs that don't have that, like the new polynomial docs. 
If someone wants to publish a manual, the goal becomes "proofed", and there's more work to do. It DOES make sense to give the reviewer role to more people. Just make sure they take care in their reviews, so the statuses continue to have meaning. Otherwise what's the point? --jh-- On Mon, 7 May 2012 22:14:56, Ralf Gommers wrote: On Mon, May 7, 2012 at 7:37 PM, Tim Cera wrote: >> I think we should change the roles established for the Numpy/Scipy >> documentation editors because they do not work as intended. >> >> For reference they are described here: >> http://docs.scipy.org/numpy/Front%20Page/ >> >> Basically there aren't that many active people to support being split into >> the roles as described which has led to a backlog of 'Needs review' >> docstrings and only one 'Proofed' docstring. I think that many of these >> docstrings are good enough, just that not enough people have put themselves >> out front as so knowledgeable about a certain topic to label docstrings as >> 'Reviewed' or 'Proofed'. >> >> You're right. I think at some point the goal shifted from getting >everything to "proofed" to getting everything to "needs review". > > >> Here are the current statistics for numpy docstrings:
>>                              Current %    Count
>>   Needs editing                    17      279
>>   Being written / Changed           4       62
>>   Needs review                     76     1235
>>   Needs review (revised)            2       35
>>   Needs work (reviewed)             0        3
>>   Reviewed (needs proof)            0        0
>>   Proofed                           0        1
>>   Unimportant                       -     1793
>> The "needs editing" category actually contains mostly docstrings that are >quite good, but were recently created and never edited in the doc wiki. The >% keeps on growing. Bumping all polynomial docstrings up to "needs review" >would be a good start here to make the % reflect the actual status. > >> >> I have thought about some solutions in no particular order: >> >> * Get rid of the 'Reviewer' and 'Proofer' roles. >> * Assign all 'Editors' the 'Reviewer' and 'Proofer' privileges. >> * People start out as 'Editors', and then become 'Reviewers', and >> 'Proofers' based on some editing metric. >> >> For full disclosure, I would be generous with a 'Reviewed' label if given >> the authority because philosophically I think there should be a point where >> the docstring is 'Good enough' and it should be expected to have a life of >> continually small improvements rather than a point when it is 'Done'. >> > >This makes sense to me. > > >> Regardless of what decision is made, the single 'Proofed' docstring should >> be available for editing. I can't even find what it is. I imagine that it >> should be on the docstring page at http://docs.scipy.org/numpy/docs/ >> >> It used to be there - maybe the stats got confused. From travis at continuum.io Wed May 9 18:12:05 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 9 May 2012 17:12:05 -0500 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On re-reading, I want to make a couple of things clear: 1) This "wrap-up" discussion is *only* for what to do for NumPy 1.7 in such a way that we don't tie our hands in the future. I do not believe we can figure out what to do for masked arrays in one short week. What happens beyond NumPy 1.7 should be still discussed and explored. My urgency is entirely about moving forward from where we are in master right now in a direction that we can all accept. The tight timeline is so that we do *something* and move forward. 
2) I missed another possible proposal for NumPy 1.7 which is in the write-up that Mark and Nathaniel made: remove the masked array additions entirely possibly moving them to another module like numpy-dtypes. Again, these are only for NumPy 1.7. What happens in any future NumPy and beyond will depend on who comes to the table for both discussion and code-development. Best regards, -Travis On May 9, 2012, at 11:46 AM, Travis Oliphant wrote: > Hey all, > > Nathaniel and Mark have worked very hard on a joint document to try and explain the current status of the missing-data debate. I think they've done an amazing job at providing some context, articulating their views and suggesting ways forward in a mutually respectful manner. This is an exemplary collaboration and is at the core of why open source is valuable. > > The document is available here: > https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst > > After reading that document, it appears to me that there are some fundamentally different views on how things should move forward. I'm also reading the document incorporating my understanding of the history, of NumPy as well as all of the users I've met and interacted with which means I have my own perspective that is not necessarily incorporated into that document but informs my recommendations. I'm not sure we can reach full consensus on this. We are also well past time for moving forward with a resolution on this (perhaps we can all agree on that). > > I would like one more discussion thread where the technical discussion can take place. I will make a plea that we keep this discussion as free from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as we can. I can't guarantee that I personally will succeed at that, but I can tell you that I will try. That's all I'm asking of anyone else. I recognize that there are a lot of other issues at play here besides *just* the technical questions, but we are not going to resolve every community issue in this technical thread. > > We need concrete proposals and so I will start with three. Please feel free to comment on these proposals or add your own during the discussion. I will stop paying attention to this thread next Wednesday (May 16th) (or earlier if the thread dies) and hope that by that time we can agree on a way forward. If we don't have agreement, then I will move forward with what I think is the right approach. I will either write the code myself or convince someone else to write it. > > In all cases, we have agreement that bit-pattern dtypes should be added to NumPy. We should work on these (int32, float64, complex64, str, bool) to start. So, the three proposals are independent of this way forward. The proposals are all about the extra mask part: > > My three proposals: > > * do nothing and leave things as is > > * add a global flag that turns off masked array support by default but otherwise leaves things unchanged (I'm still unclear how this would work exactly) > > * move Mark's "masked ndarray objects" into a new fundamental type (ndmasked), leaving the actual ndarray type unchanged. The array_interface keeps the masked array notions and the ufuncs keep the ability to handle arrays like ndmasked. Ideally, numpy.ma would be changed to use ndmasked objects as their core. > > For the record, I'm currently in favor of the third proposal. Feel free to comment on these proposals (or provide your own). > > Best regards, > > -Travis > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Wed May 9 18:33:49 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 9 May 2012 16:33:49 -0600 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 4:12 PM, Travis Oliphant wrote: > On re-reading, I want to make a couple of things clear: > > 1) This "wrap-up" discussion is *only* for what to do for NumPy 1.7 in > such a way that we don't tie our hands in the future. I do not believe > we can figure out what to do for masked arrays in one short week. What > happens beyond NumPy 1.7 should be still discussed and explored. My > urgency is entirely about moving forward from where we are in master right > now in a direction that we can all accept. The tight timeline is so > that we do *something* and move forward. > > 2) I missed another possible proposal for NumPy 1.7 which is in the > write-up that Mark and Nathaniel made: remove the masked array additions > entirely possibly moving them to another module like numpy-dtypes. > > Again, these are only for NumPy 1.7. What happens in any future NumPy > and beyond will depend on who comes to the table for both discussion and > code-development. > > Why don't we go with 2) then? Mark implies that it takes the least work and it kicks the decision down the road. It may well be that a better approach turns up after more discussion, or that we decide to just pull it out, but the first takes time to arrive at and the second takes effort that could be better spent (IMHO) on other things at the moment. My sense is that the API is actually the major point of contention, although I may just be speaking for myself. And perhaps we should look for ways of adding support for masked array implementations rather than masked arrays themselves. It could be that easy to use infrastructure that enhanced others efforts might be a better way forward. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed May 9 18:55:44 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 May 2012 23:55:44 +0100 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 5:46 PM, Travis Oliphant wrote: > Hey all, > > Nathaniel and Mark have worked very hard on a joint document to try and > explain the current status of the missing-data debate. ? I think they've > done an amazing job at providing some context, articulating their views and > suggesting ways forward in a mutually respectful manner. ? This is an > exemplary collaboration and is at the core of why open source is valuable. > > The document is available here: > ? ?https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst > > After reading that document, it appears to me that there are some > fundamentally different views on how things should move forward. ? I'm also > reading the document incorporating my understanding of the history, of NumPy > as well as all of the users I've met and interacted with which means I have > my own perspective that is not necessarily incorporated into that document > but informs my recommendations. ? ?I'm not sure we can reach full consensus > on this. ? ? We are also well past time for moving forward with a resolution > on this (perhaps we can all agree on that). If we're talking about deciding what to do for the 1.7 release branch, then I agree. Otherwise, I definitely don't. 
We really just don't *know* what our users need with regards to mask-based storage versions of missing data, so committing to something within a short time period will just guarantee we have to re-do it all again later. [Edit: I see that you've clarified this in a follow-up email -- great!] > We need concrete proposals and so I will start with three. ? Please feel > free to comment on these proposals or add your own during the discussion. > ?I will stop paying attention to this thread next Wednesday (May 16th) (or > earlier if the thread dies) and hope that by that time we can agree on a way > forward. ?If we don't have agreement, then I will move forward with what I > think is the right approach. ? I will either write the code myself or > convince someone else to write it. Again, I'm assuming that what you mean here is that we can't and shouldn't delay 1.7 indefinitely for this discussion to play out, so you're proposing that we give ourselves a deadline of 1 week to decide how to at least get the release unblocked. Let me know if I'm misreading, though... > In all cases, we have agreement that bit-pattern dtypes should be added to > NumPy. ? ? ?We should work on these (int32, float64, complex64, str, bool) > to start. ? ?So, the three proposals are independent of this way forward. > The proposals are all about the extra mask part: > > My three proposals: > > * do nothing and leave things as is In the context of 1.7, this seems like a non-starter at this point, at least if we're going to move in the direction of making decisions by consensus. It might well be that we'll decide that the current NEP-like API is what we want (or that some compatible super-set is). But (as described in more detail in the NA-overview document), I think there are still serious questions to work out about how and whether a masked-storage/NA-semantics API is something we want as part of the ndarray object at all. And Ralf with his release-manager hat says that he doesn't want to release the current API unless we can guarantee that some version of it will continue to be supported. To me that suggests that this is off the table for 1.7. > * add a global flag that turns off masked array support by default but > otherwise leaves things unchanged (I'm still unclear how this would work > exactly) I've been assuming something like a global variable, and some guards added to all the top-level functions that take "maskna=" arguments, so that it's impossible to construct an ndarray that has its "maskna" flag set to True unless the flag has been toggled. As I said in NA-overview, I'd be fine with this in principle, but only if we're certain we're okay with the ABI consequences. And we should be clear on the goal -- if we just want to let people play with the API, then there are other options, such as my little experiment: https://github.com/njsmith/numpyNEP (This is certainly less robust, but it works, and is probably a much easier base for modifications to test alternative APIs.) If the goal is just to keep the code in master, then that's fine too, though it has both costs and benefits. (An example of a cost is that its presence may complicate adding bitpattern NA support.) > * move Mark's "masked ndarray objects" into a new fundamental type > (ndmasked), leaving the actual ndarray type unchanged. ?The array_interface > keeps the masked array notions and the ufuncs keep the ability to handle > arrays like ndmasked. ? ?Ideally, numpy.ma would be changed to use ndmasked > objects as their core. 
If we're talking about 1.7, then what kind of status do you propose these new objects would have in 1.7? Regular feature, totally experimental, something else? My only objection to this proposal is that committing to this approach seems premature. The existing masked array objects act quite differently from numpy.ma, so why do you believe that they're a good foundation for numpy.ma, and why will users want to switch to their semantics over numpy.ma's semantics? These aren't rhetorical questions, it seems like they must have concrete answers, but I don't know what they are. Cheers, - Nathaniel From matthew.brett at gmail.com Wed May 9 19:01:25 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 9 May 2012 16:01:25 -0700 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: <4FAAC918.8050705@astro.uio.no> References: <4FAAC918.8050705@astro.uio.no> Message-ID: Hi, On Wed, May 9, 2012 at 12:44 PM, Dag Sverre Seljebotn wrote: > On 05/09/2012 06:46 PM, Travis Oliphant wrote: >> Hey all, >> >> Nathaniel and Mark have worked very hard on a joint document to try and >> explain the current status of the missing-data debate. I think they've >> done an amazing job at providing some context, articulating their views >> and suggesting ways forward in a mutually respectful manner. This is an >> exemplary collaboration and is at the core of why open source is valuable. >> >> The document is available here: >> https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst >> >> After reading that document, it appears to me that there are some >> fundamentally different views on how things should move forward. I'm >> also reading the document incorporating my understanding of the history, >> of NumPy as well as all of the users I've met and interacted with which >> means I have my own perspective that is not necessarily incorporated >> into that document but informs my recommendations. I'm not sure we can >> reach full consensus on this. We are also well past time for moving >> forward with a resolution on this (perhaps we can all agree on that). >> >> I would like one more discussion thread where the technical discussion >> can take place. I will make a plea that we keep this discussion as free >> from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as >> we can. I can't guarantee that I personally will succeed at that, but I >> can tell you that I will try. That's all I'm asking of anyone else. I >> recognize that there are a lot of other issues at play here besides >> *just* the technical questions, but we are not going to resolve every >> community issue in this technical thread. >> >> We need concrete proposals and so I will start with three. Please feel >> free to comment on these proposals or add your own during the >> discussion. I will stop paying attention to this thread next Wednesday >> (May 16th) (or earlier if the thread dies) and hope that by that time we >> can agree on a way forward. If we don't have agreement, then I will move >> forward with what I think is the right approach. I will either write the >> code myself or convince someone else to write it. >> >> In all cases, we have agreement that bit-pattern dtypes should be added >> to NumPy. We should work on these (int32, float64, complex64, str, bool) >> to start. So, the three proposals are independent of this way forward. 
>> The proposals are all about the extra mask part: >> >> My three proposals: >> >> * do nothing and leave things as is >> >> * add a global flag that turns off masked array support by default but >> otherwise leaves things unchanged (I'm still unclear how this would work >> exactly) >> >> * move Mark's "masked ndarray objects" into a new fundamental type >> (ndmasked), leaving the actual ndarray type unchanged. The >> array_interface keeps the masked array notions and the ufuncs keep the >> ability to handle arrays like ndmasked. Ideally, numpy.ma >> would be changed to use ndmasked objects as their core. >> >> For the record, I'm currently in favor of the third proposal. Feel free >> to comment on these proposals (or provide your own). >> > > Bravo!, NA-overview.rst was an excellent read. Thanks Nathaniel and Mark! Yes, it is very well written, my compliments to the chefs. > The third proposal is certainly the best one from Cython's perspective; > and I imagine for those writing C extensions against the C API too. > Having PyType_Check fail for ndmasked is a very good way of having code > fail that is not written to take masks into account. Mark, Nathaniel - can you comment how your chosen approaches would interact with extension code? I'm guessing the bitpattern dtypes would be expected to cause extension code to choke if the type is not supported? Mark - in : https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#cython - do I understand correctly that you think that Cython and other extension writers should use the numpy API to access the data rather than accessing it directly via the data pointer and strides? Best, Matthew From njs at pobox.com Wed May 9 19:08:55 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 10 May 2012 00:08:55 +0100 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: <4FAAC918.8050705@astro.uio.no> References: <4FAAC918.8050705@astro.uio.no> Message-ID: Hi Dag, On Wed, May 9, 2012 at 8:44 PM, Dag Sverre Seljebotn wrote: > I'm a heavy user of masks, which are used to make data NA in the > statistical sense. The setting is that we have to mask out the radiation > coming from the Milky Way in full-sky images of the Cosmic Microwave > Background. There's data, but we know we can't trust it, so we make it > NA. But we also do play around with different masks. Oh, this is great -- that means you're one of the users that I wasn't sure existed or not :-). Now I know! > Today we keep the mask in a seperate array, and to zero-mask we do > > masked_data = data * mask > > or > > masked_data = data.copy() > masked_data[mask == 0] = np.nan # soon np.NA > > depending on the circumstances. > > Honestly, API-wise, this is as good as its gets for us. Nice and > transparent, no new semantics to learn in the special case of masks. > > Now, this has performance issues: Lots of memory use, extra transfers > over the memory bus. Right -- this is a case where (in the NA-overview terminology) masked storage+NA semantics would be useful. > BUT, NumPy has that problem all over the place, even for "x + y + z"! > Solving it in the special case of masks, by making a new API, seems a > bit myopic to me. > > IMO, that's much better solved at the fundamental level. As an > *illustration*: > > with np.lazy: > ? ? masked_data1 = data * mask1 > ? ? masked_data2 = data * (mask1 | mask2) > ? ? 
masked_data3 = (x + y + z) * (mask1 & mask3) > > This would create three "generator arrays" that would zero-mask the > arrays (and perform the three-term addition...) upon request. You could > slice the generator arrays as you wish, and by that slice the data and > the mask in one operation. Obviously this could handle NA-masking too. > > You can probably do this today with Theano and numexpr, and I think > Travis mentioned that "generator arrays" are on his radar for core NumPy. Implementing this today would require some black magic hacks, because on entry/exit to the context manager you'd have to "reach up" into the calling scope and replace all the ndarray's with LazyArrays and then vice-versa. This is actually totally possible: https://gist.github.com/2347382 but I'm not sure I'd call it *wise*. (You could probably avoid the truly horrible set_globals_dict part of that gist, though.) Might be fun to prototype, though... > Point is, as a user, I'm with Travis in having masks support go hide in > ndmasked; they solve too much of a special case in a way that is too > particular. Right, that's the concern. Hypothetical question: are you actually saying that if you had both bitpattern NAs and Travis' "ndmasked" object, you would still go ahead and use the bitpattern NAs for this case, because of the conceptual simplicity, easy of Cython/C compatibility, etc.? -- Nathaniel From gael.varoquaux at normalesup.org Wed May 9 19:21:01 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 10 May 2012 01:21:01 +0200 Subject: [Numpy-discussion] [SciPy-Dev] Announce: scikit-learn v0.11 In-Reply-To: References: <20120507231317.GO19857@phare.normalesup.org> Message-ID: <20120509232101.GC20435@phare.normalesup.org> On Wed, May 09, 2012 at 06:55:12PM +0200, klo uo wrote: > This news did not arrive at scikit-learn-general at lists.sourceforge.net > Is above list deprecated? Andy Mueller did the announcement on the scikit-learn mailing list. > BTW thanks for supporting and working on this project ;) Thank you very much, it is my pleasure. But it's really a team that you need to thank: the number of active contributors is huge. Cheers, Gael > On Tue, May 8, 2012 at 1:13 AM, Gael Varoquaux > wrote: > > ? On behalf of Andy Mueller, our release manager, I am happy to announce > > ? the 0.11 release of scikit-learn. > > ? This release includes some major new features such as randomized > > ? sparse models, gradient boosted regression trees, label propagation > > ? and many more. The release also has major improvements in the > > ? documentation and in stability. > > ? Details can be found on the [1]what's new page. > > ? We also have a new page with [2]video tutorials on machine learning > > ? with scikit-learn and different aspects of the package. > > ? Sources and windows binaries are available on sourceforge, > > ? through pypi (http://pypi.python.org/pypi/scikit-learn/0.11) or > > ? can be installed directly using pip: > > ? pip install -U scikit-learn > > ? Thanks again to all the contributors who made this release possible. > > ? Cheers, > > ? ?Ga?l > > ? 1. http://scikit-learn.org/stable/whats_new.html > > ? 2. 
http://scikit-learn.org/stable/presentations.html > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Gael Varoquaux Researcher, INRIA Parietal Laboratoire de Neuro-Imagerie Assistee par Ordinateur NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From pivanov314 at gmail.com Wed May 9 20:13:50 2012 From: pivanov314 at gmail.com (Paul Ivanov) Date: Wed, 9 May 2012 17:13:50 -0700 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 3:12 PM, Travis Oliphant wrote: > On re-reading, I want to make a couple of things clear: > > 1) This "wrap-up" discussion is *only* for what to do for NumPy 1.7 in > such a way that we don't tie our hands in the future. I do not believe > we can figure out what to do for masked arrays in one short week. What > happens beyond NumPy 1.7 should be still discussed and explored. My > urgency is entirely about moving forward from where we are in master right > now in a direction that we can all accept. The tight timeline is so > that we do *something* and move forward. > > 2) I missed another possible proposal for NumPy 1.7 which is in the > write-up that Mark and Nathaniel made: remove the masked array additions > entirely possibly moving them to another module like numpy-dtypes. > > Again, these are only for NumPy 1.7. What happens in any future NumPy > and beyond will depend on who comes to the table for both discussion and > code-development. > I'm glad that this sentence made it into the write-up: "A project like numpy requires developers to write code for advancement to occur, and obstacles that impede the writing of code discourage existing developers from contributing more, and potentially scare away developers who are thinking about joining in." I agree, which is why I'm a little surprised after reading the write-up that there's no deference to the alterNEP (admittedly kludgy) implementation? One of the arguments made for the NEP "preliminary NA-mask implementation" is that "has been extensively tested against scipy and other third-party packages, and has been in master in a stable state for a significant amount of time." It is my understanding that the manner in which this implementation found its way into master was a source of concern and contention. To me (and I don't know the level to which this is a technically feasible) that's precisely the reason that BOTH approaches be allowed to make their way into numpy with experimental status. Otherwise, it seems that there is a sort of "scaring away" of developers - seeing (from the sidelines) how much of a struggle it's been for the alterNEP to find a nurturing environment as an experimental alternative inside numpy. In my reading, the process and consensus threads that have generated so many responses stem precisely from trying to have an atmosphere where everyone is encouraged to join in. The alternatives proposed so far (though I do understand it's only for 1.7) do not suggest an appreciation for the gravity of the fallout from the neglect the alterNEP and the issues which sprang forth from that. 
Importantly, I find a problem with how personal this document (and discussion) is - I'd much prefer if we talk about technical things by a descriptive name, not the person who thought of it. You'll note how I've been referring to NEP and alterNEP above. One advantage of this is that down the line, if either Mark or Nathaniel change their minds about their current preferred way forward, it doesn't take the wind out of it with something like "Even Paul changed his mind and now withdraws his support of Paul's proposal." We should only focus on the technical merits of a given approach, not how many commits have been made by the person proposing them or what else they've done in their life: a good idea has value regardless of who expresses it. In my fantasy world, with both approaches clearly existing in an experimental sandbox inside numpy, folks who feel primary attachments to either NEP or alterNEP would be willing to cross party lines and pitch in towardd making progress in both camps. That's the way we'll find better solutions, by working together, instead of working in opposition. best, -- Paul Ivanov 314 address only used for lists, off-list direct email at: http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7 -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed May 9 20:46:45 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 9 May 2012 18:46:45 -0600 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 6:13 PM, Paul Ivanov wrote: > > > On Wed, May 9, 2012 at 3:12 PM, Travis Oliphant wrote: > >> On re-reading, I want to make a couple of things clear: >> >> 1) This "wrap-up" discussion is *only* for what to do for NumPy 1.7 in >> such a way that we don't tie our hands in the future. I do not believe >> we can figure out what to do for masked arrays in one short week. What >> happens beyond NumPy 1.7 should be still discussed and explored. My >> urgency is entirely about moving forward from where we are in master right >> now in a direction that we can all accept. The tight timeline is so >> that we do *something* and move forward. >> >> 2) I missed another possible proposal for NumPy 1.7 which is in the >> write-up that Mark and Nathaniel made: remove the masked array additions >> entirely possibly moving them to another module like numpy-dtypes. >> >> Again, these are only for NumPy 1.7. What happens in any future NumPy >> and beyond will depend on who comes to the table for both discussion and >> code-development. >> > > I'm glad that this sentence made it into the write-up: "A project like > numpy requires developers to write code for advancement to occur, and > obstacles that impede the writing of code discourage existing developers > from contributing more, and potentially scare away developers who are > thinking about joining in." I agree, which is why I'm a little surprised > after reading the write-up that there's no deference to the alterNEP > (admittedly kludgy) implementation? One of the arguments made for the NEP > "preliminary NA-mask implementation" is that "has been extensively tested > against scipy and other third-party packages, and has been in master in a > stable state for a significant amount of time." It is my understanding that > the manner in which this implementation found its way into master was a > source of concern and contention. 
To me (and I don't know the level to > which this is a technically feasible) that's precisely the reason that BOTH > approaches be allowed to make their way into numpy with experimental > status. Otherwise, it seems that there is a sort of "scaring away" of > developers - seeing (from the sidelines) how much of a struggle it's been > for the alterNEP to find a nurturing environment as an experimental > alternative inside numpy. In my reading, the process and consensus threads > that have generated so many responses stem precisely from trying to have an > atmosphere where everyone is encouraged to join in. The alternatives > proposed so far (though I do understand it's only for 1.7) do not suggest > an appreciation for the gravity of the fallout from the neglect the > alterNEP and the issues which sprang forth from that. > > Importantly, I find a problem with how personal this document (and > discussion) is - I'd much prefer if we talk about technical things by a > descriptive name, not the person who thought of it. You'll note how I've > been referring to NEP and alterNEP above. One advantage of this is that > down the line, if either Mark or Nathaniel change their minds about their > current preferred way forward, it doesn't take the wind out of it with > something like "Even Paul changed his mind and now withdraws his support of > Paul's proposal." We should only focus on the technical merits of a given > approach, not how many commits have been made by the person proposing them > or what else they've done in their life: a good idea has value regardless > of who expresses it. In my fantasy world, with both approaches clearly > existing in an experimental sandbox inside numpy, folks who feel primary > attachments to either NEP or alterNEP would be willing to cross party lines > and pitch in towardd making progress in both camps. That's the way we'll > find better solutions, by working together, instead of working in > opposition. > > We are certainly open to code submissions and alternate implementations. The experimental tag would help there. But someone, as you mention, needs to write the code. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed May 9 23:05:56 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 9 May 2012 21:05:56 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 12:40 PM, Sandro Tosi wrote: > On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers > wrote: > > Please test this release and report any issues on the numpy-discussion > > mailing list. > > I think it's probably nice not to ship pyc in the source tarball: > > $ find numpy-1.6.2rc1/ -name "*.pyc" > numpy-1.6.2rc1/doc/sphinxext/docscrape.pyc > numpy-1.6.2rc1/doc/sphinxext/docscrape_sphinx.pyc > numpy-1.6.2rc1/doc/sphinxext/numpydoc.pyc > numpy-1.6.2rc1/doc/sphinxext/plot_directive.pyc > > Good point ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Wed May 9 23:54:11 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 10 May 2012 05:54:11 +0200 Subject: [Numpy-discussion] Masking through generator arrays Message-ID: <4FAB3BE3.1030405@astro.uio.no> Sorry everyone for being so dense and contaminating that other thread. Here's a new thread where I can respond to Nathaniel's response. 
On 05/10/2012 01:08 AM, Nathaniel Smith wrote: > Hi Dag, > > On Wed, May 9, 2012 at 8:44 PM, Dag Sverre Seljebotn > wrote: >> I'm a heavy user of masks, which are used to make data NA in the >> statistical sense. The setting is that we have to mask out the radiation >> coming from the Milky Way in full-sky images of the Cosmic Microwave >> Background. There's data, but we know we can't trust it, so we make it >> NA. But we also do play around with different masks. > > Oh, this is great -- that means you're one of the users that I wasn't > sure existed or not :-). Now I know! > >> Today we keep the mask in a seperate array, and to zero-mask we do >> >> masked_data = data * mask >> >> or >> >> masked_data = data.copy() >> masked_data[mask == 0] = np.nan # soon np.NA >> >> depending on the circumstances. >> >> Honestly, API-wise, this is as good as its gets for us. Nice and >> transparent, no new semantics to learn in the special case of masks. >> >> Now, this has performance issues: Lots of memory use, extra transfers >> over the memory bus. > > Right -- this is a case where (in the NA-overview terminology) masked > storage+NA semantics would be useful. > >> BUT, NumPy has that problem all over the place, even for "x + y + z"! >> Solving it in the special case of masks, by making a new API, seems a >> bit myopic to me. >> >> IMO, that's much better solved at the fundamental level. As an >> *illustration*: >> >> with np.lazy: >> masked_data1 = data * mask1 >> masked_data2 = data * (mask1 | mask2) >> masked_data3 = (x + y + z) * (mask1& mask3) >> >> This would create three "generator arrays" that would zero-mask the >> arrays (and perform the three-term addition...) upon request. You could >> slice the generator arrays as you wish, and by that slice the data and >> the mask in one operation. Obviously this could handle NA-masking too. >> >> You can probably do this today with Theano and numexpr, and I think >> Travis mentioned that "generator arrays" are on his radar for core NumPy. > > Implementing this today would require some black magic hacks, because > on entry/exit to the context manager you'd have to "reach up" into the > calling scope and replace all the ndarray's with LazyArrays and then > vice-versa. This is actually totally possible: > https://gist.github.com/2347382 > but I'm not sure I'd call it *wise*. (You could probably avoid the > truly horrible set_globals_dict part of that gist, though.) Might be > fun to prototype, though... 1) My main point was just that I believe masked arrays is something that to me feels immature, and that it is the kind of thing that should be constructed from simpler primitives. And that NumPy should focus on simple primitives. You could make it np.gen.generating_multiply(data, mask) 2) About the with construct in particular, I intended "__enter__" and "__exit__" to only toggle a thread-local flag, and when that flag is in effect, "__mul__" would do a "generating_multiply" and return an ndarraygenerator rather than an ndarray. But of course, the amount of work is massive. > >> Point is, as a user, I'm with Travis in having masks support go hide in >> ndmasked; they solve too much of a special case in a way that is too >> particular. > > Right, that's the concern. > > Hypothetical question: are you actually saying that if you had both > bitpattern NAs and Travis' "ndmasked" object, you would still go ahead > and use the bitpattern NAs for this case, because of the conceptual > simplicity, easy of Cython/C compatibility, etc.? For sure. 
But that's just one data point... I'd do either a) destroying the input data by overwriting with NA, or b) pass the mask separately. However, I don't do much slicing. b) gets tiresome if you need to slice and dice your arrays, and you don't have enough memory to do a). In that case I might be tempted to use "the NEP", but I might also write my own class containing a data array and a mask array that's purposed to the task at hand... I don't know, since I don't do much slicing on the arrays I happen to mask. I've basically been wanting for this issue to die as quickly as possible, so that I could ignore it and the community move on to other issues. But now I think I've come around a position where I actually care that this doesn't make it into ndarray, in particular if the intention is to put some pressure on C extension writers to support this, rather than just saying that masked arrays don't work with most C extensions. Thanks a lot Nathaniel and Matthew and others for taking the fight. Dag From d.s.seljebotn at astro.uio.no Thu May 10 00:05:05 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 10 May 2012 06:05:05 +0200 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: <4FAAC918.8050705@astro.uio.no> Message-ID: <4FAB3E71.8050209@astro.uio.no> On 05/10/2012 01:01 AM, Matthew Brett wrote: > Hi, > > On Wed, May 9, 2012 at 12:44 PM, Dag Sverre Seljebotn > wrote: >> On 05/09/2012 06:46 PM, Travis Oliphant wrote: >>> Hey all, >>> >>> Nathaniel and Mark have worked very hard on a joint document to try and >>> explain the current status of the missing-data debate. I think they've >>> done an amazing job at providing some context, articulating their views >>> and suggesting ways forward in a mutually respectful manner. This is an >>> exemplary collaboration and is at the core of why open source is valuable. >>> >>> The document is available here: >>> https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst >>> >>> After reading that document, it appears to me that there are some >>> fundamentally different views on how things should move forward. I'm >>> also reading the document incorporating my understanding of the history, >>> of NumPy as well as all of the users I've met and interacted with which >>> means I have my own perspective that is not necessarily incorporated >>> into that document but informs my recommendations. I'm not sure we can >>> reach full consensus on this. We are also well past time for moving >>> forward with a resolution on this (perhaps we can all agree on that). >>> >>> I would like one more discussion thread where the technical discussion >>> can take place. I will make a plea that we keep this discussion as free >>> from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as >>> we can. I can't guarantee that I personally will succeed at that, but I >>> can tell you that I will try. That's all I'm asking of anyone else. I >>> recognize that there are a lot of other issues at play here besides >>> *just* the technical questions, but we are not going to resolve every >>> community issue in this technical thread. >>> >>> We need concrete proposals and so I will start with three. Please feel >>> free to comment on these proposals or add your own during the >>> discussion. I will stop paying attention to this thread next Wednesday >>> (May 16th) (or earlier if the thread dies) and hope that by that time we >>> can agree on a way forward. 
If we don't have agreement, then I will move >>> forward with what I think is the right approach. I will either write the >>> code myself or convince someone else to write it. >>> >>> In all cases, we have agreement that bit-pattern dtypes should be added >>> to NumPy. We should work on these (int32, float64, complex64, str, bool) >>> to start. So, the three proposals are independent of this way forward. >>> The proposals are all about the extra mask part: >>> >>> My three proposals: >>> >>> * do nothing and leave things as is >>> >>> * add a global flag that turns off masked array support by default but >>> otherwise leaves things unchanged (I'm still unclear how this would work >>> exactly) >>> >>> * move Mark's "masked ndarray objects" into a new fundamental type >>> (ndmasked), leaving the actual ndarray type unchanged. The >>> array_interface keeps the masked array notions and the ufuncs keep the >>> ability to handle arrays like ndmasked. Ideally, numpy.ma >>> would be changed to use ndmasked objects as their core. >>> >>> For the record, I'm currently in favor of the third proposal. Feel free >>> to comment on these proposals (or provide your own). >>> >> >> Bravo!, NA-overview.rst was an excellent read. Thanks Nathaniel and Mark! > > Yes, it is very well written, my compliments to the chefs. > >> The third proposal is certainly the best one from Cython's perspective; >> and I imagine for those writing C extensions against the C API too. >> Having PyType_Check fail for ndmasked is a very good way of having code >> fail that is not written to take masks into account. I want to make something more clear: There are two Cython cases; in the case of "cdef np.ndarray[double]" there is no problem as PEP 3118 access will raise an exception for masked arrays. But, there's the case where you do "cdef np.ndarray", and then proceed to use PyArray_DATA. Myself I do this more than PEP 3118 access; usually because I pass the data pointer to some C or C++ code. It'd be great to have such code be forward-compatible in the sense that it raises an exception when it meets a masked array. Having PyType_Check fail seems like the only way? Am I wrong? > Mark, Nathaniel - can you comment how your chosen approaches would > interact with extension code? > > I'm guessing the bitpattern dtypes would be expected to cause > extension code to choke if the type is not supported? The proposal, as I understand it, is to use that with new dtypes (?). So things will often be fine for that reason: if arr.dtype == np.float32: c_function_32bit(np.PyArray_DATA(arr), ...) else: raise ValueError("need 32-bit float array") > > Mark - in : > > https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#cython > > - do I understand correctly that you think that Cython and other > extension writers should use the numpy API to access the data rather > than accessing it directly via the data pointer and strides? That's not really fleshed out (for all the different usecases etc.); I read that as "let's discuss Cython later, when this is actively used in NumPy". Which sounds reasonable to me. 
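To make the guard concrete: a Python-level wrapper around a C routine usually validates the array before handing off the raw pointer, and that same check is where an unsupported array type (a future ndmasked object, or an NA dtype the C code does not understand) should fail loudly. A minimal sketch, assuming a hypothetical compiled routine c_function_32bit that consumes the buffer:

    import numpy as np

    def call_c_routine(arr):
        # Accept only plain ndarrays: a separate ndmasked type (or any
        # subclass carrying extra semantics) is rejected here instead of
        # having its payload silently read as ordinary data.
        if type(arr) is not np.ndarray:
            raise TypeError("need a plain ndarray, got %r" % type(arr))
        if arr.dtype != np.float32:
            raise ValueError("need a float32 array")
        arr = np.ascontiguousarray(arr)
        # c_function_32bit(arr.ctypes.data, arr.size)  # hypothetical handoff
        return arr

With masks living in a separate ndmasked type, the exact-type test (roughly PyArray_CheckExact at the C level) does the rejecting for free; with masks folded into ndarray itself, every such wrapper would instead need an explicit "this array has no mask" check. 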
Dag From charlesr.harris at gmail.com Thu May 10 00:18:44 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 9 May 2012 22:18:44 -0600 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: <4FAB3BE3.1030405@astro.uio.no> References: <4FAB3BE3.1030405@astro.uio.no> Message-ID: On Wed, May 9, 2012 at 9:54 PM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > Sorry everyone for being so dense and contaminating that other thread. > Here's a new thread where I can respond to Nathaniel's response. > > On 05/10/2012 01:08 AM, Nathaniel Smith wrote: > > Hi Dag, > > > > On Wed, May 9, 2012 at 8:44 PM, Dag Sverre Seljebotn > > wrote: > >> I'm a heavy user of masks, which are used to make data NA in the > >> statistical sense. The setting is that we have to mask out the > radiation > >> coming from the Milky Way in full-sky images of the Cosmic Microwave > >> Background. There's data, but we know we can't trust it, so we make it > >> NA. But we also do play around with different masks. > > > > Oh, this is great -- that means you're one of the users that I wasn't > > sure existed or not :-). Now I know! > > > >> Today we keep the mask in a seperate array, and to zero-mask we do > >> > >> masked_data = data * mask > >> > >> or > >> > >> masked_data = data.copy() > >> masked_data[mask == 0] = np.nan # soon np.NA > >> > >> depending on the circumstances. > >> > >> Honestly, API-wise, this is as good as its gets for us. Nice and > >> transparent, no new semantics to learn in the special case of masks. > >> > >> Now, this has performance issues: Lots of memory use, extra transfers > >> over the memory bus. > > > > Right -- this is a case where (in the NA-overview terminology) masked > > storage+NA semantics would be useful. > > > >> BUT, NumPy has that problem all over the place, even for "x + y + z"! > >> Solving it in the special case of masks, by making a new API, seems a > >> bit myopic to me. > >> > >> IMO, that's much better solved at the fundamental level. As an > >> *illustration*: > >> > >> with np.lazy: > >> masked_data1 = data * mask1 > >> masked_data2 = data * (mask1 | mask2) > >> masked_data3 = (x + y + z) * (mask1& mask3) > >> > >> This would create three "generator arrays" that would zero-mask the > >> arrays (and perform the three-term addition...) upon request. You could > >> slice the generator arrays as you wish, and by that slice the data and > >> the mask in one operation. Obviously this could handle NA-masking too. > >> > >> You can probably do this today with Theano and numexpr, and I think > >> Travis mentioned that "generator arrays" are on his radar for core > NumPy. > > > > Implementing this today would require some black magic hacks, because > > on entry/exit to the context manager you'd have to "reach up" into the > > calling scope and replace all the ndarray's with LazyArrays and then > > vice-versa. This is actually totally possible: > > https://gist.github.com/2347382 > > but I'm not sure I'd call it *wise*. (You could probably avoid the > > truly horrible set_globals_dict part of that gist, though.) Might be > > fun to prototype, though... > > 1) My main point was just that I believe masked arrays is something that > to me feels immature, and that it is the kind of thing that should be > constructed from simpler primitives. And that NumPy should focus on > simple primitives. 
You could make it > I can't disagree, as I suggested the same as a possibility myself ;) There is a lot of infrastructure now in numpy, but given the use cases I'm tending towards the view that masked arrays should be left to others, at least for the time being. The question is how to generalize the infrastructure and what hooks to provide. I think just spending a month or two pulling stuff out is counter productive, but evolving the code is definitely needed. If you could familiarize yourself with what is in there, something that seems largely neglected by the critics, and make suggestions, that would be helpful. I'd also like to hear from Mark. It has been about 9 mos since he did the work, and I'd be surprised if he didn't have ideas for doing some things differently. OTOH, I can understand his reluctance to get involved in a topic where I thought he was poorly treated last time around. > > np.gen.generating_multiply(data, mask) > > 2) About the with construct in particular, I intended "__enter__" and > "__exit__" to only toggle a thread-local flag, and when that flag is in > effect, "__mul__" would do a "generating_multiply" and return an > ndarraygenerator rather than an ndarray. > > But of course, the amount of work is massive. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu May 10 01:05:19 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 10 May 2012 01:05:19 -0400 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On Wednesday, May 9, 2012, Nathaniel Smith wrote: > > > My only objection to this proposal is that committing to this approach > seems premature. The existing masked array objects act quite > differently from numpy.ma, so why do you believe that they're a good > foundation for numpy.ma, and why will users want to switch to their > semantics over numpy.ma's semantics? These aren't rhetorical > questions, it seems like they must have concrete answers, but I don't > know what they are. > Based on the design decisions made in the original NEP, a re-made numpy.mawould have to lose _some_ features particularly, the ability to share masks. Save for that and some very obscure behaviors that are undocumented, it is possible to remake numpy.ma as a compatibility layer. That being said, I think that there are some fundamental questions that has concerned. If I recall, there were unresolved questions about behaviors surrounding assignments to elements of a view. I see the project as broken down like this: 1.) internal architecture (largely abi issues) 2.) external architecture (hooks throughout numpy to utilize the new features where possible such as where= argument) 3.) getter/setter semantics 4.) mathematical semantics At this moment, I think we have pieces of 2 and they are fairly non-controversial. It is 1 that I see as being the immediate hold-up here. 3 & 4 are non-trivial, but because they are mostly about interfaces, I think we can be willing to accept some very basic, fundamental, barebones components here in order to lay the groundwork for a more complete API later. To talk of Travis's proposal, doing nothing is no-go. Not moving forward would dishearten the community. Making a ndmasked type is very intriguing. I see it as a set towards eventually deprecating ndarray? Also, how would it behave with no.asarray() and no.asanyarray()? My other concern is a possible violation of DRY. 
How difficult would it be to maintain two ndarrays in parallel? As for the flag approach, this still doesn't solve the problem of legacy code (or did I misunderstand?) Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu May 10 01:21:49 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 9 May 2012 23:21:49 -0600 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On Wed, May 9, 2012 at 11:05 PM, Benjamin Root wrote: > > > On Wednesday, May 9, 2012, Nathaniel Smith wrote: > >> >> >> My only objection to this proposal is that committing to this approach >> seems premature. The existing masked array objects act quite >> differently from numpy.ma, so why do you believe that they're a good >> foundation for numpy.ma, and why will users want to switch to their >> semantics over numpy.ma's semantics? These aren't rhetorical >> questions, it seems like they must have concrete answers, but I don't >> know what they are. >> > > Based on the design decisions made in the original NEP, a re-made numpy.mawould have to lose _some_ features particularly, the ability to share > masks. Save for that and some very obscure behaviors that are undocumented, > it is possible to remake numpy.ma as a compatibility layer. > > That being said, I think that there are some fundamental questions that > has concerned. If I recall, there were unresolved questions about behaviors > surrounding assignments to elements of a view. > > I see the project as broken down like this: > 1.) internal architecture (largely abi issues) > 2.) external architecture (hooks throughout numpy to utilize the new > features where possible such as where= argument) > 3.) getter/setter semantics > 4.) mathematical semantics > > At this moment, I think we have pieces of 2 and they are fairly > non-controversial. It is 1 that I see as being the immediate hold-up here. > 3 & 4 are non-trivial, but because they are mostly about interfaces, I > think we can be willing to accept some very basic, fundamental, barebones > components here in order to lay the groundwork for a more complete API > later. > > To talk of Travis's proposal, doing nothing is no-go. Not moving forward > would dishearten the community. Making a ndmasked type is very intriguing. > I see it as a set towards eventually deprecating ndarray? Also, how would > it behave with no.asarray() and no.asanyarray()? My other concern is a > possible violation of DRY. How difficult would it be to maintain two > ndarrays in parallel? > > As for the flag approach, this still doesn't solve the problem of legacy > code (or did I misunderstand?) > My understanding of the flag is to allow the code to stay in and get reworked and experimented with while keeping it from contaminating conventional use. The whole point of putting the code in was to experiment and adjust. The rather bizarre idea that it needs to be perfect from the get go is disheartening, and is seldom how new things get developed. Sure, there is a plan up front, but there needs to be feedback and change. And in fact, I haven't seen much feedback about the actual code, I don't even know that the people complaining have tried using it to see where it hurts. I'd like that sort of feedback. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
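For what a creation-time guard might look like, here is a rough Python-level sketch. It is purely illustrative: the real switch would live in C, maskna= is the experimental NEP keyword, and enable_maskna/_maskna_enabled are invented names for this example.

    import numpy as np

    _maskna_enabled = False   # module-level switch, off by default

    def enable_maskna(enabled=True):
        """Opt in to the experimental masked-array support."""
        global _maskna_enabled
        _maskna_enabled = enabled

    def array(obj, maskna=None, **kwargs):
        # Guard on the top-level constructors: unless the switch has been
        # flipped, asking for an NA-masked array is an error, so existing
        # code never ends up holding a masked ndarray by accident.
        if maskna and not _maskna_enabled:
            raise RuntimeError("masked arrays are experimental; call "
                               "enable_maskna() to opt in")
        return np.array(obj, **kwargs)

Code that never passes maskna= is unaffected either way, which is the limit Ben points out: the flag only guards creation, it does not help an extension that is handed a masked array by a caller who did opt in. 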
URL: From d.s.seljebotn at astro.uio.no Thu May 10 03:10:34 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 10 May 2012 09:10:34 +0200 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: References: <4FAB3BE3.1030405@astro.uio.no> Message-ID: <4FAB69EA.9070306@astro.uio.no> On 05/10/2012 06:18 AM, Charles R Harris wrote: > > > On Wed, May 9, 2012 at 9:54 PM, Dag Sverre Seljebotn > > wrote: > > Sorry everyone for being so dense and contaminating that other thread. > Here's a new thread where I can respond to Nathaniel's response. > > On 05/10/2012 01:08 AM, Nathaniel Smith wrote: > > Hi Dag, > > > > On Wed, May 9, 2012 at 8:44 PM, Dag Sverre Seljebotn > > > > wrote: > >> I'm a heavy user of masks, which are used to make data NA in the > >> statistical sense. The setting is that we have to mask out the > radiation > >> coming from the Milky Way in full-sky images of the Cosmic Microwave > >> Background. There's data, but we know we can't trust it, so we > make it > >> NA. But we also do play around with different masks. > > > > Oh, this is great -- that means you're one of the users that I wasn't > > sure existed or not :-). Now I know! > > > >> Today we keep the mask in a seperate array, and to zero-mask we do > >> > >> masked_data = data * mask > >> > >> or > >> > >> masked_data = data.copy() > >> masked_data[mask == 0] = np.nan # soon np.NA > >> > >> depending on the circumstances. > >> > >> Honestly, API-wise, this is as good as its gets for us. Nice and > >> transparent, no new semantics to learn in the special case of masks. > >> > >> Now, this has performance issues: Lots of memory use, extra > transfers > >> over the memory bus. > > > > Right -- this is a case where (in the NA-overview terminology) masked > > storage+NA semantics would be useful. > > > >> BUT, NumPy has that problem all over the place, even for "x + y > + z"! > >> Solving it in the special case of masks, by making a new API, > seems a > >> bit myopic to me. > >> > >> IMO, that's much better solved at the fundamental level. As an > >> *illustration*: > >> > >> with np.lazy: > >> masked_data1 = data * mask1 > >> masked_data2 = data * (mask1 | mask2) > >> masked_data3 = (x + y + z) * (mask1& mask3) > >> > >> This would create three "generator arrays" that would zero-mask the > >> arrays (and perform the three-term addition...) upon request. > You could > >> slice the generator arrays as you wish, and by that slice the > data and > >> the mask in one operation. Obviously this could handle > NA-masking too. > >> > >> You can probably do this today with Theano and numexpr, and I think > >> Travis mentioned that "generator arrays" are on his radar for core > NumPy. > > > > Implementing this today would require some black magic hacks, because > > on entry/exit to the context manager you'd have to "reach up" > into the > > calling scope and replace all the ndarray's with LazyArrays and then > > vice-versa. This is actually totally possible: > > https://gist.github.com/2347382 > > but I'm not sure I'd call it *wise*. (You could probably avoid the > > truly horrible set_globals_dict part of that gist, though.) Might be > > fun to prototype, though... > > 1) My main point was just that I believe masked arrays is something that > to me feels immature, and that it is the kind of thing that should be > constructed from simpler primitives. And that NumPy should focus on > simple primitives. 
You could make it > > > I can't disagree, as I suggested the same as a possibility myself ;) > There is a lot of infrastructure now in numpy, but given the use cases > I'm tending towards the view that masked arrays should be left to > others, at least for the time being. The question is how to generalize > the infrastructure and what hooks to provide. I think just spending a > month or two pulling stuff out is counter productive, but evolving the > code is definitely needed. If you could familiarize yourself with what > is in there, something that seems largely neglected by the critics, and > make suggestions, that would be helpful. But how on earth can I make constructive criticisms about code when I don't know what the purpose of that code is supposed to be? Are you saying you agree that the masking aspect should be banned (or at least not "core"), and asking me to look at code from that perspective and comment on how to get there while keeping as much as possible of the rest? Would that really be helpful? Dag From gael.varoquaux at normalesup.org Thu May 10 04:17:56 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 10 May 2012 10:17:56 +0200 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: <20120510081756.GB23593@phare.normalesup.org> On Wed, May 09, 2012 at 02:35:26PM -0500, Travis Oliphant wrote: > ??Basically it buys not forcing *all* NumPy users (on the C-API level) to > now deal with a masked array. ? ?I know this push is a feature that is > part of Mark's intention (as it pushes downstream libraries to think about > missing data at a fundamental level). I think that this is a bad policy because: 1. An array is not always data. I realize that there is a big push for data-related computing lately, but I still believe that the notion missing data makes no sens for the majority of numpy arrays instanciated. 2. Not every algorithm can be made to work with missing data. I would even say that most of the advanced algorithm do not work with missing data. Don't try to force upon people a problem that they do not have :). Gael PS: This message does not claim to take any position in the debate on which solution for missing data is the best, because I don't think that I have a good technical vision to back any position. From scott.sinclair.za at gmail.com Thu May 10 04:40:24 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Thu, 10 May 2012 10:40:24 +0200 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On 9 May 2012 18:46, Travis Oliphant wrote: > The document is available here: > ? ?https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst This is orthogonal to the discussion, but I'm curious as to why this discussion document has landed in the website repo? I suppose it's not a really big deal, but future uploads of the website will now include a page at http://numpy.scipy.org/NA-overview.html with the content of this document. If that's desirable, I'll add a note at the top of the overview referencing this discussion thread. If not it can be relocated somewhere more desirable after this thread's discussion deadline expires.. 
Cheers, Scott From charlesr.harris at gmail.com Thu May 10 04:40:59 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 10 May 2012 02:40:59 -0600 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: <4FAB69EA.9070306@astro.uio.no> References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> Message-ID: On Thu, May 10, 2012 at 1:10 AM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > On 05/10/2012 06:18 AM, Charles R Harris wrote: > > > > > > On Wed, May 9, 2012 at 9:54 PM, Dag Sverre Seljebotn > > > wrote: > > > > Sorry everyone for being so dense and contaminating that other > thread. > > Here's a new thread where I can respond to Nathaniel's response. > > > > On 05/10/2012 01:08 AM, Nathaniel Smith wrote: > > > Hi Dag, > > > > > > On Wed, May 9, 2012 at 8:44 PM, Dag Sverre Seljebotn > > > > > > wrote: > > >> I'm a heavy user of masks, which are used to make data NA in the > > >> statistical sense. The setting is that we have to mask out the > > radiation > > >> coming from the Milky Way in full-sky images of the Cosmic > Microwave > > >> Background. There's data, but we know we can't trust it, so we > > make it > > >> NA. But we also do play around with different masks. > > > > > > Oh, this is great -- that means you're one of the users that I > wasn't > > > sure existed or not :-). Now I know! > > > > > >> Today we keep the mask in a seperate array, and to zero-mask we > do > > >> > > >> masked_data = data * mask > > >> > > >> or > > >> > > >> masked_data = data.copy() > > >> masked_data[mask == 0] = np.nan # soon np.NA > > >> > > >> depending on the circumstances. > > >> > > >> Honestly, API-wise, this is as good as its gets for us. Nice and > > >> transparent, no new semantics to learn in the special case of > masks. > > >> > > >> Now, this has performance issues: Lots of memory use, extra > > transfers > > >> over the memory bus. > > > > > > Right -- this is a case where (in the NA-overview terminology) > masked > > > storage+NA semantics would be useful. > > > > > >> BUT, NumPy has that problem all over the place, even for "x + y > > + z"! > > >> Solving it in the special case of masks, by making a new API, > > seems a > > >> bit myopic to me. > > >> > > >> IMO, that's much better solved at the fundamental level. As an > > >> *illustration*: > > >> > > >> with np.lazy: > > >> masked_data1 = data * mask1 > > >> masked_data2 = data * (mask1 | mask2) > > >> masked_data3 = (x + y + z) * (mask1& mask3) > > >> > > >> This would create three "generator arrays" that would zero-mask > the > > >> arrays (and perform the three-term addition...) upon request. > > You could > > >> slice the generator arrays as you wish, and by that slice the > > data and > > >> the mask in one operation. Obviously this could handle > > NA-masking too. > > >> > > >> You can probably do this today with Theano and numexpr, and I > think > > >> Travis mentioned that "generator arrays" are on his radar for > core > > NumPy. > > > > > > Implementing this today would require some black magic hacks, > because > > > on entry/exit to the context manager you'd have to "reach up" > > into the > > > calling scope and replace all the ndarray's with LazyArrays and > then > > > vice-versa. This is actually totally possible: > > > https://gist.github.com/2347382 > > > but I'm not sure I'd call it *wise*. (You could probably avoid the > > > truly horrible set_globals_dict part of that gist, though.) Might > be > > > fun to prototype, though... 
> > > > 1) My main point was just that I believe masked arrays is something > that > > to me feels immature, and that it is the kind of thing that should be > > constructed from simpler primitives. And that NumPy should focus on > > simple primitives. You could make it > > > > > > I can't disagree, as I suggested the same as a possibility myself ;) > > There is a lot of infrastructure now in numpy, but given the use cases > > I'm tending towards the view that masked arrays should be left to > > others, at least for the time being. The question is how to generalize > > the infrastructure and what hooks to provide. I think just spending a > > month or two pulling stuff out is counter productive, but evolving the > > code is definitely needed. If you could familiarize yourself with what > > is in there, something that seems largely neglected by the critics, and > > make suggestions, that would be helpful. > > But how on earth can I make constructive criticisms about code when I > don't know what the purpose of that code is supposed to be? > What do you mean? I thought the purpose was quite clearly laid out in the NEP. But the implementation of that purpose required some infrastructure. The point, I suppose, is for you to suggest what would serve your use case. > > Are you saying you agree that the masking aspect should be banned (or at > least not "core"), and asking me to look at code from that perspective > and comment on how to get there while keeping as much as possible of the > rest? Would that really be helpful? > No, I don't agree that it should be banned, but your perspective seems to be that it should be, so I ask you to determine what is worth keeping. We can of course pull it all out and forget about the whole thing. But I'm getting tired of people saying do this or that without making technical suggestions that can be implemented, looking at the code, testing things, and providing feedback. At a minimum, I expect you to have an idea of how things *should* work and how to get there. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Thu May 10 05:07:41 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 10 May 2012 11:07:41 +0200 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: <4FAB3E71.8050209@astro.uio.no> References: <4FAAC918.8050705@astro.uio.no> <4FAB3E71.8050209@astro.uio.no> Message-ID: <4FAB855D.8000103@astro.uio.no> On 05/10/2012 06:05 AM, Dag Sverre Seljebotn wrote: > On 05/10/2012 01:01 AM, Matthew Brett wrote: >> Hi, >> >> On Wed, May 9, 2012 at 12:44 PM, Dag Sverre Seljebotn >> wrote: >>> On 05/09/2012 06:46 PM, Travis Oliphant wrote: >>>> Hey all, >>>> >>>> Nathaniel and Mark have worked very hard on a joint document to try and >>>> explain the current status of the missing-data debate. I think they've >>>> done an amazing job at providing some context, articulating their views >>>> and suggesting ways forward in a mutually respectful manner. This is an >>>> exemplary collaboration and is at the core of why open source is valuable. >>>> >>>> The document is available here: >>>> https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst >>>> >>>> After reading that document, it appears to me that there are some >>>> fundamentally different views on how things should move forward. 
I'm >>>> also reading the document incorporating my understanding of the history, >>>> of NumPy as well as all of the users I've met and interacted with which >>>> means I have my own perspective that is not necessarily incorporated >>>> into that document but informs my recommendations. I'm not sure we can >>>> reach full consensus on this. We are also well past time for moving >>>> forward with a resolution on this (perhaps we can all agree on that). >>>> >>>> I would like one more discussion thread where the technical discussion >>>> can take place. I will make a plea that we keep this discussion as free >>>> from logical fallacies http://en.wikipedia.org/wiki/Logical_fallacy as >>>> we can. I can't guarantee that I personally will succeed at that, but I >>>> can tell you that I will try. That's all I'm asking of anyone else. I >>>> recognize that there are a lot of other issues at play here besides >>>> *just* the technical questions, but we are not going to resolve every >>>> community issue in this technical thread. >>>> >>>> We need concrete proposals and so I will start with three. Please feel >>>> free to comment on these proposals or add your own during the >>>> discussion. I will stop paying attention to this thread next Wednesday >>>> (May 16th) (or earlier if the thread dies) and hope that by that time we >>>> can agree on a way forward. If we don't have agreement, then I will move >>>> forward with what I think is the right approach. I will either write the >>>> code myself or convince someone else to write it. >>>> >>>> In all cases, we have agreement that bit-pattern dtypes should be added >>>> to NumPy. We should work on these (int32, float64, complex64, str, bool) >>>> to start. So, the three proposals are independent of this way forward. >>>> The proposals are all about the extra mask part: >>>> >>>> My three proposals: >>>> >>>> * do nothing and leave things as is >>>> >>>> * add a global flag that turns off masked array support by default but >>>> otherwise leaves things unchanged (I'm still unclear how this would work >>>> exactly) >>>> >>>> * move Mark's "masked ndarray objects" into a new fundamental type >>>> (ndmasked), leaving the actual ndarray type unchanged. The >>>> array_interface keeps the masked array notions and the ufuncs keep the >>>> ability to handle arrays like ndmasked. Ideally, numpy.ma >>>> would be changed to use ndmasked objects as their core. >>>> >>>> For the record, I'm currently in favor of the third proposal. Feel free >>>> to comment on these proposals (or provide your own). >>>> >>> >>> Bravo!, NA-overview.rst was an excellent read. Thanks Nathaniel and Mark! >> >> Yes, it is very well written, my compliments to the chefs. >> >>> The third proposal is certainly the best one from Cython's perspective; >>> and I imagine for those writing C extensions against the C API too. >>> Having PyType_Check fail for ndmasked is a very good way of having code >>> fail that is not written to take masks into account. > > I want to make something more clear: There are two Cython cases; in the > case of "cdef np.ndarray[double]" there is no problem as PEP 3118 access > will raise an exception for masked arrays. > > But, there's the case where you do "cdef np.ndarray", and then proceed > to use PyArray_DATA. Myself I do this more than PEP 3118 access; usually > because I pass the data pointer to some C or C++ code. > > It'd be great to have such code be forward-compatible in the sense that > it raises an exception when it meets a masked array. 
Having PyType_Check > fail seems like the only way? Am I wrong? I'm very sorry; I always meant PyObject_TypeCheck, not PyType_Check. Dag > > >> Mark, Nathaniel - can you comment how your chosen approaches would >> interact with extension code? >> >> I'm guessing the bitpattern dtypes would be expected to cause >> extension code to choke if the type is not supported? > > The proposal, as I understand it, is to use that with new dtypes (?). So > things will often be fine for that reason: > > if arr.dtype == np.float32: > c_function_32bit(np.PyArray_DATA(arr), ...) > else: > raise ValueError("need 32-bit float array") > > >> >> Mark - in : >> >> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#cython >> >> - do I understand correctly that you think that Cython and other >> extension writers should use the numpy API to access the data rather >> than accessing it directly via the data pointer and strides? > > That's not really fleshed out (for all the different usecases etc.); I > read that as "let's discuss Cython later, when this is actively used in > NumPy". Which sounds reasonable to me. > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Thu May 10 05:38:19 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 10 May 2012 11:38:19 +0200 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> Message-ID: <4FAB8C8B.9090107@astro.uio.no> On 05/10/2012 10:40 AM, Charles R Harris wrote: > > > On Thu, May 10, 2012 at 1:10 AM, Dag Sverre Seljebotn > > wrote: > > On 05/10/2012 06:18 AM, Charles R Harris wrote: > > > > > > On Wed, May 9, 2012 at 9:54 PM, Dag Sverre Seljebotn > > > >> wrote: > > > > Sorry everyone for being so dense and contaminating that > other thread. > > Here's a new thread where I can respond to Nathaniel's response. > > > > On 05/10/2012 01:08 AM, Nathaniel Smith wrote: > > > Hi Dag, > > > > > > On Wed, May 9, 2012 at 8:44 PM, Dag Sverre Seljebotn > > > > >> > > wrote: > > >> I'm a heavy user of masks, which are used to make data NA in the > > >> statistical sense. The setting is that we have to mask out the > > radiation > > >> coming from the Milky Way in full-sky images of the Cosmic > Microwave > > >> Background. There's data, but we know we can't trust it, so we > > make it > > >> NA. But we also do play around with different masks. > > > > > > Oh, this is great -- that means you're one of the users that I > wasn't > > > sure existed or not :-). Now I know! > > > > > >> Today we keep the mask in a seperate array, and to zero-mask we do > > >> > > >> masked_data = data * mask > > >> > > >> or > > >> > > >> masked_data = data.copy() > > >> masked_data[mask == 0] = np.nan # soon np.NA > > >> > > >> depending on the circumstances. > > >> > > >> Honestly, API-wise, this is as good as its gets for us. Nice and > > >> transparent, no new semantics to learn in the special case of > masks. > > >> > > >> Now, this has performance issues: Lots of memory use, extra > > transfers > > >> over the memory bus. > > > > > > Right -- this is a case where (in the NA-overview terminology) > masked > > > storage+NA semantics would be useful. > > > > > >> BUT, NumPy has that problem all over the place, even for "x + y > > + z"! 
> > >> Solving it in the special case of masks, by making a new API, > > seems a > > >> bit myopic to me. > > >> > > >> IMO, that's much better solved at the fundamental level. As an > > >> *illustration*: > > >> > > >> with np.lazy: > > >> masked_data1 = data * mask1 > > >> masked_data2 = data * (mask1 | mask2) > > >> masked_data3 = (x + y + z) * (mask1& mask3) > > >> > > >> This would create three "generator arrays" that would > zero-mask the > > >> arrays (and perform the three-term addition...) upon request. > > You could > > >> slice the generator arrays as you wish, and by that slice the > > data and > > >> the mask in one operation. Obviously this could handle > > NA-masking too. > > >> > > >> You can probably do this today with Theano and numexpr, and I > think > > >> Travis mentioned that "generator arrays" are on his radar for core > > NumPy. > > > > > > Implementing this today would require some black magic hacks, > because > > > on entry/exit to the context manager you'd have to "reach up" > > into the > > > calling scope and replace all the ndarray's with LazyArrays and > then > > > vice-versa. This is actually totally possible: > > > https://gist.github.com/2347382 > > > but I'm not sure I'd call it *wise*. (You could probably avoid the > > > truly horrible set_globals_dict part of that gist, though.) > Might be > > > fun to prototype, though... > > > > 1) My main point was just that I believe masked arrays is > something that > > to me feels immature, and that it is the kind of thing that > should be > > constructed from simpler primitives. And that NumPy should > focus on > > simple primitives. You could make it > > > > > > I can't disagree, as I suggested the same as a possibility myself ;) > > There is a lot of infrastructure now in numpy, but given the use > cases > > I'm tending towards the view that masked arrays should be left to > > others, at least for the time being. The question is how to > generalize > > the infrastructure and what hooks to provide. I think just spending a > > month or two pulling stuff out is counter productive, but > evolving the > > code is definitely needed. If you could familiarize yourself with > what > > is in there, something that seems largely neglected by the > critics, and > > make suggestions, that would be helpful. > > But how on earth can I make constructive criticisms about code when I > don't know what the purpose of that code is supposed to be? > > > What do you mean? I thought the purpose was quite clearly laid out in > the NEP. But the implementation of that purpose required some > infrastructure. The point, I suppose, is for you to suggest what would > serve your use case. What would serve me? I use NumPy as a glorified "double*". And I'm probably going to continue doing so. I'd be most happy by the status quo of NumPy 1.6 + bugfixes. Main reason I'm getting involved is that the direction NumPy is going has consequences for Cython and Cython users and the Cython code I write myself. So my main comment is that I'd like (by whatever means necessary) to have PyObject_TypeCheck fail whenever you can't sanely access PyArray_DATA or in other ways use the good old NumPy C API (for whatever reason, such as masks). If PyObject_TypeCheck fails, existing C extension modules are forward-compatible, and so one doesn't have to go over old extension module code to add in a check for the presence of masks (or whatever) and raise an exception. 
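A minimal Python-level sketch of that failure mode (assuming, as in the third proposal, a separate ndmasked type that is not a subclass of ndarray; legacy_entry_point is just a made-up name for an existing wrapper):

    import numpy as np

    def legacy_entry_point(arr):
        # Python-level stand-in for the PyObject_TypeCheck(arr, &PyArray_Type)
        # guard that sits in front of a PyArray_DATA access in old C code.
        if not isinstance(arr, np.ndarray):
            raise TypeError("expected a plain ndarray, got %r" % type(arr))
        return arr.sum()   # stand-in for handing the raw buffer to C

Code written that way keeps working unchanged for plain ndarrays and fails loudly, instead of silently ignoring the mask, when handed an ndmasked object.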
Suggestion: Use the PyArrayObject struct for both ndarray and ndmasked, so that ndarray has NULL mask fields. So ob_type makes them appear different to Python and C extensions, but internally in NumPy you could just check for a NULL mask field rather than the type, if you prefer that. (When I say I like status quo: There are things I'd *like* to see changed in NumPy, but it's better phrased in terms of full reimplementation rather than iterative refinement, which probably means I should keep those thoughts to myself until I grow up. But until NumPy changes on a pretty fundamental level, all I want is my glorified "double*". I'm probably not a representative user.) > Are you saying you agree that the masking aspect should be banned (or at > least not "core"), and asking me to look at code from that perspective > and comment on how to get there while keeping as much as possible of the > rest? Would that really be helpful? > > > No, I don't agree that it should be banned, but your perspective seems > to be that it should be, so I ask you to determine what is worth > keeping. We can of course pull it all out and forget about the whole > thing. But I'm getting tired of people saying do this or that without > making technical suggestions that can be implemented, looking at the > code, testing things, and providing feedback. At a minimum, I expect you > to have an idea of how things *should* work and how to get there. Look, I'm used to coding; at least I used to spend a significant amount of time on Cython, doing some pretty extensive refactorings of the code base that touched pretty much all the code. And on the Cython list, we almost always keep details of implementation second-order to the semantics we want. We do keep an eye on "how difficult it would be to implement" while discussing semantics, and sometimes have to say "we ideally want this long-term, what's the stopgap solution we can use until we can make it work the way we want". But it's nothing like what you suggest above. And we don't compromise short-term if better semantics are available long-term (e.g., recent thread on None-checking). But talking about process is getting old, I guess. (Of course there's plenty of other factors involved that could explain why the Cython/SymPy/Sage/etc. development communities are so good, and the NumPy one so...ehrmm..) Dag From njs at pobox.com Thu May 10 05:43:07 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 10 May 2012 10:43:07 +0100 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: <4FAAC918.8050705@astro.uio.no> Message-ID: Hi Matthew, On Thu, May 10, 2012 at 12:01 AM, Matthew Brett wrote: >> The third proposal is certainly the best one from Cython's perspective; >> and I imagine for those writing C extensions against the C API too. >> Having PyType_Check fail for ndmasked is a very good way of having code >> fail that is not written to take masks into account. > > Mark, Nathaniel - can you comment how your chosen approaches would > interact with extension code? > > I'm guessing the bitpattern dtypes would be expected to cause > extension code to choke if the type is not supported? That's pretty much how I'm imagining it, yes. Right now if you have, say, a Cython function like cdef f(np.ndarray[double] a): ... and you do f(np.zeros(10, dtype=int)), then it will error out, because that function doesn't know how to handle ints, only doubles. The same would apply for, say, a NA-enabled integer. 
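In plain Python the same kind of guard might look like this (a sketch only; the function name and the error message are illustrative):

    import numpy as np

    def f(a):
        # rough Python-level equivalent of the cdef f(np.ndarray[double] a) signature
        if a.dtype != np.float64:
            raise TypeError("f() expects float64, got %s" % a.dtype)
        return a * 2

    f(np.zeros(10))                # float64 by default, accepted
    # f(np.zeros(10, dtype=int))   # raises TypeError, like the Cython buffer mismatch

An array carrying a hypothetical NA-enabled dtype would fail the same comparison, so it never reaches code that does not understand the extra bit pattern.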
In general there are almost arbitrarily many dtypes that could get passed into any function (including user-defined ones, etc.), so C code already has to check dtypes for correctness. Second order issues: - There is certainly C code out there that just assumes that it will only be passed an array with certain dtype (and ndim, memory layout, etc...). If you write such C code then it's your job to make sure that you only pass it the kinds of arrays that it expects, just like now :-). - We may want to do some sort of special-casing of handling for floating point NA dtypes that use an NaN as the "magic" bitpattern, since many algorithms *will* work with these unchanged, and it might be frustrating to have to wait for every extension module to be updated just to allow for this case explicitly before using them. OTOH you can easily work around this. Like say my_qr is a legacy C function that will in fact propagate NaNs correctly, so float NA dtypes would Just Work -- except, it errors out at the start because it doesn't recognize the dtype. How annoying. We *could* have some special hack you can use to force it to work anyway (by like making the "is this the dtype I expect?" routine lie.) But you can also just do: def my_qr_wrapper(arr): if arr.dtype is a NA float dtype with NaN magic value: result = my_qr(arr.view(arr.dtype.base_dtype)) return result.view(arr.dtype) else: return my_qr(arr) and hey presto, now it will correctly pass through NAs. So perhaps it's not worth bothering with special hacks. - Of course if your extension function does want to handle NAs generically, then there will be a simple C api for checking for them, setting them, etc. Numpy needs such an API internally anyway! -- Nathaniel From d.s.seljebotn at astro.uio.no Thu May 10 05:46:21 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 10 May 2012 11:46:21 +0200 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: <4FAB8C8B.9090107@astro.uio.no> References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> <4FAB8C8B.9090107@astro.uio.no> Message-ID: <4FAB8E6D.1020001@astro.uio.no> On 05/10/2012 11:38 AM, Dag Sverre Seljebotn wrote: > On 05/10/2012 10:40 AM, Charles R Harris wrote: >> >> >> On Thu, May 10, 2012 at 1:10 AM, Dag Sverre Seljebotn >> > wrote: >> >> On 05/10/2012 06:18 AM, Charles R Harris wrote: >> > >> > >> > On Wed, May 9, 2012 at 9:54 PM, Dag Sverre Seljebotn >> > >> > >> wrote: >> > >> > Sorry everyone for being so dense and contaminating that >> other thread. >> > Here's a new thread where I can respond to Nathaniel's response. >> > >> > On 05/10/2012 01:08 AM, Nathaniel Smith wrote: >> > > Hi Dag, >> > > >> > > On Wed, May 9, 2012 at 8:44 PM, Dag Sverre Seljebotn >> > > >> >> >> > wrote: >> > >> I'm a heavy user of masks, which are used to make data NA in the >> > >> statistical sense. The setting is that we have to mask out the >> > radiation >> > >> coming from the Milky Way in full-sky images of the Cosmic >> Microwave >> > >> Background. There's data, but we know we can't trust it, so we >> > make it >> > >> NA. But we also do play around with different masks. >> > > >> > > Oh, this is great -- that means you're one of the users that I >> wasn't >> > > sure existed or not :-). Now I know! 
>> > > >> > >> Today we keep the mask in a seperate array, and to zero-mask we do >> > >> >> > >> masked_data = data * mask >> > >> >> > >> or >> > >> >> > >> masked_data = data.copy() >> > >> masked_data[mask == 0] = np.nan # soon np.NA >> > >> >> > >> depending on the circumstances. >> > >> >> > >> Honestly, API-wise, this is as good as its gets for us. Nice and >> > >> transparent, no new semantics to learn in the special case of >> masks. >> > >> >> > >> Now, this has performance issues: Lots of memory use, extra >> > transfers >> > >> over the memory bus. >> > > >> > > Right -- this is a case where (in the NA-overview terminology) >> masked >> > > storage+NA semantics would be useful. >> > > >> > >> BUT, NumPy has that problem all over the place, even for "x + y >> > + z"! >> > >> Solving it in the special case of masks, by making a new API, >> > seems a >> > >> bit myopic to me. >> > >> >> > >> IMO, that's much better solved at the fundamental level. As an >> > >> *illustration*: >> > >> >> > >> with np.lazy: >> > >> masked_data1 = data * mask1 >> > >> masked_data2 = data * (mask1 | mask2) >> > >> masked_data3 = (x + y + z) * (mask1& mask3) >> > >> >> > >> This would create three "generator arrays" that would >> zero-mask the >> > >> arrays (and perform the three-term addition...) upon request. >> > You could >> > >> slice the generator arrays as you wish, and by that slice the >> > data and >> > >> the mask in one operation. Obviously this could handle >> > NA-masking too. >> > >> >> > >> You can probably do this today with Theano and numexpr, and I >> think >> > >> Travis mentioned that "generator arrays" are on his radar for core >> > NumPy. >> > > >> > > Implementing this today would require some black magic hacks, >> because >> > > on entry/exit to the context manager you'd have to "reach up" >> > into the >> > > calling scope and replace all the ndarray's with LazyArrays and >> then >> > > vice-versa. This is actually totally possible: >> > > https://gist.github.com/2347382 >> > > but I'm not sure I'd call it *wise*. (You could probably avoid the >> > > truly horrible set_globals_dict part of that gist, though.) >> Might be >> > > fun to prototype, though... >> > >> > 1) My main point was just that I believe masked arrays is >> something that >> > to me feels immature, and that it is the kind of thing that >> should be >> > constructed from simpler primitives. And that NumPy should >> focus on >> > simple primitives. You could make it >> > >> > >> > I can't disagree, as I suggested the same as a possibility myself ;) >> > There is a lot of infrastructure now in numpy, but given the use >> cases >> > I'm tending towards the view that masked arrays should be left to >> > others, at least for the time being. The question is how to >> generalize >> > the infrastructure and what hooks to provide. I think just spending a >> > month or two pulling stuff out is counter productive, but >> evolving the >> > code is definitely needed. If you could familiarize yourself with >> what >> > is in there, something that seems largely neglected by the >> critics, and >> > make suggestions, that would be helpful. >> >> But how on earth can I make constructive criticisms about code when I >> don't know what the purpose of that code is supposed to be? >> >> >> What do you mean? I thought the purpose was quite clearly laid out in >> the NEP. But the implementation of that purpose required some >> infrastructure. The point, I suppose, is for you to suggest what would >> serve your use case. 
> > What would serve me? I use NumPy as a glorified "double*". And I'm > probably going to continue doing so. I'd be most happy by the status quo > of NumPy 1.6 + bugfixes. > > Main reason I'm getting involved is that the direction NumPy is going > has consequences for Cython and Cython users and the Cython code I write > myself. > > So my main comment is that I'd like (by whatever means necessary) to > have PyObject_TypeCheck fail whenever you can't sanely access > PyArray_DATA or in other ways use the good old NumPy C API (for whatever > reason, such as masks). > > If PyObject_TypeCheck fails, existing C extension modules are > forward-compatible, and so one doesn't have to go over old extension > module code to add in a check for the presence of masks (or whatever) > and raise an exception. > > Suggestion: Use the PyArrayObject struct for both ndarray and ndmasked, > so that ndarray has NULL mask fields. So ob_type makes them appear > different to Python and C extensions, but internally in NumPy you could > just check for a NULL mask field rather than the type, if you prefer that. > > (When I say I like status quo: There are things I'd *like* to see > changed in NumPy, but it's better phrased in terms of full > reimplementation rather than iterative refinement, which probably means > I should keep those thoughts to myself until I grow up. But until NumPy > changes on a pretty fundamental level, all I want is my glorified > "double*". I'm probably not a representative user.) > >> Are you saying you agree that the masking aspect should be banned (or at >> least not "core"), and asking me to look at code from that perspective >> and comment on how to get there while keeping as much as possible of the >> rest? Would that really be helpful? >> >> >> No, I don't agree that it should be banned, but your perspective seems >> to be that it should be, so I ask you to determine what is worth >> keeping. We can of course pull it all out and forget about the whole >> thing. But I'm getting tired of people saying do this or that without >> making technical suggestions that can be implemented, looking at the >> code, testing things, and providing feedback. At a minimum, I expect you >> to have an idea of how things *should* work and how to get there. > > Look, I'm used to coding; at least I used to spend a significant amount > of time on Cython, doing some pretty extensive refactorings of the code > base that touched pretty much all the code. > > And on the Cython list, we almost always keep details of implementation > second-order to the semantics we want. We do keep an eye on "how > difficult it would be to implement" while discussing semantics, and > sometimes have to say "we ideally want this long-term, what's the > stopgap solution we can use until we can make it work the way we want". > But it's nothing like what you suggest above. And we don't compromise > short-term if better semantics are available long-term (e.g., recent > thread on None-checking). > > But talking about process is getting old, I guess. > > (Of course there's plenty of other factors involved that could explain > why the Cython/SymPy/Sage/etc. development communities are so good, and > the NumPy one so...ehrmm..) I'm very sorry. I should have said "lists"! I don't really know anything about the NumPy developer community. Anyway, I sense my best course of action will be to shut up. 
Dag From d.s.seljebotn at astro.uio.no Thu May 10 10:48:21 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 10 May 2012 16:48:21 +0200 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: <4FAB8C8B.9090107@astro.uio.no> References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> <4FAB8C8B.9090107@astro.uio.no> Message-ID: <4FABD535.5060004@astro.uio.no> On 05/10/2012 11:38 AM, Dag Sverre Seljebotn wrote: > On 05/10/2012 10:40 AM, Charles R Harris wrote: >> >> >> On Thu, May 10, 2012 at 1:10 AM, Dag Sverre Seljebotn >> > wrote: >> >> On 05/10/2012 06:18 AM, Charles R Harris wrote: >> > >> > >> > On Wed, May 9, 2012 at 9:54 PM, Dag Sverre Seljebotn >> > >> > >> wrote: >> > >> > Sorry everyone for being so dense and contaminating that >> other thread. >> > Here's a new thread where I can respond to Nathaniel's response. >> > >> > On 05/10/2012 01:08 AM, Nathaniel Smith wrote: >> > > Hi Dag, >> > > >> > > On Wed, May 9, 2012 at 8:44 PM, Dag Sverre Seljebotn >> > > >> >> >> > wrote: >> > >> I'm a heavy user of masks, which are used to make data NA in the >> > >> statistical sense. The setting is that we have to mask out the >> > radiation >> > >> coming from the Milky Way in full-sky images of the Cosmic >> Microwave >> > >> Background. There's data, but we know we can't trust it, so we >> > make it >> > >> NA. But we also do play around with different masks. >> > > >> > > Oh, this is great -- that means you're one of the users that I >> wasn't >> > > sure existed or not :-). Now I know! >> > > >> > >> Today we keep the mask in a seperate array, and to zero-mask we do >> > >> >> > >> masked_data = data * mask >> > >> >> > >> or >> > >> >> > >> masked_data = data.copy() >> > >> masked_data[mask == 0] = np.nan # soon np.NA >> > >> >> > >> depending on the circumstances. >> > >> >> > >> Honestly, API-wise, this is as good as its gets for us. Nice and >> > >> transparent, no new semantics to learn in the special case of >> masks. >> > >> >> > >> Now, this has performance issues: Lots of memory use, extra >> > transfers >> > >> over the memory bus. >> > > >> > > Right -- this is a case where (in the NA-overview terminology) >> masked >> > > storage+NA semantics would be useful. >> > > >> > >> BUT, NumPy has that problem all over the place, even for "x + y >> > + z"! >> > >> Solving it in the special case of masks, by making a new API, >> > seems a >> > >> bit myopic to me. >> > >> >> > >> IMO, that's much better solved at the fundamental level. As an >> > >> *illustration*: >> > >> >> > >> with np.lazy: >> > >> masked_data1 = data * mask1 >> > >> masked_data2 = data * (mask1 | mask2) >> > >> masked_data3 = (x + y + z) * (mask1& mask3) >> > >> >> > >> This would create three "generator arrays" that would >> zero-mask the >> > >> arrays (and perform the three-term addition...) upon request. >> > You could >> > >> slice the generator arrays as you wish, and by that slice the >> > data and >> > >> the mask in one operation. Obviously this could handle >> > NA-masking too. >> > >> >> > >> You can probably do this today with Theano and numexpr, and I >> think >> > >> Travis mentioned that "generator arrays" are on his radar for core >> > NumPy. >> > > >> > > Implementing this today would require some black magic hacks, >> because >> > > on entry/exit to the context manager you'd have to "reach up" >> > into the >> > > calling scope and replace all the ndarray's with LazyArrays and >> then >> > > vice-versa. 
This is actually totally possible: >> > > https://gist.github.com/2347382 >> > > but I'm not sure I'd call it *wise*. (You could probably avoid the >> > > truly horrible set_globals_dict part of that gist, though.) >> Might be >> > > fun to prototype, though... >> > >> > 1) My main point was just that I believe masked arrays is >> something that >> > to me feels immature, and that it is the kind of thing that >> should be >> > constructed from simpler primitives. And that NumPy should >> focus on >> > simple primitives. You could make it >> > >> > >> > I can't disagree, as I suggested the same as a possibility myself ;) >> > There is a lot of infrastructure now in numpy, but given the use >> cases >> > I'm tending towards the view that masked arrays should be left to >> > others, at least for the time being. The question is how to >> generalize >> > the infrastructure and what hooks to provide. I think just spending a >> > month or two pulling stuff out is counter productive, but >> evolving the >> > code is definitely needed. If you could familiarize yourself with >> what >> > is in there, something that seems largely neglected by the >> critics, and >> > make suggestions, that would be helpful. >> >> But how on earth can I make constructive criticisms about code when I >> don't know what the purpose of that code is supposed to be? >> >> >> What do you mean? I thought the purpose was quite clearly laid out in >> the NEP. But the implementation of that purpose required some >> infrastructure. The point, I suppose, is for you to suggest what would >> serve your use case. > > What would serve me? I use NumPy as a glorified "double*". And I'm > probably going to continue doing so. I'd be most happy by the status quo > of NumPy 1.6 + bugfixes. > > Main reason I'm getting involved is that the direction NumPy is going > has consequences for Cython and Cython users and the Cython code I write > myself. > > So my main comment is that I'd like (by whatever means necessary) to > have PyObject_TypeCheck fail whenever you can't sanely access > PyArray_DATA or in other ways use the good old NumPy C API (for whatever > reason, such as masks). > > If PyObject_TypeCheck fails, existing C extension modules are > forward-compatible, and so one doesn't have to go over old extension > module code to add in a check for the presence of masks (or whatever) > and raise an exception. > > Suggestion: Use the PyArrayObject struct for both ndarray and ndmasked, > so that ndarray has NULL mask fields. So ob_type makes them appear > different to Python and C extensions, but internally in NumPy you could > just check for a NULL mask field rather than the type, if you prefer that. > > (When I say I like status quo: There are things I'd *like* to see > changed in NumPy, but it's better phrased in terms of full > reimplementation rather than iterative refinement, which probably means > I should keep those thoughts to myself until I grow up. But until NumPy > changes on a pretty fundamental level, all I want is my glorified > "double*". I'm probably not a representative user.) > >> Are you saying you agree that the masking aspect should be banned (or at >> least not "core"), and asking me to look at code from that perspective >> and comment on how to get there while keeping as much as possible of the >> rest? Would that really be helpful? >> >> >> No, I don't agree that it should be banned, but your perspective seems >> to be that it should be, so I ask you to determine what is worth >> keeping. 
We can of course pull it all out and forget about the whole >> thing. But I'm getting tired of people saying do this or that without >> making technical suggestions that can be implemented, looking at the >> code, testing things, and providing feedback. At a minimum, I expect you >> to have an idea of how things *should* work and how to get there. Chuck -- I apologize for my behaviour. I got aggravated by your reply, but that doesn't excuse the form of my answer. Let me try to answer again. Listen: If you can explain to me *how* I would look at the source code -- what purpose I should have in mind while I do it -- then I'll do it. Usually when I look at source code it's because I need to fix bug X, or implement feature Y. I have a clear purpose for opening up my editor. But it's unclear to me how me simply looking at source code would help inform the discussion we're having here. If you can explain that to me (please spoon-feed me), I'm happy. I could go through just to see how to make it how I'd personally prefer it, without regard to the opinion of others. But with overwhelming probability, what would come out of that is just what I've already said ("rip masks out"), just stated in a more verbose form (the steps needed to rip them out). I don't see how that would help anybody. (Is it just that you disbelieve my future-looking statement here?) There are other usecases than mine. But reading the source code doesn't inform me about those usecases. So either somebody else than me must comment on the source code, or I must gain a much better understanding of the motivation for masks than what I have today. If you still want me to do what you say, just give me a better understanding of how I go about it, and that it'll be a valuable thing for NumPy that I do it, and I'll set aside some hours. Dag From chaoyuejoy at gmail.com Thu May 10 12:21:33 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 10 May 2012 18:21:33 +0200 Subject: [Numpy-discussion] read line mixed with string and number? Message-ID: Dear all, I have files which contain lines like this: 30516F5 Sep 1985 1-Day Rain Trace 0.2 3.2 Trace 0.0 0.0 0.0 0.0 0.2 0.0 Trace 29.2 0.0 0.0 0.0 0.0 1.8 30516F5 Sep 1985 1-Day Snow Trace 0.0 0.0 0.0 14.8 10.1 Trace 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Trace Trace 0.0 30516F5 Sep 1985 1-Day Pcpn. Trace 0.2 3.2 Trace 18.9 9.8 Trace 0.0 0.2 0.0 Trace 29.2 0.0 0.0 0.0 Trace 1.8 0.0 30516F5 May 1986 Max. Temp. Misg Misg Misg Misg Misg Misg 9.0 8.0 8.0 0.0 6.0 1.0 1.0 -3.0 3. 30516F5 May 1986 Min. Temp. Misg Misg Misg Misg Misg Misg Misg -1.0 -2.0 -6.0 -5.0 -5.0 -3.0 -7.0 -6.0 -5.0 -3.0 different columns were separated by blank spaces. with the first column as sitename, second as month name, then year, then variable name and data. I want to read them line by line into a list, and then connect all the numerical data within one year into a list, and then combining different year data into one masked ndarray, in this process, I check the flags (Trace, Misg, etc.) and replace them as unique values (or missing values). and then begin to analyse the data. each file contains only one site, it can be big or small depending on the number of years. I don't know what's the good way to do this job. what I am thinking is to read one file line by line, and then divide this line by blank space, and replace special flag. but during this process, I need to do type conversion. any suggestion would be appreciated. 
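A sketch of that line-by-line approach, assuming the layout is exactly as in the sample above (site, month, year, a multi-word variable name, then the daily values), with 'Trace' mapped to 0.0 and 'Misg' masked -- both choices, and the file name below, are only placeholders:

    import numpy as np
    import numpy.ma as ma

    FLAGS = {'Trace': 0.0, 'Misg': None}      # None means "mask this value"

    def _is_number(tok):
        try:
            float(tok)
            return True
        except ValueError:
            return False

    def parse_line(line):
        parts = line.split()
        site, month, year = parts[0], parts[1], int(parts[2])
        # everything between the year and the first value/flag is the variable name
        i = 3
        while i < len(parts) and parts[i] not in FLAGS and not _is_number(parts[i]):
            i += 1
        varname = ' '.join(parts[3:i])
        values, mask = [], []
        for tok in parts[i:]:
            if tok in FLAGS:
                repl = FLAGS[tok]
                values.append(0.0 if repl is None else repl)
                mask.append(repl is None)
            else:
                values.append(float(tok))
                mask.append(False)
        return site, month, year, varname, ma.array(values, mask=mask)

    # one masked row per data line; rows for one site/variable can then be
    # grouped by year and joined with ma.concatenate
    rows = [parse_line(l) for l in open('30516F5.txt') if l.strip()]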
Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu May 10 14:23:27 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 10 May 2012 11:23:27 -0700 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: <4FAB8C8B.9090107@astro.uio.no> References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> <4FAB8C8B.9090107@astro.uio.no> Message-ID: On Thu, May 10, 2012 at 2:38 AM, Dag Sverre Seljebotn wrote: > What would serve me? I use NumPy as a glorified "double*". > all I want is my glorified > "double*". I'm probably not a representative user.) Actually, I think you are representative of a LOT of users -- it turns, out, whether Jim Huginin originally was thinking this way or not, but numpy arrays are really powerful because the provide BOTH and nifty, full featured array object in Python, AND a wrapper around a generic "double*" (actually char*, that could be any type). This is are really widely used feature, and has become even more so with Cython's numpy support. That is one of my concerns about the "bit pattern" idea -- we've then created a new binary type that no other standard software understands -- that looks like a a lot of work to me to deal with, or even worse, ripe for weird, non-obvious errors in code that access that good-old char*. So I'm happier with a mask implementation -- more memory, yes, but it seems more robust an easy to deal with with outside code. But either way, Dag's key point is right on -- in Cython (or any other code) -- we need to make sure ti's easy to get a regular old pointer to a regular old C array, and get something else by accident. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From sransom at nrao.edu Thu May 10 14:52:03 2012 From: sransom at nrao.edu (Scott Ransom) Date: Thu, 10 May 2012 14:52:03 -0400 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> <4FAB8C8B.9090107@astro.uio.no> Message-ID: <4FAC0E53.5050108@nrao.edu> On 05/10/2012 02:23 PM, Chris Barker wrote: > On Thu, May 10, 2012 at 2:38 AM, Dag Sverre Seljebotn > wrote: >> What would serve me? I use NumPy as a glorified "double*". > >> all I want is my glorified >> "double*". I'm probably not a representative user.) > > Actually, I think you are representative of a LOT of users -- it > turns, out, whether Jim Huginin originally was thinking this way or > not, but numpy arrays are really powerful because the provide BOTH and > nifty, full featured array object in Python, AND a wrapper around a > generic "double*" (actually char*, that could be any type). > > This is are really widely used feature, and has become even more so > with Cython's numpy support. 
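For what it's worth, that "glorified double*" pattern looks roughly like this from pure Python via ctypes (a sketch; lib and my_c_routine stand in for whatever external C function is actually being wrapped):

    import ctypes
    import numpy as np

    def call_external(arr, lib):
        a = np.ascontiguousarray(arr, dtype=np.float64)   # plain, contiguous buffer
        ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
        lib.my_c_routine(ptr, ctypes.c_size_t(a.size))    # hypothetical C entry point
        return a

Nothing about a mask (or a bit pattern the C side doesn't know about) survives that pointer hand-off, which is why it matters that such code either keeps working or fails loudly.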
> > That is one of my concerns about the "bit pattern" idea -- we've then > created a new binary type that no other standard software understands > -- that looks like a a lot of work to me to deal with, or even worse, > ripe for weird, non-obvious errors in code that access that good-old > char*. > > So I'm happier with a mask implementation -- more memory, yes, but it > seems more robust an easy to deal with with outside code. > > But either way, Dag's key point is right on -- in Cython (or any other > code) -- we need to make sure ti's easy to get a regular old pointer > to a regular old C array, and get something else by accident. > > -Chris Agreed. (As someone who has been heavily using Numpy since the early days of numeric, and who wrote and maintains a suite of scientific software that uses Numpy and its C-API in exactly this way.) Note that I wasn't aware that the proposed mask implementation might (or would?) change this behavior... (and hopefully I haven't just misinterpreted these last few emails. If so, I apologize.). Cheers, Scott -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From souheil.inati at nih.gov Thu May 10 14:55:41 2012 From: souheil.inati at nih.gov (Inati, Souheil (NIH/NIMH) [E]) Date: Thu, 10 May 2012 14:55:41 -0400 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> <4FAB8C8B.9090107@astro.uio.no> Message-ID: <0F5F343D-5A24-45ED-B673-D1B4775F13C3@nih.gov> On May 10, 2012, at 2:23 PM, Chris Barker wrote: > On Thu, May 10, 2012 at 2:38 AM, Dag Sverre Seljebotn > wrote: >> What would serve me? I use NumPy as a glorified "double*". > >> all I want is my glorified >> "double*". I'm probably not a representative user.) > > Actually, I think you are representative of a LOT of users -- it > turns, out, whether Jim Huginin originally was thinking this way or > not, but numpy arrays are really powerful because the provide BOTH and > nifty, full featured array object in Python, AND a wrapper around a > generic "double*" (actually char*, that could be any type). > > This is are really widely used feature, and has become even more so > with Cython's numpy support. > > That is one of my concerns about the "bit pattern" idea -- we've then > created a new binary type that no other standard software understands > -- that looks like a a lot of work to me to deal with, or even worse, > ripe for weird, non-obvious errors in code that access that good-old > char*. > > So I'm happier with a mask implementation -- more memory, yes, but it > seems more robust an easy to deal with with outside code. > > But either way, Dag's key point is right on -- in Cython (or any other > code) -- we need to make sure ti's easy to get a regular old pointer > to a regular old C array, and get something else by accident. > > -Chris > > +1 As a physicist who uses numpy to develop MRI image reconstruction and data analysis methods, I really do think of numpy as a glorified double with a nice way to call useful numerical methods. I also use external methods all the time and it's of the utmost importance to have a pointer to a block of data that I can say is N complex doubles or something. Using a separate array for a mask is not a big deal. At worst it's a factor of 2 in memory. 
It forces me to pay attention to what I'm doing, and if I want to do an SVD on my data, I better keep track of what I'm doing myself. I am not that old, but I'm old enough to remember when matlab was really just this - glorified double with a nice slicing/view interface and a thin wrapper around eispack and linpack. (here is a great article by Cleve Moler from 2000: http://www.mathworks.com/company/newsletters/news_notes/clevescorner/winter2000.cleve.html). You used to read in some ints from a data file and they converted it to double and you knew that if you got numerical precision errors it was because your algorithm was wrong or you were inverting some nearly singular matrix or something, not because of overflow. And they made a copy of the data every time you called a function. It had serious limitations, but what it did just worked. And then they started to get fancy and it took them a REALLY long time and a lot of versions and man hours to get that all sorted out, with lazy evaluations and classes and sparse arrays and all that. I'm not saying what the developers of numpy should do about the masked array thing and I really can't comment on how other people use numpy. I also don't really have much of a say about the technical implementations of the guts of numpy, but it's worth asking really simple questions like: I want to do an SVD on a 2D array with some missing or masked data. What should happen? This seems like such a simple question, but really it is incredibly complicated, or rather, it's very hard for numpy which is a foundation framework type of code to guess what the user means. Anyway, that's my point of view. I'm really happy numpy exists and works as well as it does and I'm thankful that there are developers out there that can build something so useful. Cheers, Souheil ---------- Souheil Inati, PhD Staff Scientist Functional MRI Facility NIMH/NIH > > > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Thu May 10 15:14:46 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 10 May 2012 13:14:46 -0600 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: <4FAC0E53.5050108@nrao.edu> References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> <4FAB8C8B.9090107@astro.uio.no> <4FAC0E53.5050108@nrao.edu> Message-ID: On Thu, May 10, 2012 at 12:52 PM, Scott Ransom wrote: > On 05/10/2012 02:23 PM, Chris Barker wrote: > > On Thu, May 10, 2012 at 2:38 AM, Dag Sverre Seljebotn > > wrote: > >> What would serve me? I use NumPy as a glorified "double*". > > > >> all I want is my glorified > >> "double*". I'm probably not a representative user.) > > > > Actually, I think you are representative of a LOT of users -- it > > turns, out, whether Jim Huginin originally was thinking this way or > > not, but numpy arrays are really powerful because the provide BOTH and > > nifty, full featured array object in Python, AND a wrapper around a > > generic "double*" (actually char*, that could be any type). > > > > This is are really widely used feature, and has become even more so > > with Cython's numpy support. 
> > > > That is one of my concerns about the "bit pattern" idea -- we've then > > created a new binary type that no other standard software understands > > -- that looks like a a lot of work to me to deal with, or even worse, > > ripe for weird, non-obvious errors in code that access that good-old > > char*. > > > > So I'm happier with a mask implementation -- more memory, yes, but it > > seems more robust an easy to deal with with outside code. > > > > But either way, Dag's key point is right on -- in Cython (or any other > > code) -- we need to make sure ti's easy to get a regular old pointer > > to a regular old C array, and get something else by accident. > > > > -Chris > > Agreed. (As someone who has been heavily using Numpy since the early > days of numeric, and who wrote and maintains a suite of scientific > software that uses Numpy and its C-API in exactly this way.) > > Note that I wasn't aware that the proposed mask implementation might (or > would?) change this behavior... (and hopefully I haven't just > misinterpreted these last few emails. If so, I apologize.). > > I haven't seen a change in this behavior, otherwise most of current numpy would break. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu May 10 15:17:56 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 10 May 2012 13:17:56 -0600 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> <4FAB8C8B.9090107@astro.uio.no> <4FAC0E53.5050108@nrao.edu> Message-ID: On Thu, May 10, 2012 at 1:14 PM, Charles R Harris wrote: > > > On Thu, May 10, 2012 at 12:52 PM, Scott Ransom wrote: > >> On 05/10/2012 02:23 PM, Chris Barker wrote: >> > On Thu, May 10, 2012 at 2:38 AM, Dag Sverre Seljebotn >> > wrote: >> >> What would serve me? I use NumPy as a glorified "double*". >> > >> >> all I want is my glorified >> >> "double*". I'm probably not a representative user.) >> > >> > Actually, I think you are representative of a LOT of users -- it >> > turns, out, whether Jim Huginin originally was thinking this way or >> > not, but numpy arrays are really powerful because the provide BOTH and >> > nifty, full featured array object in Python, AND a wrapper around a >> > generic "double*" (actually char*, that could be any type). >> > >> > This is are really widely used feature, and has become even more so >> > with Cython's numpy support. >> > >> > That is one of my concerns about the "bit pattern" idea -- we've then >> > created a new binary type that no other standard software understands >> > -- that looks like a a lot of work to me to deal with, or even worse, >> > ripe for weird, non-obvious errors in code that access that good-old >> > char*. >> > >> > So I'm happier with a mask implementation -- more memory, yes, but it >> > seems more robust an easy to deal with with outside code. >> > >> > But either way, Dag's key point is right on -- in Cython (or any other >> > code) -- we need to make sure ti's easy to get a regular old pointer >> > to a regular old C array, and get something else by accident. >> > >> > -Chris >> >> Agreed. (As someone who has been heavily using Numpy since the early >> days of numeric, and who wrote and maintains a suite of scientific >> software that uses Numpy and its C-API in exactly this way.) >> >> Note that I wasn't aware that the proposed mask implementation might (or >> would?) change this behavior... 
(and hopefully I haven't just >> misinterpreted these last few emails. If so, I apologize.). >> >> > I haven't seen a change in this behavior, otherwise most of current numpy > would break. > > I suspect this rumour comes from some ideas for generator arrays (not mine), but I would strongly oppose anything that changes things that much. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu May 10 15:29:53 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 10 May 2012 14:29:53 -0500 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> <4FAB8C8B.9090107@astro.uio.no> Message-ID: On May 10, 2012, at 1:23 PM, Chris Barker wrote: > On Thu, May 10, 2012 at 2:38 AM, Dag Sverre Seljebotn > wrote: >> What would serve me? I use NumPy as a glorified "double*". > >> all I want is my glorified >> "double*". I'm probably not a representative user.) > > Actually, I think you are representative of a LOT of users -- it > turns, out, whether Jim Huginin originally was thinking this way or > not, but numpy arrays are really powerful because the provide BOTH and > nifty, full featured array object in Python, AND a wrapper around a > generic "double*" (actually char*, that could be any type). > > This is are really widely used feature, and has become even more so > with Cython's numpy support. > > That is one of my concerns about the "bit pattern" idea -- we've then > created a new binary type that no other standard software understands > -- that looks like a a lot of work to me to deal with, or even worse, > ripe for weird, non-obvious errors in code that access that good-old > char*. This needs to be clarified, the point of the "bit pattern" idea is that the downstream user would have to actually *request* data in that format or they would get an error. You would not get it by "accident". If you asked for an array of floats you would get an array of floats (not an array of NA-floats). R has *already* created this binary type and we are just including the ability to understand it in NumPy. This is why it is an easy thing to do without changing the structure of what a NumPy array *is*. Adding the concept of a mask to *every* NumPy array (even NumPy arrays that are currently being used in the wild to represent masks) is the big change that I don't think should happen. -Travis From ben.root at ou.edu Thu May 10 15:46:04 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 10 May 2012 15:46:04 -0400 Subject: [Numpy-discussion] spurious space in printing record arrays? 
Message-ID: Just noticed this in the output from printing some numpy record arrays: [[('2008081712', -24, -78.0, 20.100000381469727, 45.0, -999.0, 0.0)] [ ('2008081718', -18, -79.5999984741211, 20.700000762939453, 45.0, -999.0, 0.0)] [ ('2008081800', -12, -80.30000305175781, 21.100000381469727, 45.0, -999.0, 0.0)] [ ('2008081806', -6, -80.80000305175781, 21.899999618530273, 45.0, -999.0, 0.0)] [ ('2008081812', 0, -81.19999694824219, 23.200000762939453, 50.0, -999.0, 1002.0)]] [[ ('2008081812', 0, -81.19999694824219, 23.200000762939453, 50.0, -999.0, 0.0)] [('2008081815', 3, -81.5, 23.600000381469727, 50.0, -999.0, 1003.0)] [ ('2008081900', 12, -81.80000305175781, 24.600000381469727, 55.0, -999.0, 0.0)] [ ('2008081912', 24, -82.0999984741211, 26.200000762939453, 65.0, -999.0, 0.0)] [('2008082000', 36, -82.0, 27.799999237060547, 50.0, -999.0, 0.0)] [ ('2008082012', 48, -81.80000305175781, 29.299999237060547, 40.0, -999.0, 0.0)] [('2008082112', 72, -81.5, 31.5, 35.0, -999.0, 0.0)] [('2008082212', 96, -81.5, 33.599998474121094, 25.0, -999.0, 0.0)] [('2008082312', 120, -82.5, 35.5, 20.0, -999.0, 0.0)]] On my 80-character wide terminal window, each line that gets wrapped also has an extra space after the inner square bracket. Coincidence? Using v1.6.1 I don't think it is a big problem... just odd. Thanks, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Thu May 10 17:08:45 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Thu, 10 May 2012 17:08:45 -0400 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: It is just to tell that all test pass here on Fedora 14 and all Theano pass with this rc. thanks Fred On Wed, May 9, 2012 at 11:05 PM, Charles R Harris wrote: > > > On Wed, May 9, 2012 at 12:40 PM, Sandro Tosi wrote: >> >> On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers >> wrote: >> > Please test this release and report any issues on the numpy-discussion >> > mailing list. >> >> I think it's probably nice not to ship pyc in the source tarball: >> >> $ find numpy-1.6.2rc1/ -name "*.pyc" >> numpy-1.6.2rc1/doc/sphinxext/docscrape.pyc >> numpy-1.6.2rc1/doc/sphinxext/docscrape_sphinx.pyc >> numpy-1.6.2rc1/doc/sphinxext/numpydoc.pyc >> numpy-1.6.2rc1/doc/sphinxext/plot_directive.pyc >> > > Good point ;) > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From d.s.seljebotn at astro.uio.no Thu May 10 18:27:13 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 May 2012 00:27:13 +0200 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> <4FAB8C8B.9090107@astro.uio.no> Message-ID: <4FAC40C1.5010803@astro.uio.no> On 05/10/2012 08:23 PM, Chris Barker wrote: > On Thu, May 10, 2012 at 2:38 AM, Dag Sverre Seljebotn > wrote: >> What would serve me? I use NumPy as a glorified "double*". > >> all I want is my glorified >> "double*". I'm probably not a representative user.) > > Actually, I think you are representative of a LOT of users -- it > turns, out, whether Jim Huginin originally was thinking this way or > not, but numpy arrays are really powerful because the provide BOTH and > nifty, full featured array object in Python, AND a wrapper around a > generic "double*" (actually char*, that could be any type). 
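To make that dual nature concrete, here is a minimal Cython sketch; c_sum is just a
stand-in for any external C routine that expects a bare double*:

cimport numpy as np
import numpy as np

np.import_array()

cdef double c_sum(double* data, long n):   # stand-in for an external C routine
    cdef double s = 0
    cdef long i
    for i in range(n):
        s += data[i]
    return s

def total(values):
    # The "full featured array object" face: coerce anything array-like.
    cdef np.ndarray arr = np.ascontiguousarray(values, dtype=np.float64)
    # The "glorified double*" face: hand the raw buffer straight to C.
    return c_sum(<double*> np.PyArray_DATA(arr), <long> arr.shape[0])
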
> > This is are really widely used feature, and has become even more so > with Cython's numpy support. > > That is one of my concerns about the "bit pattern" idea -- we've then > created a new binary type that no other standard software understands > -- that looks like a a lot of work to me to deal with, or even worse, > ripe for weird, non-obvious errors in code that access that good-old > char*. > > So I'm happier with a mask implementation -- more memory, yes, but it > seems more robust an easy to deal with with outside code. It's very interesting that you consider masks easier to integrate with C/C++ code than bitpatterns. I guess everybody's experience (and every C/C++/Fortran code base) is different. > > But either way, Dag's key point is right on -- in Cython (or any other > code) -- we need to make sure ti's easy to get a regular old pointer > to a regular old C array, and get something else by accident. I'm sorry if I caused any confusion -- I didn't mean to suggest that anybody would ever remove the ability of getting a pointer to an unmasked array. There is a problem that's being discussed of the opposite nature: With masked arrays, the current situation in NumPy trunk is that if you're presented with a masked array, and do not explicitly check for a mask (i.e., all existing code), you'll transparently and without warning "unmask" it -- that is, an element has the last value before NA was assigned. This is the case whether you use PEP 3118 (np.ndarray[double] or double[:]), or PyArray_DATA. According to the NEP, you should really get an exception when accessing through PEP 3118, but this seems to not be implemented. I don't know whether this was a conscious change or a lack of implementation (?). PyArray_DATA will continue to transparently unmask data. However, with Travis' proposal of making a new 'ndmasked' type, old code will be protected; it will raise an exception for masked arrays instead of transparently unmasking, giving the user a chance to work around it (or update the code to work with masks). Regarding new code that you write to be mask-aware, fear not -- you can use PyArray_DATA and PyArray_MASKNA_DATA to get the pointers. You can't really access the mask using np.ndarray[uint8] or uint8[:], but it wouldn't be a problem for NumPy to provide such access for Cython users. Regarding native Cython support for masks, bitpatterns would be a quick job and an uncontroversial feature, we just need to agree on an extension to the PEP 3118 format string with NumPy and then it takes a few hours to implement it. Masks would require quite some hashing out on the Cython email list to figure out whether and how we would want to support it, and is quite some more development work as well. How we'd even do that is much more vague to me. Dag From mwwiebe at gmail.com Thu May 10 18:28:35 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 10 May 2012 17:28:35 -0500 Subject: [Numpy-discussion] NA-mask interactions with existing C code Message-ID: I did some searching for typical Cython and C code which accesses numpy arrays, and added a section to the NEP describing how they behave in the current implementation. Cython code which uses either straight Python access or the buffer protocol is fine (after a bugfix in numpy, it wasn't failing currently as it should in the pep3118 case). 
C code which follows the recommended practice of using PyArray_FromAny or one of the related macros is also fine, because these functions have been made to fail on NA-masked arrays unless the flag NPY_ARRAY_ALLOWNA is provided. In general, code which follows the recommended numpy practices will raise exceptions when encountering NA-masked arrays. This means programmers don't have to worry about the NA unless they want to support it. Having things go through PyArray_FromAny also provides a place where lazy evaluation arrays could be evaluated, and other similar potential future extensions can use to provide compatibility. Here's the section I added to the NEP: Interaction With Pre-existing C API Usage ========================================= Making sure existing code using the C API, whether it's written in C, C++, or Cython, does something reasonable is an important goal of this implementation. The general strategy is to make existing code which does not explicitly tell numpy it supports NA masks fail with an exception saying so. There are a few different access patterns people use to get ahold of the numpy array data, here we examine a few of them to see what numpy can do. These examples are found from doing google searches of numpy C API array access. Numpy Documentation - How to extend NumPy ----------------------------------------- http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#dealing-with-array-objects This page has a section "Dealing with array objects" which has some advice for how to access numpy arrays from C. When accepting arrays, the first step it suggests is to use PyArray_FromAny or a macro built on that function, so code following this advice will properly fail when given an NA-masked array it doesn't know how to handle. The way this is handled is that PyArray_FromAny requires a special flag, NPY_ARRAY_ALLOWNA, before it will allow NA-masked arrays to flow through. http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_ARRAY_ALLOWNA Code which does not follow this advice, and instead just calls PyArray_Check() to verify its an ndarray and checks some flags, will silently produce incorrect results. This style of code does not provide any opportunity for numpy to say "hey, this array is special", so also is not compatible with future ideas of lazy evaluation, derived dtypes, etc. Tutorial From Cython Website ---------------------------- http://docs.cython.org/src/tutorial/numpy.html This tutorial gives a convolution example, and all the examples fail with Python exceptions when given inputs that contain NA values. Before any Cython type annotation is introduced, the code functions just as equivalent Python would in the interpreter. When the type information is introduced, it is done via numpy.pxd which defines a mapping between an ndarray declaration and PyArrayObject \*. Under the hood, this maps to __Pyx_ArgTypeTest, which does a direct comparison of Py_TYPE(obj) against the PyTypeObject for the ndarray. Then the code does some dtype comparisons, and uses regular python indexing to access the array elements. This python indexing still goes through the Python API, so the NA handling and error checking in numpy still can work like normal and fail if the inputs have NAs which cannot fit in the output array. In this case it fails when trying to convert the NA into an integer to set in in the output. The next version of the code introduces more efficient indexing. This operates based on Python's buffer protocol. 
This causes Cython to call __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which calls PyObject_GetBuffer. This call gives numpy the opportunity to raise an exception if the inputs are arrays with NA-masks, something not supported by the Python buffer protocol. Numerical Python - JPL website ------------------------------ http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html This document is from 2001, so does not reflect recent numpy, but it is the second hit when searching for "numpy c api example" on google. There first example, heading "A simple example", is in fact already invalid for recent numpy even without the NA support. In particular, if the data is misaligned or in a different byteorder, it may crash or produce incorrect results. The next thing the document does is introduce PyArray_ContiguousFromObject, which gives numpy an opportunity to raise an exception when NA-masked arrays are used, so the later code will raise exceptions as desired. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Thu May 10 18:34:02 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 10 May 2012 17:34:02 -0500 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: <4FAC40C1.5010803@astro.uio.no> References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> <4FAB8C8B.9090107@astro.uio.no> <4FAC40C1.5010803@astro.uio.no> Message-ID: On Thu, May 10, 2012 at 5:27 PM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > On 05/10/2012 08:23 PM, Chris Barker wrote: > > On Thu, May 10, 2012 at 2:38 AM, Dag Sverre Seljebotn > > wrote: > >> What would serve me? I use NumPy as a glorified "double*". > > > >> all I want is my glorified > >> "double*". I'm probably not a representative user.) > > > > Actually, I think you are representative of a LOT of users -- it > > turns, out, whether Jim Huginin originally was thinking this way or > > not, but numpy arrays are really powerful because the provide BOTH and > > nifty, full featured array object in Python, AND a wrapper around a > > generic "double*" (actually char*, that could be any type). > > > > This is are really widely used feature, and has become even more so > > with Cython's numpy support. > > > > That is one of my concerns about the "bit pattern" idea -- we've then > > created a new binary type that no other standard software understands > > -- that looks like a a lot of work to me to deal with, or even worse, > > ripe for weird, non-obvious errors in code that access that good-old > > char*. > > > > So I'm happier with a mask implementation -- more memory, yes, but it > > seems more robust an easy to deal with with outside code. > > It's very interesting that you consider masks easier to integrate with > C/C++ code than bitpatterns. I guess everybody's experience (and every > C/C++/Fortran code base) is different. > > > > > But either way, Dag's key point is right on -- in Cython (or any other > > code) -- we need to make sure ti's easy to get a regular old pointer > > to a regular old C array, and get something else by accident. > > I'm sorry if I caused any confusion -- I didn't mean to suggest that > anybody would ever remove the ability of getting a pointer to an > unmasked array. 
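As a rough analogy with today's np.ma module (not the proposed NA masks), this is what
mask-unaware access already looks like from pure Python: the raw data is handed out and
the mask is silently ignored.

import numpy as np

m = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
raw = np.asarray(m)       # mask silently dropped; raw[1] is 2.0,
                          # the value that happened to sit under the mask
mv = memoryview(m)        # the PEP 3118 buffer likewise exposes only the raw data
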
> > There is a problem that's being discussed of the opposite nature: > > With masked arrays, the current situation in NumPy trunk is that if > you're presented with a masked array, and do not explicitly check for a > mask (i.e., all existing code), you'll transparently and without warning > "unmask" it -- that is, an element has the last value before NA was > assigned. This is the case whether you use PEP 3118 (np.ndarray[double] > or double[:]), or PyArray_DATA. > > According to the NEP, you should really get an exception when accessing > through PEP 3118, but this seems to not be implemented. I don't know > whether this was a conscious change or a lack of implementation (?). > This was an error, I've made a pull request to fix it. > PyArray_DATA will continue to transparently unmask data. However, with > Travis' proposal of making a new 'ndmasked' type, old code will be > protected; it will raise an exception for masked arrays instead of > transparently unmasking, giving the user a chance to work around it (or > update the code to work with masks). > In searching for example code, the examples I found and the numpy documentation recommend using the PyArray_FromAny or related functions to sanitize the array before use. This provides a place to stop NA-masked arrays and raise an exception. Is there a lot of code out there which isn't following this practice? Cheers, Mark > Regarding new code that you write to be mask-aware, fear not -- you can > use PyArray_DATA and PyArray_MASKNA_DATA to get the pointers. You can't > really access the mask using np.ndarray[uint8] or uint8[:], but it > wouldn't be a problem for NumPy to provide such access for Cython users. > > Regarding native Cython support for masks, bitpatterns would be a quick > job and an uncontroversial feature, we just need to agree on an > extension to the PEP 3118 format string with NumPy and then it takes a > few hours to implement it. Masks would require quite some hashing out on > the Cython email list to figure out whether and how we would want to > support it, and is quite some more development work as well. How we'd > even do that is much more vague to me. > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Thu May 10 18:47:27 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 May 2012 00:47:27 +0200 Subject: [Numpy-discussion] NA-mask interactions with existing C code In-Reply-To: References: Message-ID: <4FAC457F.3090405@astro.uio.no> On 05/11/2012 12:28 AM, Mark Wiebe wrote: > I did some searching for typical Cython and C code which accesses numpy > arrays, and added a section to the NEP describing how they behave in the > current implementation. Cython code which uses either straight Python > access or the buffer protocol is fine (after a bugfix in numpy, it > wasn't failing currently as it should in the pep3118 case). C code which > follows the recommended practice of using PyArray_FromAny or one of the > related macros is also fine, because these functions have been made to > fail on NA-masked arrays unless the flag NPY_ARRAY_ALLOWNA is provided. > > In general, code which follows the recommended numpy practices will > raise exceptions when encountering NA-masked arrays. 
This means > programmers don't have to worry about the NA unless they want to support > it. Having things go through PyArray_FromAny also provides a place where > lazy evaluation arrays could be evaluated, and other similar potential > future extensions can use to provide compatibility. > > Here's the section I added to the NEP: > > Interaction With Pre-existing C API Usage > ========================================= > > Making sure existing code using the C API, whether it's written in C, C++, > or Cython, does something reasonable is an important goal of this > implementation. > The general strategy is to make existing code which does not explicitly > tell numpy it supports NA masks fail with an exception saying so. There are > a few different access patterns people use to get ahold of the numpy > array data, > here we examine a few of them to see what numpy can do. These examples are > found from doing google searches of numpy C API array access. > > Numpy Documentation - How to extend NumPy > ----------------------------------------- > > http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#dealing-with-array-objects > > This page has a section "Dealing with array objects" which has some > advice for how > to access numpy arrays from C. When accepting arrays, the first step it > suggests is > to use PyArray_FromAny or a macro built on that function, so code > following this > advice will properly fail when given an NA-masked array it doesn't know > how to handle. > > The way this is handled is that PyArray_FromAny requires a special flag, > NPY_ARRAY_ALLOWNA, > before it will allow NA-masked arrays to flow through. > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_ARRAY_ALLOWNA > > Code which does not follow this advice, and instead just calls > PyArray_Check() to verify > its an ndarray and checks some flags, will silently produce incorrect > results. This style > of code does not provide any opportunity for numpy to say "hey, this > array is special", > so also is not compatible with future ideas of lazy evaluation, derived > dtypes, etc. This doesn't really cover the Cython code I write that interfaces with C (and probably the code others write in Cython). Often I'd do: def f(arg): cdef np.ndarray arr = np.asarray(arg) c_func(np.PyArray_DATA(arr)) So I mix Python np.asarray with C PyArray_DATA. In general, I think you use PyArray_FromAny if you're very concerned about performance or need some special flag, but it's certainly not the first thing you tgry. But in general, I will often be lazy and just do def f(np.ndarray arr): c_func(np.PyArray_DATA(arr)) It's an exception if you don't provide an array -- so who cares. (I guess the odds of somebody feeding a masked array to code like that, which doesn't try to be friendly, is relatively smaller though.) If you know the datatype, you can really do def f(np.ndarray[double] arr): c_func(&arr[0]) which works with PEP 3118. But I use PyArray_DATA out of habit (and since it works in the cases without dtype). Frankly, I don't expect any Cython code to do the right thing here; calling PyArray_FromAny is much more typing. And really, nobody ever questioned that if we had an actual ndarray instance, we'd be allowed to call PyArray_DATA. I don't know how much Cython code is out there in the wild for which this is a problem. Either way, it would cause something of a reeducation challenge for Cython users. 
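For concreteness, a self-contained version of the two habits sketched in this message
(c_func and its signature are placeholders). Per the NEP text quoted below, only the
typed-buffer variant goes through a path where numpy gets a chance to refuse an
NA-masked array:

cimport numpy as np
import numpy as np

np.import_array()

cdef void c_func(double* data, long n):     # stand-in for an external C routine
    pass

# The "lazy" habit: raw C-API access; nothing on this path can notice a mask.
def f_lazy(np.ndarray arr):
    c_func(<double*> np.PyArray_DATA(arr), <long> arr.shape[0])

# The PEP 3118 habit: the buffer request is where numpy can raise.
def f_typed(np.ndarray[np.float64_t, ndim=1] arr):
    c_func(&arr[0], <long> arr.shape[0])
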
Dag > > Tutorial From Cython Website > ---------------------------- > > http://docs.cython.org/src/tutorial/numpy.html > > This tutorial gives a convolution example, and all the examples fail with > Python exceptions when given inputs that contain NA values. > > Before any Cython type annotation is introduced, the code functions just > as equivalent Python would in the interpreter. > > When the type information is introduced, it is done via numpy.pxd which > defines a mapping between an ndarray declaration and PyArrayObject \*. > Under the hood, this maps to __Pyx_ArgTypeTest, which does a direct > comparison of Py_TYPE(obj) against the PyTypeObject for the ndarray. > > Then the code does some dtype comparisons, and uses regular python indexing > to access the array elements. This python indexing still goes through the > Python API, so the NA handling and error checking in numpy still can work > like normal and fail if the inputs have NAs which cannot fit in the output > array. In this case it fails when trying to convert the NA into an integer > to set in in the output. > > The next version of the code introduces more efficient indexing. This > operates based on Python's buffer protocol. This causes Cython to call > __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which calls > PyObject_GetBuffer. This call gives numpy the opportunity to raise an > exception if the inputs are arrays with NA-masks, something not supported > by the Python buffer protocol. > > Numerical Python - JPL website > ------------------------------ > > http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html > > This document is from 2001, so does not reflect recent numpy, but it is the > second hit when searching for "numpy c api example" on google. > > There first example, heading "A simple example", is in fact already > invalid for > recent numpy even without the NA support. In particular, if the data is > misaligned > or in a different byteorder, it may crash or produce incorrect results. > > The next thing the document does is introduce > PyArray_ContiguousFromObject, which > gives numpy an opportunity to raise an exception when NA-masked arrays > are used, > so the later code will raise exceptions as desired. > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Thu May 10 19:02:37 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 May 2012 01:02:37 +0200 Subject: [Numpy-discussion] NA-mask interactions with existing C code In-Reply-To: <4FAC457F.3090405@astro.uio.no> References: <4FAC457F.3090405@astro.uio.no> Message-ID: <4FAC490D.2090002@astro.uio.no> On 05/11/2012 12:47 AM, Dag Sverre Seljebotn wrote: > On 05/11/2012 12:28 AM, Mark Wiebe wrote: >> I did some searching for typical Cython and C code which accesses numpy >> arrays, and added a section to the NEP describing how they behave in the >> current implementation. Cython code which uses either straight Python >> access or the buffer protocol is fine (after a bugfix in numpy, it >> wasn't failing currently as it should in the pep3118 case). C code which >> follows the recommended practice of using PyArray_FromAny or one of the >> related macros is also fine, because these functions have been made to >> fail on NA-masked arrays unless the flag NPY_ARRAY_ALLOWNA is provided. 
>> >> In general, code which follows the recommended numpy practices will >> raise exceptions when encountering NA-masked arrays. This means >> programmers don't have to worry about the NA unless they want to support >> it. Having things go through PyArray_FromAny also provides a place where >> lazy evaluation arrays could be evaluated, and other similar potential >> future extensions can use to provide compatibility. >> >> Here's the section I added to the NEP: >> >> Interaction With Pre-existing C API Usage >> ========================================= >> >> Making sure existing code using the C API, whether it's written in C, C++, >> or Cython, does something reasonable is an important goal of this >> implementation. >> The general strategy is to make existing code which does not explicitly >> tell numpy it supports NA masks fail with an exception saying so. There are >> a few different access patterns people use to get ahold of the numpy >> array data, >> here we examine a few of them to see what numpy can do. These examples are >> found from doing google searches of numpy C API array access. >> >> Numpy Documentation - How to extend NumPy >> ----------------------------------------- >> >> http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#dealing-with-array-objects >> >> This page has a section "Dealing with array objects" which has some >> advice for how >> to access numpy arrays from C. When accepting arrays, the first step it >> suggests is >> to use PyArray_FromAny or a macro built on that function, so code >> following this >> advice will properly fail when given an NA-masked array it doesn't know >> how to handle. >> >> The way this is handled is that PyArray_FromAny requires a special flag, >> NPY_ARRAY_ALLOWNA, >> before it will allow NA-masked arrays to flow through. >> >> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_ARRAY_ALLOWNA >> >> Code which does not follow this advice, and instead just calls >> PyArray_Check() to verify >> its an ndarray and checks some flags, will silently produce incorrect >> results. This style >> of code does not provide any opportunity for numpy to say "hey, this >> array is special", >> so also is not compatible with future ideas of lazy evaluation, derived >> dtypes, etc. > > This doesn't really cover the Cython code I write that interfaces with C > (and probably the code others write in Cython). > > Often I'd do: > > def f(arg): > cdef np.ndarray arr = np.asarray(arg) > c_func(np.PyArray_DATA(arr)) > > So I mix Python np.asarray with C PyArray_DATA. In general, I think you > use PyArray_FromAny if you're very concerned about performance or need > some special flag, but it's certainly not the first thing you tgry. > > But in general, I will often be lazy and just do > > def f(np.ndarray arr): > c_func(np.PyArray_DATA(arr)) > > It's an exception if you don't provide an array -- so who cares. (I > guess the odds of somebody feeding a masked array to code like that, > which doesn't try to be friendly, is relatively smaller though.) > > If you know the datatype, you can really do > > def f(np.ndarray[double] arr): > c_func(&arr[0]) > > which works with PEP 3118. But I use PyArray_DATA out of habit (and > since it works in the cases without dtype). > > Frankly, I don't expect any Cython code to do the right thing here; > calling PyArray_FromAny is much more typing. And really, nobody ever > questioned that if we had an actual ndarray instance, we'd be allowed to > call PyArray_DATA. 
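Under the separate 'ndmasked' type proposed earlier in the thread, that is exactly the
guarantee: a sketch like the one below keeps working on real ndarrays, while a masked
array (assumed here not to be an ndarray subclass) would fail the argument type test
instead of silently handing out its data. c_use is a placeholder:

cimport numpy as np

cdef void c_use(double* data):      # stand-in for an external C routine
    pass

def f(np.ndarray arr):
    # The annotation is the "actual ndarray instance" test; an 'ndmasked'
    # object would raise TypeError here rather than reach PyArray_DATA.
    c_use(<double*> np.PyArray_DATA(arr))
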
> > I don't know how much Cython code is out there in the wild for which > this is a problem. Either way, it would cause something of a reeducation > challenge for Cython users. Also note that Cython users are in the habit of accessing "arr.data" (which is the char*, not the buffer object) directly. Just in case you had the idea of grepping for PyArray_DATA in Cython code. Our plan there is we'll eventually put out a Cython version which special-cases np.ndarray and turn ".data" into a call to PyArray_DATA (and same for shape, strides, ...). Ugly hack, but avoids breaking existing Cython code if NumPy removes the field access. Dag > > Dag > >> >> Tutorial From Cython Website >> ---------------------------- >> >> http://docs.cython.org/src/tutorial/numpy.html >> >> This tutorial gives a convolution example, and all the examples fail with >> Python exceptions when given inputs that contain NA values. >> >> Before any Cython type annotation is introduced, the code functions just >> as equivalent Python would in the interpreter. >> >> When the type information is introduced, it is done via numpy.pxd which >> defines a mapping between an ndarray declaration and PyArrayObject \*. >> Under the hood, this maps to __Pyx_ArgTypeTest, which does a direct >> comparison of Py_TYPE(obj) against the PyTypeObject for the ndarray. >> >> Then the code does some dtype comparisons, and uses regular python indexing >> to access the array elements. This python indexing still goes through the >> Python API, so the NA handling and error checking in numpy still can work >> like normal and fail if the inputs have NAs which cannot fit in the output >> array. In this case it fails when trying to convert the NA into an integer >> to set in in the output. >> >> The next version of the code introduces more efficient indexing. This >> operates based on Python's buffer protocol. This causes Cython to call >> __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which calls >> PyObject_GetBuffer. This call gives numpy the opportunity to raise an >> exception if the inputs are arrays with NA-masks, something not supported >> by the Python buffer protocol. >> >> Numerical Python - JPL website >> ------------------------------ >> >> http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html >> >> This document is from 2001, so does not reflect recent numpy, but it is the >> second hit when searching for "numpy c api example" on google. >> >> There first example, heading "A simple example", is in fact already >> invalid for >> recent numpy even without the NA support. In particular, if the data is >> misaligned >> or in a different byteorder, it may crash or produce incorrect results. >> >> The next thing the document does is introduce >> PyArray_ContiguousFromObject, which >> gives numpy an opportunity to raise an exception when NA-masked arrays >> are used, >> so the later code will raise exceptions as desired. 
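To make the ".data" habit mentioned above concrete -- it reads the struct field that
numpy.pxd exposes directly, so no mask handling can ever intervene on this path:

cimport numpy as np

def f(np.ndarray arr):
    cdef char* p = arr.data   # direct field access, no PyArray_DATA call at all
    # ... p is then handed to C code; nothing here can notice a mask
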
>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mwwiebe at gmail.com Thu May 10 19:06:15 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 10 May 2012 18:06:15 -0500 Subject: [Numpy-discussion] NA-mask interactions with existing C code In-Reply-To: <4FAC457F.3090405@astro.uio.no> References: <4FAC457F.3090405@astro.uio.no> Message-ID: On Thu, May 10, 2012 at 5:47 PM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > On 05/11/2012 12:28 AM, Mark Wiebe wrote: > > I did some searching for typical Cython and C code which accesses numpy > > arrays, and added a section to the NEP describing how they behave in the > > current implementation. Cython code which uses either straight Python > > access or the buffer protocol is fine (after a bugfix in numpy, it > > wasn't failing currently as it should in the pep3118 case). C code which > > follows the recommended practice of using PyArray_FromAny or one of the > > related macros is also fine, because these functions have been made to > > fail on NA-masked arrays unless the flag NPY_ARRAY_ALLOWNA is provided. > > > > In general, code which follows the recommended numpy practices will > > raise exceptions when encountering NA-masked arrays. This means > > programmers don't have to worry about the NA unless they want to support > > it. Having things go through PyArray_FromAny also provides a place where > > lazy evaluation arrays could be evaluated, and other similar potential > > future extensions can use to provide compatibility. > > > > Here's the section I added to the NEP: > > > > Interaction With Pre-existing C API Usage > > ========================================= > > > > Making sure existing code using the C API, whether it's written in C, > C++, > > or Cython, does something reasonable is an important goal of this > > implementation. > > The general strategy is to make existing code which does not explicitly > > tell numpy it supports NA masks fail with an exception saying so. There > are > > a few different access patterns people use to get ahold of the numpy > > array data, > > here we examine a few of them to see what numpy can do. These examples > are > > found from doing google searches of numpy C API array access. > > > > Numpy Documentation - How to extend NumPy > > ----------------------------------------- > > > > > http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#dealing-with-array-objects > > > > This page has a section "Dealing with array objects" which has some > > advice for how > > to access numpy arrays from C. When accepting arrays, the first step it > > suggests is > > to use PyArray_FromAny or a macro built on that function, so code > > following this > > advice will properly fail when given an NA-masked array it doesn't know > > how to handle. > > > > The way this is handled is that PyArray_FromAny requires a special flag, > > NPY_ARRAY_ALLOWNA, > > before it will allow NA-masked arrays to flow through. > > > > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_ARRAY_ALLOWNA > > > > Code which does not follow this advice, and instead just calls > > PyArray_Check() to verify > > its an ndarray and checks some flags, will silently produce incorrect > > results. 
This style > > of code does not provide any opportunity for numpy to say "hey, this > > array is special", > > so also is not compatible with future ideas of lazy evaluation, derived > > dtypes, etc. > > This doesn't really cover the Cython code I write that interfaces with C > (and probably the code others write in Cython). > > Often I'd do: > > def f(arg): > cdef np.ndarray arr = np.asarray(arg) > c_func(np.PyArray_DATA(arr)) > > So I mix Python np.asarray with C PyArray_DATA. In general, I think you > use PyArray_FromAny if you're very concerned about performance or need > some special flag, but it's certainly not the first thing you tgry. > I guess this mixture of Python-API and C-API is different from the way the API tries to protect incorrect access. From the Python API, it should let everything through, because it's for Python code to use. From the C API, it should default to not letting things through, because special NA-mask aware code needs to be written. I'm not sure if there is a reasonable approach here which works for everything. > But in general, I will often be lazy and just do > > def f(np.ndarray arr): > c_func(np.PyArray_DATA(arr)) > > It's an exception if you don't provide an array -- so who cares. (I > guess the odds of somebody feeding a masked array to code like that, > which doesn't try to be friendly, is relatively smaller though.) > This code would already fail with non-contiguous strides or byte-swapped data, so the additional NA mask case seems to fit in an already-failing category. > > If you know the datatype, you can really do > > def f(np.ndarray[double] arr): > c_func(&arr[0]) > > which works with PEP 3118. But I use PyArray_DATA out of habit (and > since it works in the cases without dtype). > > Frankly, I don't expect any Cython code to do the right thing here; > calling PyArray_FromAny is much more typing. And really, nobody ever > questioned that if we had an actual ndarray instance, we'd be allowed to > call PyArray_DATA. > > I don't know how much Cython code is out there in the wild for which > this is a problem. Either way, it would cause something of a reeducation > challenge for Cython users. > Since this style of coding already has known problems, do you think the case with NA-masks deserves more attention here? What will happen is access to array element data without consideration of the mask, which seems similar in nature to accessing array data with the wrong stride or byte order. Cheers, Mark > Dag > > > > > Tutorial From Cython Website > > ---------------------------- > > > > http://docs.cython.org/src/tutorial/numpy.html > > > > This tutorial gives a convolution example, and all the examples fail with > > Python exceptions when given inputs that contain NA values. > > > > Before any Cython type annotation is introduced, the code functions just > > as equivalent Python would in the interpreter. > > > > When the type information is introduced, it is done via numpy.pxd which > > defines a mapping between an ndarray declaration and PyArrayObject \*. > > Under the hood, this maps to __Pyx_ArgTypeTest, which does a direct > > comparison of Py_TYPE(obj) against the PyTypeObject for the ndarray. > > > > Then the code does some dtype comparisons, and uses regular python > indexing > > to access the array elements. This python indexing still goes through the > > Python API, so the NA handling and error checking in numpy still can work > > like normal and fail if the inputs have NAs which cannot fit in the > output > > array. 
In this case it fails when trying to convert the NA into an > integer > > to set in in the output. > > > > The next version of the code introduces more efficient indexing. This > > operates based on Python's buffer protocol. This causes Cython to call > > __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which calls > > PyObject_GetBuffer. This call gives numpy the opportunity to raise an > > exception if the inputs are arrays with NA-masks, something not supported > > by the Python buffer protocol. > > > > Numerical Python - JPL website > > ------------------------------ > > > > http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html > > > > This document is from 2001, so does not reflect recent numpy, but it is > the > > second hit when searching for "numpy c api example" on google. > > > > There first example, heading "A simple example", is in fact already > > invalid for > > recent numpy even without the NA support. In particular, if the data is > > misaligned > > or in a different byteorder, it may crash or produce incorrect results. > > > > The next thing the document does is introduce > > PyArray_ContiguousFromObject, which > > gives numpy an opportunity to raise an exception when NA-masked arrays > > are used, > > so the later code will raise exceptions as desired. > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Thu May 10 20:15:18 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 May 2012 02:15:18 +0200 Subject: [Numpy-discussion] NA-mask interactions with existing C code In-Reply-To: References: <4FAC457F.3090405@astro.uio.no> Message-ID: <4FAC5A16.6090407@astro.uio.no> On 05/11/2012 01:06 AM, Mark Wiebe wrote: > On Thu, May 10, 2012 at 5:47 PM, Dag Sverre Seljebotn > > wrote: > > On 05/11/2012 12:28 AM, Mark Wiebe wrote: > > I did some searching for typical Cython and C code which accesses > numpy > > arrays, and added a section to the NEP describing how they behave > in the > > current implementation. Cython code which uses either straight Python > > access or the buffer protocol is fine (after a bugfix in numpy, it > > wasn't failing currently as it should in the pep3118 case). C > code which > > follows the recommended practice of using PyArray_FromAny or one > of the > > related macros is also fine, because these functions have been > made to > > fail on NA-masked arrays unless the flag NPY_ARRAY_ALLOWNA is > provided. > > > > In general, code which follows the recommended numpy practices will > > raise exceptions when encountering NA-masked arrays. This means > > programmers don't have to worry about the NA unless they want to > support > > it. Having things go through PyArray_FromAny also provides a > place where > > lazy evaluation arrays could be evaluated, and other similar > potential > > future extensions can use to provide compatibility. 
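A rough sketch of what that opt-in looks like from Cython. NPY_ARRAY_ALLOWNA exists only
on the NA-mask development branch, so it and PyArray_FromAny are declared by hand below
rather than taken from numpy.pxd; treat the whole block as an assumption-laden sketch,
not working code against released numpy:

cimport numpy as np
import numpy as np

np.import_array()

cdef extern from "numpy/arrayobject.h":
    object PyArray_FromAny(object op, void* descr, int min_depth,
                           int max_depth, int requirements, void* context)
    enum: NPY_ARRAY_ALLOWNA        # present only on the NA-mask branch

def sanitize(arg, allow_na=False):
    # Without the flag, an NA-masked input is supposed to raise here;
    # passing NPY_ARRAY_ALLOWNA declares that the caller handles NAs.
    cdef int flags = NPY_ARRAY_ALLOWNA if allow_na else 0
    return PyArray_FromAny(arg, NULL, 0, 0, flags, NULL)
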
> > > > Here's the section I added to the NEP: > > > > Interaction With Pre-existing C API Usage > > ========================================= > > > > Making sure existing code using the C API, whether it's written > in C, C++, > > or Cython, does something reasonable is an important goal of this > > implementation. > > The general strategy is to make existing code which does not > explicitly > > tell numpy it supports NA masks fail with an exception saying so. > There are > > a few different access patterns people use to get ahold of the numpy > > array data, > > here we examine a few of them to see what numpy can do. These > examples are > > found from doing google searches of numpy C API array access. > > > > Numpy Documentation - How to extend NumPy > > ----------------------------------------- > > > > > http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#dealing-with-array-objects > > > > This page has a section "Dealing with array objects" which has some > > advice for how > > to access numpy arrays from C. When accepting arrays, the first > step it > > suggests is > > to use PyArray_FromAny or a macro built on that function, so code > > following this > > advice will properly fail when given an NA-masked array it > doesn't know > > how to handle. > > > > The way this is handled is that PyArray_FromAny requires a > special flag, > > NPY_ARRAY_ALLOWNA, > > before it will allow NA-masked arrays to flow through. > > > > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_ARRAY_ALLOWNA > > > > Code which does not follow this advice, and instead just calls > > PyArray_Check() to verify > > its an ndarray and checks some flags, will silently produce incorrect > > results. This style > > of code does not provide any opportunity for numpy to say "hey, this > > array is special", > > so also is not compatible with future ideas of lazy evaluation, > derived > > dtypes, etc. > > This doesn't really cover the Cython code I write that interfaces with C > (and probably the code others write in Cython). > > Often I'd do: > > def f(arg): > cdef np.ndarray arr = np.asarray(arg) > c_func(np.PyArray_DATA(arr)) > > So I mix Python np.asarray with C PyArray_DATA. In general, I think you > use PyArray_FromAny if you're very concerned about performance or need > some special flag, but it's certainly not the first thing you tgry. > > > I guess this mixture of Python-API and C-API is different from the way > the API tries to protect incorrect access. From the Python API, it. > should let everything through, because it's for Python code to use. From > the C API, it should default to not letting things through, because > special NA-mask aware code needs to be written. I'm not sure if there is > a reasonable approach here which works for everything. Does that mean you consider changing ob_type for masked arrays unreasonable? They can still use the same object struct... > > But in general, I will often be lazy and just do > > def f(np.ndarray arr): > c_func(np.PyArray_DATA(arr)) > > It's an exception if you don't provide an array -- so who cares. (I > guess the odds of somebody feeding a masked array to code like that, > which doesn't try to be friendly, is relatively smaller though.) > > > This code would already fail with non-contiguous strides or byte-swapped > data, so the additional NA mask case seems to fit in an already-failing > category. Honestly! I hope you did't think I provided a full-fledged example? 
Perhaps you'd like to point out to me that "c_func" is a bad name for a function as well? One would of course check that things are contiguous (or pass on the strides), check the dtype and dispatch to different C functions in each case, etc. But that isn't the point. Scientific code most of the time does fall in the "already-failing" category. That doesn't mean it doesn't count. Let's focus on the number of code lines written and developer hours that will be spent cleaning up the mess -- not the "validity" of the code in question. > > > If you know the datatype, you can really do > > def f(np.ndarray[double] arr): > c_func(&arr[0]) > > which works with PEP 3118. But I use PyArray_DATA out of habit (and > since it works in the cases without dtype). > > Frankly, I don't expect any Cython code to do the right thing here; > calling PyArray_FromAny is much more typing. And really, nobody ever > questioned that if we had an actual ndarray instance, we'd be allowed to > call PyArray_DATA. > > I don't know how much Cython code is out there in the wild for which > this is a problem. Either way, it would cause something of a reeducation > challenge for Cython users. > > > Since this style of coding already has known problems, do you think the > case with NA-masks deserves more attention here? What will happen is. > access to array element data without consideration of the mask, which > seems similar in nature to accessing array data with the wrong stride or > byte order. I don't agree with the premise of that paragraph. There's no reason to assume that just because code doesn't call FromAny, it has problems. (And I'll continue to assume that whatever array is returned from "np.ascontiguousarray is really contiguous...) Whether it requires attention or not is a different issue though. I'm not sure. I think other people should weigh in on that -- I mostly write code for my own consumption. One should at least check pandas, scikits-image, scikits-learn, mpi4py, petsc4py, and so on. And ask on the Cython users list. Hopefully it will usually be PEP 3118. But now I need to turn in. Travis, would such a survey be likely to affect the outcome of your decision in any way? Or should we just leave this for now? Dag > > Cheers, > Mark > > Dag > > > > > Tutorial From Cython Website > > ---------------------------- > > > > http://docs.cython.org/src/tutorial/numpy.html > > > > This tutorial gives a convolution example, and all the examples > fail with > > Python exceptions when given inputs that contain NA values. > > > > Before any Cython type annotation is introduced, the code > functions just > > as equivalent Python would in the interpreter. > > > > When the type information is introduced, it is done via numpy.pxd > which > > defines a mapping between an ndarray declaration and > PyArrayObject \*. > > Under the hood, this maps to __Pyx_ArgTypeTest, which does a direct > > comparison of Py_TYPE(obj) against the PyTypeObject for the ndarray. > > > > Then the code does some dtype comparisons, and uses regular > python indexing > > to access the array elements. This python indexing still goes > through the > > Python API, so the NA handling and error checking in numpy still > can work > > like normal and fail if the inputs have NAs which cannot fit in > the output > > array. In this case it fails when trying to convert the NA into > an integer > > to set in in the output. > > > > The next version of the code introduces more efficient indexing. This > > operates based on Python's buffer protocol. 
This causes Cython to > call > > __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which calls > > PyObject_GetBuffer. This call gives numpy the opportunity to raise an > > exception if the inputs are arrays with NA-masks, something not > supported > > by the Python buffer protocol. > > > > Numerical Python - JPL website > > ------------------------------ > > > > http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html > > > > This document is from 2001, so does not reflect recent numpy, but > it is the > > second hit when searching for "numpy c api example" on google. > > > > There first example, heading "A simple example", is in fact already > > invalid for > > recent numpy even without the NA support. In particular, if the > data is > > misaligned > > or in a different byteorder, it may crash or produce incorrect > results. > > > > The next thing the document does is introduce > > PyArray_ContiguousFromObject, which > > gives numpy an opportunity to raise an exception when NA-masked > arrays > > are used, > > so the later code will raise exceptions as desired. > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Thu May 10 20:35:57 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 May 2012 02:35:57 +0200 Subject: [Numpy-discussion] NA-mask interactions with existing C code In-Reply-To: <4FAC5A16.6090407@astro.uio.no> References: <4FAC457F.3090405@astro.uio.no> <4FAC5A16.6090407@astro.uio.no> Message-ID: <5f795aa6-312f-4fd7-a0bc-e180900d38a7@email.android.com> Dag Sverre Seljebotn wrote: >On 05/11/2012 01:06 AM, Mark Wiebe wrote: >> On Thu, May 10, 2012 at 5:47 PM, Dag Sverre Seljebotn >> > >wrote: >> >> On 05/11/2012 12:28 AM, Mark Wiebe wrote: >> > I did some searching for typical Cython and C code which >accesses >> numpy >> > arrays, and added a section to the NEP describing how they >behave >> in the >> > current implementation. Cython code which uses either straight >Python >> > access or the buffer protocol is fine (after a bugfix in >numpy, it >> > wasn't failing currently as it should in the pep3118 case). C >> code which >> > follows the recommended practice of using PyArray_FromAny or >one >> of the >> > related macros is also fine, because these functions have been >> made to >> > fail on NA-masked arrays unless the flag NPY_ARRAY_ALLOWNA is >> provided. >> > >> > In general, code which follows the recommended numpy practices >will >> > raise exceptions when encountering NA-masked arrays. This >means >> > programmers don't have to worry about the NA unless they want >to >> support >> > it. Having things go through PyArray_FromAny also provides a >> place where >> > lazy evaluation arrays could be evaluated, and other similar >> potential >> > future extensions can use to provide compatibility. 
>> > >> > Here's the section I added to the NEP: >> > >> > Interaction With Pre-existing C API Usage >> > ========================================= >> > >> > Making sure existing code using the C API, whether it's >written >> in C, C++, >> > or Cython, does something reasonable is an important goal of >this >> > implementation. >> > The general strategy is to make existing code which does not >> explicitly >> > tell numpy it supports NA masks fail with an exception saying >so. >> There are >> > a few different access patterns people use to get ahold of the >numpy >> > array data, >> > here we examine a few of them to see what numpy can do. These >> examples are >> > found from doing google searches of numpy C API array access. >> > >> > Numpy Documentation - How to extend NumPy >> > ----------------------------------------- >> > >> > >> >http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#dealing-with-array-objects >> > >> > This page has a section "Dealing with array objects" which has >some >> > advice for how >> > to access numpy arrays from C. When accepting arrays, the >first >> step it >> > suggests is >> > to use PyArray_FromAny or a macro built on that function, so >code >> > following this >> > advice will properly fail when given an NA-masked array it >> doesn't know >> > how to handle. >> > >> > The way this is handled is that PyArray_FromAny requires a >> special flag, >> > NPY_ARRAY_ALLOWNA, >> > before it will allow NA-masked arrays to flow through. >> > >> > >> >http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_ARRAY_ALLOWNA >> > >> > Code which does not follow this advice, and instead just calls >> > PyArray_Check() to verify >> > its an ndarray and checks some flags, will silently produce >incorrect >> > results. This style >> > of code does not provide any opportunity for numpy to say >"hey, this >> > array is special", >> > so also is not compatible with future ideas of lazy >evaluation, >> derived >> > dtypes, etc. >> >> This doesn't really cover the Cython code I write that interfaces >with C >> (and probably the code others write in Cython). >> >> Often I'd do: >> >> def f(arg): >> cdef np.ndarray arr = np.asarray(arg) >> c_func(np.PyArray_DATA(arr)) >> >> So I mix Python np.asarray with C PyArray_DATA. In general, I >think you >> use PyArray_FromAny if you're very concerned about performance or >need >> some special flag, but it's certainly not the first thing you >tgry. >> >> >> I guess this mixture of Python-API and C-API is different from the >way >> the API tries to protect incorrect access. From the Python API, it. >> should let everything through, because it's for Python code to use. >From >> the C API, it should default to not letting things through, because >> special NA-mask aware code needs to be written. I'm not sure if there >is >> a reasonable approach here which works for everything. > >Does that mean you consider changing ob_type for masked arrays >unreasonable? They can still use the same object struct... > >> >> But in general, I will often be lazy and just do >> >> def f(np.ndarray arr): >> c_func(np.PyArray_DATA(arr)) >> >> It's an exception if you don't provide an array -- so who cares. >(I >> guess the odds of somebody feeding a masked array to code like >that, >> which doesn't try to be friendly, is relatively smaller though.) >> >> >> This code would already fail with non-contiguous strides or >byte-swapped >> data, so the additional NA mask case seems to fit in an >already-failing >> category. > >Honestly! 
I hope you did't think I provided a full-fledged example? >Perhaps you'd like to point out to me that "c_func" is a bad name for a > >function as well? > >One would of course check that things are contiguous (or pass on the >strides), check the dtype and dispatch to different C functions in each > >case, etc. > >But that isn't the point. Scientific code most of the time does fall in > >the "already-failing" category. That doesn't mean it doesn't count. >Let's focus on the number of code lines written and developer hours >that >will be spent cleaning up the mess -- not the "validity" of the code in > >question. > >> >> >> If you know the datatype, you can really do >> >> def f(np.ndarray[double] arr): >> c_func(&arr[0]) >> >> which works with PEP 3118. But I use PyArray_DATA out of habit >(and >> since it works in the cases without dtype). >> >> Frankly, I don't expect any Cython code to do the right thing >here; >> calling PyArray_FromAny is much more typing. And really, nobody >ever >> questioned that if we had an actual ndarray instance, we'd be >allowed to >> call PyArray_DATA. >> >> I don't know how much Cython code is out there in the wild for >which >> this is a problem. Either way, it would cause something of a >reeducation >> challenge for Cython users. >> >> >> Since this style of coding already has known problems, do you think >the >> case with NA-masks deserves more attention here? What will happen is. >> access to array element data without consideration of the mask, which >> seems similar in nature to accessing array data with the wrong stride >or >> byte order. I realized something -- I think this is not the most important question to ask. The question to ask is: what will create a nice, seamless NA-experience for a NumPy user. Can he/she just try to call a function (which may call other functions, which may call...) with a masked array and trust that it is correct or barfs? It's not a question of how much code needs fixing, but of the uncertainty and delay of adoption it'll create that code needs to be verified. With ndmasked, you get a *guarantee* against old code. (crazy thought: look into whether ob-type can be reassigned after object creation? I wouldn't put it past CPython to pull off a hack like that.) Dag > >I don't agree with the premise of that paragraph. There's no reason to >assume that just because code doesn't call FromAny, it has problems. >(And I'll continue to assume that whatever array is returned from >"np.ascontiguousarray is really contiguous...) > >Whether it requires attention or not is a different issue though. I'm >not sure. I think other people should weigh in on that -- I mostly >write >code for my own consumption. > >One should at least check pandas, scikits-image, scikits-learn, mpi4py, > >petsc4py, and so on. And ask on the Cython users list. Hopefully it >will >usually be PEP 3118. But now I need to turn in. > >Travis, would such a survey be likely to affect the outcome of your >decision in any way? Or should we just leave this for now? > >Dag > >> >> Cheers, >> Mark >> >> Dag >> >> > >> > Tutorial From Cython Website >> > ---------------------------- >> > >> > http://docs.cython.org/src/tutorial/numpy.html >> > >> > This tutorial gives a convolution example, and all the >examples >> fail with >> > Python exceptions when given inputs that contain NA values. >> > >> > Before any Cython type annotation is introduced, the code >> functions just >> > as equivalent Python would in the interpreter. 
>> > >> > When the type information is introduced, it is done via >numpy.pxd >> which >> > defines a mapping between an ndarray declaration and >> PyArrayObject \*. >> > Under the hood, this maps to __Pyx_ArgTypeTest, which does a >direct >> > comparison of Py_TYPE(obj) against the PyTypeObject for the >ndarray. >> > >> > Then the code does some dtype comparisons, and uses regular >> python indexing >> > to access the array elements. This python indexing still goes >> through the >> > Python API, so the NA handling and error checking in numpy >still >> can work >> > like normal and fail if the inputs have NAs which cannot fit >in >> the output >> > array. In this case it fails when trying to convert the NA >into >> an integer >> > to set in in the output. >> > >> > The next version of the code introduces more efficient >indexing. This >> > operates based on Python's buffer protocol. This causes Cython >to >> call >> > __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which >calls >> > PyObject_GetBuffer. This call gives numpy the opportunity to >raise an >> > exception if the inputs are arrays with NA-masks, something >not >> supported >> > by the Python buffer protocol. >> > >> > Numerical Python - JPL website >> > ------------------------------ >> > >> > >http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html >> > >> > This document is from 2001, so does not reflect recent numpy, >but >> it is the >> > second hit when searching for "numpy c api example" on google. >> > >> > There first example, heading "A simple example", is in fact >already >> > invalid for >> > recent numpy even without the NA support. In particular, if >the >> data is >> > misaligned >> > or in a different byteorder, it may crash or produce incorrect >> results. >> > >> > The next thing the document does is introduce >> > PyArray_ContiguousFromObject, which >> > gives numpy an opportunity to raise an exception when >NA-masked >> arrays >> > are used, >> > so the later code will raise exceptions as desired. >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
From d.s.seljebotn at astro.uio.no Thu May 10 21:01:06 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 May 2012 03:01:06 +0200 Subject: [Numpy-discussion] NA-mask interactions with existing C code In-Reply-To: <4FAC5A16.6090407@astro.uio.no> References: <4FAC457F.3090405@astro.uio.no> <4FAC5A16.6090407@astro.uio.no> Message-ID: Dag Sverre Seljebotn wrote: >On 05/11/2012 01:06 AM, Mark Wiebe wrote: >> On Thu, May 10, 2012 at 5:47 PM, Dag Sverre Seljebotn >> > >wrote: >> >> On 05/11/2012 12:28 AM, Mark Wiebe wrote: >> > I did some searching for typical Cython and C code which >accesses >> numpy >> > arrays, and added a section to the NEP describing how they >behave >> in the >> > current implementation. Cython code which uses either straight >Python >> > access or the buffer protocol is fine (after a bugfix in >numpy, it >> > wasn't failing currently as it should in the pep3118 case). C >> code which >> > follows the recommended practice of using PyArray_FromAny or >one >> of the >> > related macros is also fine, because these functions have been >> made to >> > fail on NA-masked arrays unless the flag NPY_ARRAY_ALLOWNA is >> provided. >> > >> > In general, code which follows the recommended numpy practices >will >> > raise exceptions when encountering NA-masked arrays. This >means >> > programmers don't have to worry about the NA unless they want >to >> support >> > it. Having things go through PyArray_FromAny also provides a >> place where >> > lazy evaluation arrays could be evaluated, and other similar >> potential >> > future extensions can use to provide compatibility. >> > >> > Here's the section I added to the NEP: >> > >> > Interaction With Pre-existing C API Usage >> > ========================================= >> > >> > Making sure existing code using the C API, whether it's >written >> in C, C++, >> > or Cython, does something reasonable is an important goal of >this >> > implementation. >> > The general strategy is to make existing code which does not >> explicitly >> > tell numpy it supports NA masks fail with an exception saying >so. >> There are >> > a few different access patterns people use to get ahold of the >numpy >> > array data, >> > here we examine a few of them to see what numpy can do. These >> examples are >> > found from doing google searches of numpy C API array access. >> > >> > Numpy Documentation - How to extend NumPy >> > ----------------------------------------- >> > >> > >> >http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#dealing-with-array-objects >> > >> > This page has a section "Dealing with array objects" which has >some >> > advice for how >> > to access numpy arrays from C. When accepting arrays, the >first >> step it >> > suggests is >> > to use PyArray_FromAny or a macro built on that function, so >code >> > following this >> > advice will properly fail when given an NA-masked array it >> doesn't know >> > how to handle. >> > >> > The way this is handled is that PyArray_FromAny requires a >> special flag, >> > NPY_ARRAY_ALLOWNA, >> > before it will allow NA-masked arrays to flow through. >> > >> > >> >http://docs.scipy.org/doc/numpy/reference/c-api.array.html#NPY_ARRAY_ALLOWNA >> > >> > Code which does not follow this advice, and instead just calls >> > PyArray_Check() to verify >> > its an ndarray and checks some flags, will silently produce >incorrect >> > results. 
This style >> > of code does not provide any opportunity for numpy to say >"hey, this >> > array is special", >> > so also is not compatible with future ideas of lazy >evaluation, >> derived >> > dtypes, etc. >> >> This doesn't really cover the Cython code I write that interfaces >with C >> (and probably the code others write in Cython). >> >> Often I'd do: >> >> def f(arg): >> cdef np.ndarray arr = np.asarray(arg) >> c_func(np.PyArray_DATA(arr)) >> >> So I mix Python np.asarray with C PyArray_DATA. In general, I >think you >> use PyArray_FromAny if you're very concerned about performance or >need >> some special flag, but it's certainly not the first thing you >tgry. >> >> >> I guess this mixture of Python-API and C-API is different from the >way >> the API tries to protect incorrect access. From the Python API, it. >> should let everything through, because it's for Python code to use. >From >> the C API, it should default to not letting things through, because >> special NA-mask aware code needs to be written. I'm not sure if there >is >> a reasonable approach here which works for everything. > >Does that mean you consider changing ob_type for masked arrays >unreasonable? They can still use the same object struct... > >> >> But in general, I will often be lazy and just do >> >> def f(np.ndarray arr): >> c_func(np.PyArray_DATA(arr)) >> >> It's an exception if you don't provide an array -- so who cares. >(I >> guess the odds of somebody feeding a masked array to code like >that, >> which doesn't try to be friendly, is relatively smaller though.) >> >> >> This code would already fail with non-contiguous strides or >byte-swapped >> data, so the additional NA mask case seems to fit in an >already-failing >> category. > >Honestly! I hope you did't think I provided a full-fledged example? >Perhaps you'd like to point out to me that "c_func" is a bad name for a > >function as well? I keep having to apologise; I now see how you must have read my example, with me referring to 'lazy'. Anyway, I just meant that I would be too lazy to deal with somebody passing anything but exactly the right array -- too lazy to deal with conversion. In particular for output arrays, checking flags and dtype is just faster to code down than checking the FromAny docs for the right flags. Dag > >One would of course check that things are contiguous (or pass on the >strides), check the dtype and dispatch to different C functions in each > >case, etc. > >But that isn't the point. Scientific code most of the time does fall in > >the "already-failing" category. That doesn't mean it doesn't count. >Let's focus on the number of code lines written and developer hours >that >will be spent cleaning up the mess -- not the "validity" of the code in > >question. > >> >> >> If you know the datatype, you can really do >> >> def f(np.ndarray[double] arr): >> c_func(&arr[0]) >> >> which works with PEP 3118. But I use PyArray_DATA out of habit >(and >> since it works in the cases without dtype). >> >> Frankly, I don't expect any Cython code to do the right thing >here; >> calling PyArray_FromAny is much more typing. And really, nobody >ever >> questioned that if we had an actual ndarray instance, we'd be >allowed to >> call PyArray_DATA. >> >> I don't know how much Cython code is out there in the wild for >which >> this is a problem. Either way, it would cause something of a >reeducation >> challenge for Cython users. 
>> >> >> Since this style of coding already has known problems, do you think >the >> case with NA-masks deserves more attention here? What will happen is. >> access to array element data without consideration of the mask, which >> seems similar in nature to accessing array data with the wrong stride >or >> byte order. > >I don't agree with the premise of that paragraph. There's no reason to >assume that just because code doesn't call FromAny, it has problems. >(And I'll continue to assume that whatever array is returned from >"np.ascontiguousarray is really contiguous...) > >Whether it requires attention or not is a different issue though. I'm >not sure. I think other people should weigh in on that -- I mostly >write >code for my own consumption. > >One should at least check pandas, scikits-image, scikits-learn, mpi4py, > >petsc4py, and so on. And ask on the Cython users list. Hopefully it >will >usually be PEP 3118. But now I need to turn in. > >Travis, would such a survey be likely to affect the outcome of your >decision in any way? Or should we just leave this for now? > >Dag > >> >> Cheers, >> Mark >> >> Dag >> >> > >> > Tutorial From Cython Website >> > ---------------------------- >> > >> > http://docs.cython.org/src/tutorial/numpy.html >> > >> > This tutorial gives a convolution example, and all the >examples >> fail with >> > Python exceptions when given inputs that contain NA values. >> > >> > Before any Cython type annotation is introduced, the code >> functions just >> > as equivalent Python would in the interpreter. >> > >> > When the type information is introduced, it is done via >numpy.pxd >> which >> > defines a mapping between an ndarray declaration and >> PyArrayObject \*. >> > Under the hood, this maps to __Pyx_ArgTypeTest, which does a >direct >> > comparison of Py_TYPE(obj) against the PyTypeObject for the >ndarray. >> > >> > Then the code does some dtype comparisons, and uses regular >> python indexing >> > to access the array elements. This python indexing still goes >> through the >> > Python API, so the NA handling and error checking in numpy >still >> can work >> > like normal and fail if the inputs have NAs which cannot fit >in >> the output >> > array. In this case it fails when trying to convert the NA >into >> an integer >> > to set in in the output. >> > >> > The next version of the code introduces more efficient >indexing. This >> > operates based on Python's buffer protocol. This causes Cython >to >> call >> > __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which >calls >> > PyObject_GetBuffer. This call gives numpy the opportunity to >raise an >> > exception if the inputs are arrays with NA-masks, something >not >> supported >> > by the Python buffer protocol. >> > >> > Numerical Python - JPL website >> > ------------------------------ >> > >> > >http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html >> > >> > This document is from 2001, so does not reflect recent numpy, >but >> it is the >> > second hit when searching for "numpy c api example" on google. >> > >> > There first example, heading "A simple example", is in fact >already >> > invalid for >> > recent numpy even without the NA support. In particular, if >the >> data is >> > misaligned >> > or in a different byteorder, it may crash or produce incorrect >> results. 
>> > >> > The next thing the document does is introduce >> > PyArray_ContiguousFromObject, which >> > gives numpy an opportunity to raise an exception when >NA-masked >> arrays >> > are used, >> > so the later code will raise exceptions as desired. >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From matthew.brett at gmail.com Thu May 10 23:28:09 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 10 May 2012 20:28:09 -0700 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: <4FAAC918.8050705@astro.uio.no> Message-ID: Hi, On Thu, May 10, 2012 at 2:43 AM, Nathaniel Smith wrote: > Hi Matthew, > > On Thu, May 10, 2012 at 12:01 AM, Matthew Brett wrote: >>> The third proposal is certainly the best one from Cython's perspective; >>> and I imagine for those writing C extensions against the C API too. >>> Having PyType_Check fail for ndmasked is a very good way of having code >>> fail that is not written to take masks into account. >> >> Mark, Nathaniel - can you comment how your chosen approaches would >> interact with extension code? >> >> I'm guessing the bitpattern dtypes would be expected to cause >> extension code to choke if the type is not supported? > > That's pretty much how I'm imagining it, yes. Right now if you have, > say, a Cython function like > > cdef f(np.ndarray[double] a): > ? ?... > > and you do f(np.zeros(10, dtype=int)), then it will error out, because > that function doesn't know how to handle ints, only doubles. The same > would apply for, say, a NA-enabled integer. In general there are > almost arbitrarily many dtypes that could get passed into any function > (including user-defined ones, etc.), so C code already has to check > dtypes for correctness. > > Second order issues: > - There is certainly C code out there that just assumes that it will > only be passed an array with certain dtype (and ndim, memory layout, > etc...). If you write such C code then it's your job to make sure that > you only pass it the kinds of arrays that it expects, just like now > :-). > > - We may want to do some sort of special-casing of handling for > floating point NA dtypes that use an NaN as the "magic" bitpattern, > since many algorithms *will* work with these unchanged, and it might > be frustrating to have to wait for every extension module to be > updated just to allow for this case explicitly before using them. OTOH > you can easily work around this. Like say my_qr is a legacy C function > that will in fact propagate NaNs correctly, so float NA dtypes would > Just Work -- except, it errors out at the start because it doesn't > recognize the dtype. How annoying. 
We *could* have some special hack > you can use to force it to work anyway (by like making the "is this > the dtype I expect?" routine lie.) But you can also just do: > > ?def my_qr_wrapper(arr): > ? ?if arr.dtype is a NA float dtype with NaN magic value: > ? ? ?result = my_qr(arr.view(arr.dtype.base_dtype)) > ? ? ?return result.view(arr.dtype) > ? ?else: > ? ? ?return my_qr(arr) > > and hey presto, now it will correctly pass through NAs. So perhaps > it's not worth bothering with special hacks. > > - Of course if ?your extension function does want to handle NAs > generically, then there will be a simple C api for checking for them, > setting them, etc. Numpy needs such an API internally anyway! Thanks for this. Mark - in view of the discussions about Cython and extension code - could you say what you see as disadvantages to the ndmasked subclass proposal? Cheers, Matthew From travis at continuum.io Fri May 11 00:57:33 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 10 May 2012 23:57:33 -0500 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: <359C9AA1-B263-4E57-A02A-07A83DFE0A22@continuum.io> On May 10, 2012, at 3:40 AM, Scott Sinclair wrote: > On 9 May 2012 18:46, Travis Oliphant wrote: >> The document is available here: >> https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst > > This is orthogonal to the discussion, but I'm curious as to why this > discussion document has landed in the website repo? > > I suppose it's not a really big deal, but future uploads of the > website will now include a page at > http://numpy.scipy.org/NA-overview.html with the content of this > document. If that's desirable, I'll add a note at the top of the > overview referencing this discussion thread. If not it can be > relocated somewhere more desirable after this thread's discussion > deadline expires.. Yes, it can be relocated. Can you suggest where it should go? It was added there so that nathaniel and mark could both edit it together with Nathaniel added to the web-team. It may not be a bad place for it, though. At least for a while. -Travis > > Cheers, > Scott > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Fri May 11 01:14:22 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 11 May 2012 00:14:22 -0500 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: Message-ID: On May 10, 2012, at 12:21 AM, Charles R Harris wrote: > > > On Wed, May 9, 2012 at 11:05 PM, Benjamin Root wrote: > > > On Wednesday, May 9, 2012, Nathaniel Smith wrote: > > > My only objection to this proposal is that committing to this approach > seems premature. The existing masked array objects act quite > differently from numpy.ma, so why do you believe that they're a good > foundation for numpy.ma, and why will users want to switch to their > semantics over numpy.ma's semantics? These aren't rhetorical > questions, it seems like they must have concrete answers, but I don't > know what they are. > > Based on the design decisions made in the original NEP, a re-made numpy.ma would have to lose _some_ features particularly, the ability to share masks. Save for that and some very obscure behaviors that are undocumented, it is possible to remake numpy.ma as a compatibility layer. 
> > That being said, I think that there are some fundamental questions that has concerned. If I recall, there were unresolved questions about behaviors surrounding assignments to elements of a view. > > I see the project as broken down like this: > 1.) internal architecture (largely abi issues) > 2.) external architecture (hooks throughout numpy to utilize the new features where possible such as where= argument) > 3.) getter/setter semantics > 4.) mathematical semantics > > At this moment, I think we have pieces of 2 and they are fairly non-controversial. It is 1 that I see as being the immediate hold-up here. 3 & 4 are non-trivial, but because they are mostly about interfaces, I think we can be willing to accept some very basic, fundamental, barebones components here in order to lay the groundwork for a more complete API later. > > To talk of Travis's proposal, doing nothing is no-go. Not moving forward would dishearten the community. Making a ndmasked type is very intriguing. I see it as a set towards eventually deprecating ndarray? Also, how would it behave with no.asarray() and no.asanyarray()? My other concern is a possible violation of DRY. How difficult would it be to maintain two ndarrays in parallel? > > As for the flag approach, this still doesn't solve the problem of legacy code (or did I misunderstand?) > > My understanding of the flag is to allow the code to stay in and get reworked and experimented with while keeping it from contaminating conventional use. > > The whole point of putting the code in was to experiment and adjust. The rather bizarre idea that it needs to be perfect from the get go is disheartening, and is seldom how new things get developed. Sure, there is a plan up front, but there needs to be feedback and change. And in fact, I haven't seen much feedback about the actual code, I don't even know that the people complaining have tried using it to see where it hurts. I'd like that sort of feedback. > I don't think anyone is saying it needs to be perfect from the get go. What I am saying is that this is fundamental enough to downstream users that this kind of thing is best done as a separate object. The flag could still be used to make all Python-level array constructors build ndmasked objects. But, this doesn't address the C-level story where there is quite a bit of downstream use where people have used the NumPy array as just a pointer to memory without considering that there might be a mask attached that should be inspected as well. The NEP addresses this a little bit for those C or C++ consumers of the ndarray in C who always use PyArray_FromAny which can fail if the array has non-NULL mask contents. However, it is *not* true that all downstream users use PyArray_FromAny. A large number of users just use something like PyArray_Check and then PyArray_DATA to get the pointer to the data buffer and then go from there thinking of their data as a strided memory chunk only (no extra mask). The NEP fundamentally changes this simple invariant that has been in NumPy and Numeric before it for a long, long time. I really don't see how we can do this in a 1.7 release. It has too many unknown and I think unknowable downstream effects. But, I think we could introduce another arrayobject that is the masked_array with a Python-level flag that makes it the default array in Python. 
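To be concrete, the two styles look roughly like this at the C level. This is only a sketch: scale_inplace and scale_copy are made-up names, and NPY_ARRAY_ALLOWNA is the flag the NEP proposes, not something any released NumPy defines.

#include <Python.h>
#include <numpy/arrayobject.h>

/* Style 1: the common downstream pattern.  Once PyArray_Check passes, the
   data pointer is treated as a plain strided chunk of the given dtype;
   nothing here gives NumPy a chance to say "this array carries a mask". */
static int scale_inplace(PyObject *obj, double factor)
{
    PyArrayObject *arr;
    double *data;
    npy_intp i, n;

    if (!PyArray_Check(obj)) {
        PyErr_SetString(PyExc_TypeError, "expected an ndarray");
        return -1;
    }
    arr = (PyArrayObject *)obj;
    if (PyArray_TYPE(arr) != NPY_DOUBLE || !PyArray_ISCARRAY(arr)) {
        PyErr_SetString(PyExc_ValueError,
                        "expected an aligned, contiguous float64 array");
        return -1;
    }
    data = (double *)PyArray_DATA(arr);
    n = PyArray_SIZE(arr);
    for (i = 0; i < n; i++)
        data[i] *= factor;        /* would silently touch masked elements too */
    return 0;
}

/* Style 2: going through PyArray_FromAny / PyArray_FROM_OTF.  This is the
   choke point where the NEP proposes to raise unless the caller passes the
   proposed NPY_ARRAY_ALLOWNA flag. */
static PyObject *scale_copy(PyObject *obj, double factor)
{
    PyArrayObject *arr = (PyArrayObject *)PyArray_FROM_OTF(
        obj, NPY_DOUBLE, NPY_ARRAY_IN_ARRAY | NPY_ARRAY_ENSURECOPY);
    double *data;
    npy_intp i, n;

    if (arr == NULL)              /* an NA-masked input would fail here */
        return NULL;
    data = (double *)PyArray_DATA(arr);
    n = PyArray_SIZE(arr);
    for (i = 0; i < n; i++)
        data[i] *= factor;
    return (PyObject *)arr;
}

Only the second style passes through a function where a mask (or lazy evaluation, or a derived dtype) could be noticed; the first goes straight to memory.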
There are a few more subtleties, PyArray_Check by default will pass sub-classes so if the new ndmask array were a sub-class then it would be passed (just like current numpy.ma arrays and matrices would pass that check today). However, there is a PyArray_CheckExact macro which could be used to ensure the object was actually of PyArray_Type. There is also the PyArg_ParseTuple command with "O!" that I have seen used many times to ensure an exact NumPy array. -Travis > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Fri May 11 01:36:18 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 11 May 2012 00:36:18 -0500 Subject: [Numpy-discussion] NA-mask interactions with existing C code In-Reply-To: <4FAC5A16.6090407@astro.uio.no> References: <4FAC457F.3090405@astro.uio.no> <4FAC5A16.6090407@astro.uio.no> Message-ID: >> >> I guess this mixture of Python-API and C-API is different from the way >> the API tries to protect incorrect access. From the Python API, it. >> should let everything through, because it's for Python code to use. From >> the C API, it should default to not letting things through, because >> special NA-mask aware code needs to be written. I'm not sure if there is >> a reasonable approach here which works for everything. > > Does that mean you consider changing ob_type for masked arrays > unreasonable? They can still use the same object struct... > >> >> But in general, I will often be lazy and just do >> >> def f(np.ndarray arr): >> c_func(np.PyArray_DATA(arr)) >> >> It's an exception if you don't provide an array -- so who cares. (I >> guess the odds of somebody feeding a masked array to code like that, >> which doesn't try to be friendly, is relatively smaller though.) >> >> >> This code would already fail with non-contiguous strides or byte-swapped >> data, so the additional NA mask case seems to fit in an already-failing >> category. > > Honestly! I hope you did't think I provided a full-fledged example? > Perhaps you'd like to point out to me that "c_func" is a bad name for a > function as well? > > One would of course check that things are contiguous (or pass on the > strides), check the dtype and dispatch to different C functions in each > case, etc. > > But that isn't the point. Scientific code most of the time does fall in > the "already-failing" category. That doesn't mean it doesn't count. > Let's focus on the number of code lines written and developer hours that > will be spent cleaning up the mess -- not the "validity" of the code in > question. > >> >> >> If you know the datatype, you can really do >> >> def f(np.ndarray[double] arr): >> c_func(&arr[0]) >> >> which works with PEP 3118. But I use PyArray_DATA out of habit (and >> since it works in the cases without dtype). >> >> Frankly, I don't expect any Cython code to do the right thing here; >> calling PyArray_FromAny is much more typing. And really, nobody ever >> questioned that if we had an actual ndarray instance, we'd be allowed to >> call PyArray_DATA. >> >> I don't know how much Cython code is out there in the wild for which >> this is a problem. Either way, it would cause something of a reeducation >> challenge for Cython users. >> >> >> Since this style of coding already has known problems, do you think the >> case with NA-masks deserves more attention here? 
What will happen is. >> access to array element data without consideration of the mask, which >> seems similar in nature to accessing array data with the wrong stride or >> byte order. > > I don't agree with the premise of that paragraph. There's no reason to > assume that just because code doesn't call FromAny, it has problems. > (And I'll continue to assume that whatever array is returned from > "np.ascontiguousarray is really contiguous...) > > Whether it requires attention or not is a different issue though. I'm > not sure. I think other people should weigh in on that -- I mostly write > code for my own consumption. > > One should at least check pandas, scikits-image, scikits-learn, mpi4py, > petsc4py, and so on. And ask on the Cython users list. Hopefully it will > usually be PEP 3118. But now I need to turn in. > > Travis, would such a survey be likely to affect the outcome of your > decision in any way? Or should we just leave this for now? > This dialog gets at the heart of the matter, I think. The NEP seems to want NumPy to have a "better" API that always protects downstream users from understanding what is actually under the covers. It would prefer to push NumPy in the direction of an array object that is fundamentally more opaque. However, the world NumPy lives in is decidedly not opaque. There has been significant education and shared understanding of what a NumPy array actually *is* (a strided view of memory of a particular "dtype"). This shared understanding has even been pushed into Python as the buffer protocol. It is very common for extension modules to go directly to the data they want by using this understanding. This is very different from the traditional "shield your users" from how things are actually done view of most object APIs. It was actually intentional. I'm not saying that different choices could not have been made or that some amount of shielding should never be contemplated. I'm just saying that NumPy has been used as a nice bridge between the world of scientific computing codes that have chunks of memory allocated for processing and high-level code. Part of the reason for this bridge has been the simple object model. I just don't think the NEP fully appreciates just how fundamental of a shift this is in the wider NumPy community and it is not something that can be done immediately or without careful attention. Dag, is an *active* member in that larger group of C-consumers of NumPy arrays. As a long-time member of that group, myself, this is where my concerns are coming from. So far I am not hearing anything to alleviate those concerns. See my post in the other thread for my proposal to add a flag that allows users to switch between the Python side default being ndarray's or ndmasked, but they are different types at the C-level. The proposal so far does not specify whether or not ndarray or ndmasked is a subclass of the other. Given the history of numpy.ma and the fact that it makes sense on the C-level, I would lean toward ndmasked being a sub-class of ndarray --- thus a C-user would have to do a PyArray_CheckExact to ensure they are getting a base Python Array Object --- which they would have to do anyway because numpy.ma arrays also pass PyArray_Check. 
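In code the distinction is just this (a sketch; my_func is a made-up name):

#include <Python.h>
#include <numpy/arrayobject.h>

/* PyArray_Check passes any subclass -- numpy.ma arrays, matrices, or a
   hypothetical ndmasked-as-subclass -- while PyArray_CheckExact only passes
   the base ndarray type, so it is the check that keeps a masked subclass out
   of code that treats arrays as plain strided memory. */
static PyObject *my_func(PyObject *self, PyObject *args)
{
    PyObject *obj;

    if (!PyArg_ParseTuple(args, "O", &obj))
        return NULL;
    if (!PyArray_CheckExact(obj)) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a base numpy.ndarray, not a subclass");
        return NULL;
    }
    /* from here on it is safe to treat obj as an unmasked strided buffer */
    Py_RETURN_NONE;
}

Which of the two checks a given extension happens to use is what decides whether an ndmasked subclass slips through unnoticed or is rejected up front.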
Best regards, -Travis > Dag > >> >> Cheers, >> Mark >> >> Dag >> >>> >>> Tutorial From Cython Website >>> ---------------------------- >>> >>> http://docs.cython.org/src/tutorial/numpy.html >>> >>> This tutorial gives a convolution example, and all the examples >> fail with >>> Python exceptions when given inputs that contain NA values. >>> >>> Before any Cython type annotation is introduced, the code >> functions just >>> as equivalent Python would in the interpreter. >>> >>> When the type information is introduced, it is done via numpy.pxd >> which >>> defines a mapping between an ndarray declaration and >> PyArrayObject \*. >>> Under the hood, this maps to __Pyx_ArgTypeTest, which does a direct >>> comparison of Py_TYPE(obj) against the PyTypeObject for the ndarray. >>> >>> Then the code does some dtype comparisons, and uses regular >> python indexing >>> to access the array elements. This python indexing still goes >> through the >>> Python API, so the NA handling and error checking in numpy still >> can work >>> like normal and fail if the inputs have NAs which cannot fit in >> the output >>> array. In this case it fails when trying to convert the NA into >> an integer >>> to set in in the output. >>> >>> The next version of the code introduces more efficient indexing. This >>> operates based on Python's buffer protocol. This causes Cython to >> call >>> __Pyx_GetBufferAndValidate, which calls __Pyx_GetBuffer, which calls >>> PyObject_GetBuffer. This call gives numpy the opportunity to raise an >>> exception if the inputs are arrays with NA-masks, something not >> supported >>> by the Python buffer protocol. >>> >>> Numerical Python - JPL website >>> ------------------------------ >>> >>> http://dsnra.jpl.nasa.gov/software/Python/numpydoc/numpy-13.html >>> >>> This document is from 2001, so does not reflect recent numpy, but >> it is the >>> second hit when searching for "numpy c api example" on google. >>> >>> There first example, heading "A simple example", is in fact already >>> invalid for >>> recent numpy even without the NA support. In particular, if the >> data is >>> misaligned >>> or in a different byteorder, it may crash or produce incorrect >> results. >>> >>> The next thing the document does is introduce >>> PyArray_ContiguousFromObject, which >>> gives numpy an opportunity to raise an exception when NA-masked >> arrays >>> are used, >>> so the later code will raise exceptions as desired. 
>>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Fri May 11 01:45:18 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 May 2012 07:45:18 +0200 Subject: [Numpy-discussion] NA-mask interactions with existing C code In-Reply-To: References: <4FAC457F.3090405@astro.uio.no> <4FAC5A16.6090407@astro.uio.no> Message-ID: <4FACA76E.3070902@astro.uio.no> On 05/11/2012 07:36 AM, Travis Oliphant wrote: >>> >>> I guess this mixture of Python-API and C-API is different from the way >>> the API tries to protect incorrect access. From the Python API, it. >>> should let everything through, because it's for Python code to use. From >>> the C API, it should default to not letting things through, because >>> special NA-mask aware code needs to be written. I'm not sure if there is >>> a reasonable approach here which works for everything. >> >> Does that mean you consider changing ob_type for masked arrays >> unreasonable? They can still use the same object struct... >> >>> >>> But in general, I will often be lazy and just do >>> >>> def f(np.ndarray arr): >>> c_func(np.PyArray_DATA(arr)) >>> >>> It's an exception if you don't provide an array -- so who cares. (I >>> guess the odds of somebody feeding a masked array to code like that, >>> which doesn't try to be friendly, is relatively smaller though.) >>> >>> >>> This code would already fail with non-contiguous strides or byte-swapped >>> data, so the additional NA mask case seems to fit in an already-failing >>> category. >> >> Honestly! I hope you did't think I provided a full-fledged example? >> Perhaps you'd like to point out to me that "c_func" is a bad name for a >> function as well? >> >> One would of course check that things are contiguous (or pass on the >> strides), check the dtype and dispatch to different C functions in each >> case, etc. >> >> But that isn't the point. Scientific code most of the time does fall in >> the "already-failing" category. That doesn't mean it doesn't count. >> Let's focus on the number of code lines written and developer hours that >> will be spent cleaning up the mess -- not the "validity" of the code in >> question. >> >>> >>> >>> If you know the datatype, you can really do >>> >>> def f(np.ndarray[double] arr): >>> c_func(&arr[0]) >>> >>> which works with PEP 3118. But I use PyArray_DATA out of habit (and >>> since it works in the cases without dtype). >>> >>> Frankly, I don't expect any Cython code to do the right thing here; >>> calling PyArray_FromAny is much more typing. And really, nobody ever >>> questioned that if we had an actual ndarray instance, we'd be allowed to >>> call PyArray_DATA. >>> >>> I don't know how much Cython code is out there in the wild for which >>> this is a problem. Either way, it would cause something of a reeducation >>> challenge for Cython users. 
>>> >>> >>> Since this style of coding already has known problems, do you think the >>> case with NA-masks deserves more attention here? What will happen is. >>> access to array element data without consideration of the mask, which >>> seems similar in nature to accessing array data with the wrong stride or >>> byte order. >> >> I don't agree with the premise of that paragraph. There's no reason to >> assume that just because code doesn't call FromAny, it has problems. >> (And I'll continue to assume that whatever array is returned from >> "np.ascontiguousarray is really contiguous...) >> >> Whether it requires attention or not is a different issue though. I'm >> not sure. I think other people should weigh in on that -- I mostly write >> code for my own consumption. >> >> One should at least check pandas, scikits-image, scikits-learn, mpi4py, >> petsc4py, and so on. And ask on the Cython users list. Hopefully it will >> usually be PEP 3118. But now I need to turn in. >> >> Travis, would such a survey be likely to affect the outcome of your >> decision in any way? Or should we just leave this for now? >> > > This dialog gets at the heart of the matter, I think. The NEP seems to want NumPy to have a "better" API that always protects downstream users from understanding what is actually under the covers. It would prefer to push NumPy in the direction of an array object that is fundamentally more opaque. However, the world NumPy lives in is decidedly not opaque. There has been significant education and shared understanding of what a NumPy array actually *is* (a strided view of memory of a particular "dtype"). This shared understanding has even been pushed into Python as the buffer protocol. It is very common for extension modules to go directly to the data they want by using this understanding. > > This is very different from the traditional "shield your users" from how things are actually done view of most object APIs. It was actually intentional. I'm not saying that different choices could not have been made or that some amount of shielding should never be contemplated. I'm just saying that NumPy has been used as a nice bridge between the world of scientific computing codes that have chunks of memory allocated for processing and high-level code. Part of the reason for this bridge has been the simple object model. > > I just don't think the NEP fully appreciates just how fundamental of a shift this is in the wider NumPy community and it is not something that can be done immediately or without careful attention. > > Dag, is an *active* member in that larger group of C-consumers of NumPy arrays. As a long-time member of that group, myself, this is where my concerns are coming from. So far I am not hearing anything to alleviate those concerns. > > See my post in the other thread for my proposal to add a flag that allows users to switch between the Python side default being ndarray's or ndmasked, but they are different types at the C-level. The proposal so far does not specify whether or not ndarray or ndmasked is a subclass of the other. Given the history of numpy.ma and the fact that it makes sense on the C-level, I would lean toward ndmasked being a sub-class of ndarray --- thus a C-user would have to do a PyArray_CheckExact to ensure they are getting a base Python Array Object --- which they would have to do anyway because numpy.ma arrays also pass PyArray_Check. Making it a subclass means existing Cython code is not catered for, as PyObject_TypeCheck is used. 
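Roughly, the generated check amounts to the following (a simplified sketch, with arg_type_ok a made-up name; the real generated helper is __Pyx_ArgTypeTest):

#include <Python.h>
#include <numpy/arrayobject.h>

/* A declared "np.ndarray arr" argument is accepted iff the object is an
   instance of PyArray_Type, subclasses included.  A separate, non-subclass
   ndmasked type would fail this test and raise TypeError (old code fails
   loudly); an ndmasked *subclass* would sail straight through. */
static int arg_type_ok(PyObject *obj)
{
    return PyObject_TypeCheck(obj, &PyArray_Type);
}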
Is there a advantage for users by making it a subclass? Nobody is saying you couldn't 'inherit' the struct (make the ndmask struct be castable to a PyArrayObject*) even if that is not declared in the Python type object. Dag From scott.sinclair.za at gmail.com Fri May 11 02:03:23 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Fri, 11 May 2012 08:03:23 +0200 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: <359C9AA1-B263-4E57-A02A-07A83DFE0A22@continuum.io> References: <359C9AA1-B263-4E57-A02A-07A83DFE0A22@continuum.io> Message-ID: On 11 May 2012 06:57, Travis Oliphant wrote: > > On May 10, 2012, at 3:40 AM, Scott Sinclair wrote: > >> On 9 May 2012 18:46, Travis Oliphant wrote: >>> The document is available here: >>> ? ?https://github.com/numpy/numpy.scipy.org/blob/master/NA-overview.rst >> >> This is orthogonal to the discussion, but I'm curious as to why this >> discussion document has landed in the website repo? >> >> I suppose it's not a really big deal, but future uploads of the >> website will now include a page at >> http://numpy.scipy.org/NA-overview.html with the content of this >> document. If that's desirable, I'll add a note at the top of the >> overview referencing this discussion thread. If not it can be >> relocated somewhere more desirable after this thread's discussion >> deadline expires.. > > Yes, it can be relocated. ? Can you suggest where it should go? ?It was added there so that nathaniel and mark could both edit it together with Nathaniel added to the web-team. > > It may not be a bad place for it, though. ? At least for a while. Having thought about it, a page on the website isn't a bad idea. I've added a note pointing to this discussion. The document now appears at http://numpy.scipy.org/NA-overview.html Cheers, Scott From fperez.net at gmail.com Fri May 11 02:12:25 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 10 May 2012 23:12:25 -0700 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: <359C9AA1-B263-4E57-A02A-07A83DFE0A22@continuum.io> Message-ID: On Thu, May 10, 2012 at 11:03 PM, Scott Sinclair wrote: > Having thought about it, a page on the website isn't a bad idea. I've > added a note pointing to this discussion. The document now appears at > http://numpy.scipy.org/NA-overview.html Why not have a separate repo for neps/discussion docs? That way, people can be added to the team as they need to edit them and removed when done, and it's separate from the main site itself. The site can simply have a link to this set of documents, which can be built, tracked, separately and cleanly. We have more or less that setup with ipython for the site and docs: - main site page that points to the doc builds: http://ipython.org/documentation.html - doc builds on a secondary site: http://ipython.org/ipython-doc/stable/index.html This seems to me like the best way to separate the main web team (assuming we'll have a nice website for numpy one day) from the team that will edit documents of nep/discussion type. I imagine the web team will be fairly stable, where as the team for these docs will have people coming and going. Just a thought... As usual, crib anything you find useful from our setup. 
Cheers, f From scott.sinclair.za at gmail.com Fri May 11 02:44:13 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Fri, 11 May 2012 08:44:13 +0200 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: <359C9AA1-B263-4E57-A02A-07A83DFE0A22@continuum.io> Message-ID: On 11 May 2012 08:12, Fernando Perez wrote: > On Thu, May 10, 2012 at 11:03 PM, Scott Sinclair > wrote: >> Having thought about it, a page on the website isn't a bad idea. I've >> added a note pointing to this discussion. The document now appears at >> http://numpy.scipy.org/NA-overview.html > > Why not have a separate repo for neps/discussion docs? ?That way, > people can be added to the team as they need to edit them and removed > when done, and it's separate from the main site itself. ?The site can > simply have a link to this set of documents, which can be built, > tracked, separately and cleanly. ?We have more or less that setup with > ipython for the site and docs: > > - main site page that points to the doc builds: > http://ipython.org/documentation.html > - doc builds on a secondary site: > http://ipython.org/ipython-doc/stable/index.html That's pretty much how things already work. The documentation is in the main source tree and built docs end up at http://docs.scipy.org. NEPs live at https://github.com/numpy/numpy/tree/master/doc/neps, but don't get published outside of the source tree and there's no "preferred" place for discussion documents. > (assuming we'll have a nice website for numpy one day) Ha ha ha ;-) Thanks for the thoughts and prodding. Cheers, Scott From fperez.net at gmail.com Fri May 11 03:13:25 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 11 May 2012 00:13:25 -0700 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: <359C9AA1-B263-4E57-A02A-07A83DFE0A22@continuum.io> Message-ID: On Thu, May 10, 2012 at 11:44 PM, Scott Sinclair wrote: > That's pretty much how things already work. The documentation is in > the main source tree and built docs end up at http://docs.scipy.org. > NEPs live at https://github.com/numpy/numpy/tree/master/doc/neps, but > don't get published outside of the source tree and there's no > "preferred" place for discussion documents. No, b/c that means that for someone to be able to push to a NEP, they'd have to get commit rights to the main numpy source code repo. The whole point of what I'm suggesting is to isolate the NEP repo so that commit rights can be given for it with minimal thought, whenever pretty much anyone says they're going to work on a NEP. Obviously today anyone can do that and submit a PR against the main repo, but that raises the PR review burden for said repo. And that burden is something that we should strive to keep as low as possible, so those key people (the team with commit rights to the main repo) can focus their limited resources on reviewing code PRs. I'm simply suggesting a way to spread the load as much as possible, so that the team with commit rights on the main repo isn't a bottleneck on other tasks. Cheers, f From d.s.seljebotn at astro.uio.no Fri May 11 07:13:28 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 May 2012 13:13:28 +0200 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer Message-ID: <4FACF458.6070200@astro.uio.no> (NumPy devs: I know, I get too many ideas. But this time I *really* believe in it, I think this is going to be *huge*. And if Mark F. 
likes it it's not going to be without manpower; and as his mentor I'd pitch in too here and there.) (Mark F.: I believe this is *very* relevant to your GSoC. I certainly don't want to micro-manage your GSoC, just have your take.) Travis, thank you very much for those good words in the "NA-mask interactions..." thread. It put most of my concerns away. If anybody is leaning towards for opaqueness because of its OOP purity, I want to refer to C++ and its walled-garden of ideological purity -- it has, what, 3-4 different OOP array libraries, neither of which is able to out-compete the other. Meanwhile the rest of the world happily cooperates using pointers, strides, CSR and CSC. Now, there are limits to what you can do with strides and pointers. Noone's denying the need for more. In my mind that's an API where you can do fetch_block and put_block of cache-sized, N-dimensional blocks on an array; but it might be something slightly different. Here's what I'm asking: DO NOT simply keep extending ndarray and the NumPy C API to deal with this issue. What we need is duck-typing/polymorphism at the C level. If you keep extending ndarray and the NumPy C API, what we'll have is a one-to-many relationship: One provider of array technology, multiple consumers (with hooks, I'm sure, but all implementations of the hook concept in the NumPy world I've seen so far are a total disaster!). What I think we need instead is something like PEP 3118 for the "abstract" array that is only available block-wise with getters and setters. On the Cython list we've decided that what we want for CEP 1000 (for boxing callbacks etc.) is to extend PyTypeObject with our own fields; we could create CEP 1001 to solve this issue and make any Python object an exporter of "block-getter/setter-arrays" (better name needed). What would be exported is (of course) a simple vtable: typedef struct { int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, ...); ... } block_getter_setter_array_vtable; Let's please discuss the details *after* the fundamentals. But the reason I put void* there instead of PyObject* is that I hope this could be used beyond the Python world (say, Python<->Julia); the void* would be handed to you at the time you receive the vtable (however we handle that). I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you could embed the block-transposition that's needed for efficient "arr + arr.T" at this level. Imagine being able to do this in Cython: a[...] = b + c * d and have that essentially compile to the numexpr blocked approach, *but* where b, c, and d can have whatever type that exports CEP 1001? So c could be a "diagonal" array which uses O(n) storage to export O(n^2) elements, for instance, and the unrolled Cython code never needs to know. As far as NumPy goes, something along these lines should hopefully mean that new C code being written doesn't rely so much on what exactly goes into "ndarray" and what goes into other classes; so that we don't get the same problem again that we do now with code that doesn't use PEP 3118. 
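To pin the calling pattern down a little -- every name below is made up, and this is only meant to show the shape of the thing, not a finished design:

#include <sys/types.h>   /* ssize_t */

typedef struct {
    int ndim;
    ssize_t blockshape[8];    /* preferred (cache-sized) block shape */
    /* Copy the block bounded by [upper_left, lower_right) into out,
       a caller-allocated contiguous scratch buffer. */
    int (*get_block)(void *ctx, const ssize_t *upper_left,
                     const ssize_t *lower_right, void *out);
    /* Write a block back; exporters of read-only data leave this NULL. */
    int (*put_block)(void *ctx, const ssize_t *upper_left,
                     const ssize_t *lower_right, const void *in);
} block_getter_setter_array_vtable;

/* Consumer side, 1-D double data: walk the index space one block at a time.
   A blocked "a[...] = b + c * d" would run the same loop over several
   exporters at once, combining the fetched blocks elementwise. */
static int blocked_sum(const block_getter_setter_array_vtable *vt, void *ctx,
                       ssize_t n, double *scratch, double *total)
{
    ssize_t start;

    *total = 0.0;
    for (start = 0; start < n; start += vt->blockshape[0]) {
        ssize_t stop = start + vt->blockshape[0];
        ssize_t i;

        if (stop > n)
            stop = n;
        if (vt->get_block(ctx, &start, &stop, scratch) != 0)
            return -1;
        for (i = 0; i < stop - start; i++)
            *total += scratch[i];
    }
    return 0;
}

The point is that the consumer never needs to know whether the exporter is a plain strided ndarray, a masked array, or a diagonal array synthesizing its O(n^2) elements on the fly.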
Dag From d.s.seljebotn at astro.uio.no Fri May 11 07:15:23 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 May 2012 13:15:23 +0200 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <4FACF458.6070200@astro.uio.no> References: <4FACF458.6070200@astro.uio.no> Message-ID: <4FACF4CB.6010709@astro.uio.no> On 05/11/2012 01:13 PM, Dag Sverre Seljebotn wrote: > (NumPy devs: I know, I get too many ideas. But this time I *really* > believe in it, I think this is going to be *huge*. And if Mark F. likes > it it's not going to be without manpower; and as his mentor I'd pitch in > too here and there.) > > (Mark F.: I believe this is *very* relevant to your GSoC. I certainly > don't want to micro-manage your GSoC, just have your take.) For the information of the rest of you: http://www.google-melange.com/gsoc/project/google/gsoc2012/markflorisson88/30002 Dag > > Travis, thank you very much for those good words in the "NA-mask > interactions..." thread. It put most of my concerns away. If anybody is > leaning towards for opaqueness because of its OOP purity, I want to > refer to C++ and its walled-garden of ideological purity -- it has, > what, 3-4 different OOP array libraries, neither of which is able to > out-compete the other. Meanwhile the rest of the world happily > cooperates using pointers, strides, CSR and CSC. > > Now, there are limits to what you can do with strides and pointers. > Noone's denying the need for more. In my mind that's an API where you > can do fetch_block and put_block of cache-sized, N-dimensional blocks on > an array; but it might be something slightly different. > > Here's what I'm asking: DO NOT simply keep extending ndarray and the > NumPy C API to deal with this issue. > > What we need is duck-typing/polymorphism at the C level. If you keep > extending ndarray and the NumPy C API, what we'll have is a one-to-many > relationship: One provider of array technology, multiple consumers (with > hooks, I'm sure, but all implementations of the hook concept in the > NumPy world I've seen so far are a total disaster!). > > What I think we need instead is something like PEP 3118 for the > "abstract" array that is only available block-wise with getters and > setters. On the Cython list we've decided that what we want for CEP 1000 > (for boxing callbacks etc.) is to extend PyTypeObject with our own > fields; we could create CEP 1001 to solve this issue and make any Python > object an exporter of "block-getter/setter-arrays" (better name needed). > > What would be exported is (of course) a simple vtable: > > typedef struct { > int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t > *lower_right, ...); > ... > } block_getter_setter_array_vtable; > > Let's please discuss the details *after* the fundamentals. But the > reason I put void* there instead of PyObject* is that I hope this could > be used beyond the Python world (say, Python<->Julia); the void* would > be handed to you at the time you receive the vtable (however we handle > that). > > I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you > could embed the block-transposition that's needed for efficient "arr + > arr.T" at this level. > > Imagine being able to do this in Cython: > > a[...] = b + c * d > > and have that essentially compile to the numexpr blocked approach, *but* > where b, c, and d can have whatever type that exports CEP 1001? 
So c > could be a "diagonal" array which uses O(n) storage to export O(n^2) > elements, for instance, and the unrolled Cython code never needs to know. > > As far as NumPy goes, something along these lines should hopefully mean > that new C code being written doesn't rely so much on what exactly goes > into "ndarray" and what goes into other classes; so that we don't get > the same problem again that we do now with code that doesn't use PEP 3118. > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From markflorisson88 at gmail.com Fri May 11 09:25:07 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 11 May 2012 14:25:07 +0100 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <4FACF458.6070200@astro.uio.no> References: <4FACF458.6070200@astro.uio.no> Message-ID: On 11 May 2012 12:13, Dag Sverre Seljebotn wrote: > (NumPy devs: I know, I get too many ideas. But this time I *really* believe > in it, I think this is going to be *huge*. And if Mark F. likes it it's not > going to be without manpower; and as his mentor I'd pitch in too here and > there.) > > (Mark F.: I believe this is *very* relevant to your GSoC. I certainly don't > want to micro-manage your GSoC, just have your take.) > > Travis, thank you very much for those good words in the "NA-mask > interactions..." thread. It put most of my concerns away. If anybody is > leaning towards for opaqueness because of its OOP purity, I want to refer to > C++ and its walled-garden of ideological purity -- it has, what, 3-4 > different OOP array libraries, neither of which is able to out-compete the > other. Meanwhile the rest of the world happily cooperates using pointers, > strides, CSR and CSC. > > Now, there are limits to what you can do with strides and pointers. Noone's > denying the need for more. In my mind that's an API where you can do > fetch_block and put_block of cache-sized, N-dimensional blocks on an array; > but it might be something slightly different. > > Here's what I'm asking: DO NOT simply keep extending ndarray and the NumPy C > API to deal with this issue. > > What we need is duck-typing/polymorphism at the C level. If you keep > extending ndarray and the NumPy C API, what we'll have is a one-to-many > relationship: One provider of array technology, multiple consumers (with > hooks, I'm sure, but all implementations of the hook concept in the NumPy > world I've seen so far are a total disaster!). > > What I think we need instead is something like PEP 3118 for the "abstract" > array that is only available block-wise with getters and setters. On the > Cython list we've decided that what we want for CEP 1000 (for boxing > callbacks etc.) is to extend PyTypeObject with our own fields; we could > create CEP 1001 to solve this issue and make any Python object an exporter > of "block-getter/setter-arrays" (better name needed). > > What would be exported is (of course) a simple vtable: > > typedef struct { > ? ?int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, > ...); > ? ?... > } block_getter_setter_array_vtable; Interesting idea, I think I like it. So I suppose these blocks could even be used in a sparse context, where it returns one-sized blocks for un-surrounded elements. This means returned blocks can always be variably sized, correct? So will each block be associated with an index space? 
Otherwise, how do we know the element-wise correspondence? Will there be a default element for missing (unreturned) values? Am I correct in assuming elements in returned blocks always return elements in element-wise order? In case you manually have to block other parts of an array (lets say it's stored in Fortran order), then the variable sized block nature may be complicating the tiling pattern. > Let's please discuss the details *after* the fundamentals. But the reason I > put void* there instead of PyObject* is that I hope this could be used > beyond the Python world (say, Python<->Julia); the void* would be handed to > you at the time you receive the vtable (however we handle that). Yes, we should definitely not stick to objects here. > I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you > could embed the block-transposition that's needed for efficient "arr + > arr.T" at this level. > > Imagine being able to do this in Cython: > > a[...] = b + c * d > > and have that essentially compile to the numexpr blocked approach, *but* > where b, c, and d can have whatever type that exports CEP 1001? So c could > be a "diagonal" array which uses O(n) storage to export O(n^2) elements, for > instance, and the unrolled Cython code never needs to know. I assume random accesses will happen through some kind of B-tree/sparse index like mechanism? > As far as NumPy goes, something along these lines should hopefully mean that > new C code being written doesn't rely so much on what exactly goes into > "ndarray" and what goes into other classes; so that we don't get the same > problem again that we do now with code that doesn't use PEP 3118. > > Dag From d.s.seljebotn at astro.uio.no Fri May 11 09:27:39 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 11 May 2012 15:27:39 +0200 Subject: [Numpy-discussion] CEP 1001 - Custom PyTypeObject extensions Message-ID: <4FAD13CB.4070106@astro.uio.no> This comes from a refactor of the work on CEP 1000: It's a pre-PEP with a hack we can use *today*, that allows 3rd party libraries to agree on extensions to the PyTypeObject structure. http://wiki.cython.org/enhancements/cep1001 As hinted in my other recent thread, I believe this will also be good medicine as NumPy moves forward, so that we stop worrying about "ndarray" vs. C extensions, and rather talk about which PyTypeObject extensions are supported. Technical discussion about this on the Cython list please. Dag From markflorisson88 at gmail.com Fri May 11 09:37:45 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 11 May 2012 14:37:45 +0100 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <4FACF458.6070200@astro.uio.no> References: <4FACF458.6070200@astro.uio.no> Message-ID: On 11 May 2012 12:13, Dag Sverre Seljebotn wrote: > (NumPy devs: I know, I get too many ideas. But this time I *really* believe > in it, I think this is going to be *huge*. And if Mark F. likes it it's not > going to be without manpower; and as his mentor I'd pitch in too here and > there.) > > (Mark F.: I believe this is *very* relevant to your GSoC. I certainly don't > want to micro-manage your GSoC, just have your take.) > > Travis, thank you very much for those good words in the "NA-mask > interactions..." thread. It put most of my concerns away. 
If anybody is > leaning towards for opaqueness because of its OOP purity, I want to refer to > C++ and its walled-garden of ideological purity -- it has, what, 3-4 > different OOP array libraries, neither of which is able to out-compete the > other. Meanwhile the rest of the world happily cooperates using pointers, > strides, CSR and CSC. > > Now, there are limits to what you can do with strides and pointers. Noone's > denying the need for more. In my mind that's an API where you can do > fetch_block and put_block of cache-sized, N-dimensional blocks on an array; > but it might be something slightly different. > > Here's what I'm asking: DO NOT simply keep extending ndarray and the NumPy C > API to deal with this issue. > > What we need is duck-typing/polymorphism at the C level. If you keep > extending ndarray and the NumPy C API, what we'll have is a one-to-many > relationship: One provider of array technology, multiple consumers (with > hooks, I'm sure, but all implementations of the hook concept in the NumPy > world I've seen so far are a total disaster!). > > What I think we need instead is something like PEP 3118 for the "abstract" > array that is only available block-wise with getters and setters. On the > Cython list we've decided that what we want for CEP 1000 (for boxing > callbacks etc.) is to extend PyTypeObject with our own fields; we could > create CEP 1001 to solve this issue and make any Python object an exporter > of "block-getter/setter-arrays" (better name needed). > > What would be exported is (of course) a simple vtable: > > typedef struct { > ? ?int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, > ...); > ? ?... > } block_getter_setter_array_vtable; > > Let's please discuss the details *after* the fundamentals. But the reason I > put void* there instead of PyObject* is that I hope this could be used > beyond the Python world (say, Python<->Julia); the void* would be handed to > you at the time you receive the vtable (however we handle that). I suppose it would also be useful to have some way of predicting the output format polymorphically for the caller. E.g. dense * block_diagonal results in block diagonal, but dense + block_diagonal results in dense, etc. It might be useful for the caller to know whether it needs to allocate a sparse, dense or block-structured array. Or maybe the polymorphic function could even do the allocation. This needs to happen recursively of course, to avoid intermediate temporaries. The compiler could easily handle that, and so could numpy when it gets lazy evaluation. I think if the heavy lifting of allocating output arrays and exporting these arrays work in numpy, then support in Cython could use that (I can already hear certain people object to more complicated array stuff in Cython :). Even better here would be an external project that each our projects could use (I still think the nditer sorting functionality of arrays should be numpy-agnostic and externally available). > I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you > could embed the block-transposition that's needed for efficient "arr + > arr.T" at this level. > > Imagine being able to do this in Cython: > > a[...] = b + c * d > > and have that essentially compile to the numexpr blocked approach, *but* > where b, c, and d can have whatever type that exports CEP 1001? So c could > be a "diagonal" array which uses O(n) storage to export O(n^2) elements, for > instance, and the unrolled Cython code never needs to know. 
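To make the blocked-evaluation idea a bit more concrete, here is a rough pure-Python analogue; every name in it (DiagonalBlockExporter, get_block, blockwise_add, the (upper_left, lower_right) convention) is invented for illustration and is not part of CEP 1000/1001:

import numpy as np

class DiagonalBlockExporter(object):
    # Toy stand-in for a block exporter: O(n) storage, but it can hand
    # out any requested dense block of the O(n^2) logical elements.
    def __init__(self, diag):
        self.diag = np.asarray(diag, dtype=float)
        self.shape = (len(self.diag), len(self.diag))

    def get_block(self, upper_left, lower_right):
        # Materialize only the requested half-open block [i0:i1, j0:j1].
        i0, j0 = upper_left
        i1, j1 = lower_right
        block = np.zeros((i1 - i0, j1 - j0))
        for i in range(max(i0, j0), min(i1, j1)):
            block[i - i0, i - j0] = self.diag[i]
        return block

def blockwise_add(a, b, out, blocksize=2):
    # The consumer only ever sees small dense blocks; how a and b store
    # their elements is hidden behind get_block.
    n, m = out.shape
    for i in range(0, n, blocksize):
        for j in range(0, m, blocksize):
            ul = (i, j)
            lr = (min(i + blocksize, n), min(j + blocksize, m))
            out[ul[0]:lr[0], ul[1]:lr[1]] = (a.get_block(ul, lr) +
                                             b.get_block(ul, lr))

a = DiagonalBlockExporter([1., 2., 3., 4.])
b = DiagonalBlockExporter([10., 20., 30., 40.])
out = np.empty((4, 4))
blockwise_add(a, b, out)
print(out)   # dense result with 11, 22, 33, 44 on the diagonal

The real thing would of course be a C-level vtable rather than Python method calls, but the division of labour is the same: the producer exposes blocks, the consumer does the unrolled/blocked loop.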
> > As far as NumPy goes, something along these lines should hopefully mean that > new C code being written doesn't rely so much on what exactly goes into > "ndarray" and what goes into other classes; so that we don't get the same > problem again that we do now with code that doesn't use PEP 3118. > > Dag From ben.root at ou.edu Fri May 11 10:05:17 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 11 May 2012 10:05:17 -0400 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 2:15 PM, Ralf Gommers wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate of > NumPy 1.6.2. This is a maintenance release. Due to the delay of the NumPy > 1.7.0, this release contains far more fixes than a regular NumPy bugfix > release. It also includes a number of documentation and build improvements. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ > > Please test this release and report any issues on the numpy-discussion > mailing list. > > Cheers, > Ralf > > > > ``numpy.core`` issues fixed > --------------------------- > > #2063 make unique() return consistent index > #1138 allow creating arrays from empty buffers or empty slices > #1446 correct note about correspondence vstack and concatenate > #1149 make argmin() work for datetime > #1672 fix allclose() to work for scalar inf > #1747 make np.median() work for 0-D arrays > #1776 make complex division by zero to yield inf properly > #1675 add scalar support for the format() function > #1905 explicitly check for NaNs in allclose() > #1952 allow floating ddof in std() and var() > #1948 fix regression for indexing chararrays with empty list > #2017 fix type hashing > #2046 deleting array attributes causes segfault > #2033 a**2.0 has incorrect type > #2045 make attribute/iterator_element deletions not segfault > #2021 fix segfault in searchsorted() > #2073 fix float16 __array_interface__ bug > > > ``numpy.lib`` issues fixed > -------------------------- > > #2048 break reference cycle in NpzFile > #1573 savetxt() now handles complex arrays > #1387 allow bincount() to accept empty arrays > #1899 fixed histogramdd() bug with empty inputs > #1793 fix failing npyio test under py3k > #1936 fix extra nesting for subarray dtypes > #1848 make tril/triu return the same dtype as the original array > #1918 use Py_TYPE to access ob_type, so it works also on Py3 > > > ``numpy.f2py`` changes > ---------------------- > > ENH: Introduce new options extra_f77_compiler_args and > extra_f90_compiler_args > BLD: Improve reporting of fcompiler value > BUG: Fix f2py test_kind.py test > > > ``numpy.poly`` changes > ---------------------- > > ENH: Add some tests for polynomial printing > ENH: Add companion matrix functions > DOC: Rearrange the polynomial documents > BUG: Fix up links to classes > DOC: Add version added to some of the polynomial package modules > DOC: Document xxxfit functions in the polynomial package modules > BUG: The polynomial convenience classes let different types interact > DOC: Document the use of the polynomial convenience classes > DOC: Improve numpy reference documentation of polynomial classes > ENH: Improve the computation of polynomials from roots > STY: Code cleanup in polynomial [*]fromroots functions > DOC: Remove references to cast and NA, which were added in 1.7 > > > ``numpy.distutils`` issues fixed > ------------------------------- > > #1261 change compile flag on AIX from -O5 
to -O3 > #1377 update HP compiler flags > #1383 provide better support for C++ code on HPUX > #1857 fix build for py3k + pip > BLD: raise a clearer warning in case of building without cleaning up > first > BLD: follow build_ext coding convention in build_clib > BLD: fix up detection of Intel CPU on OS X in system_info.py > BLD: add support for the new X11 directory structure on Ubuntu & co. > BLD: add ufsparse to the libraries search path. > BLD: add 'pgfortran' as a valid compiler in the Portland Group > BLD: update version match regexp for IBM AIX Fortran compilers. > > > I just noticed that my fix for the np.gradient() function isn't listed. https://github.com/numpy/numpy/pull/167 Not critical, but if a second rc is needed for any reason, it would be nice to have that in there. Thanks! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Fri May 11 10:47:13 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 11 May 2012 09:47:13 -0500 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: <359C9AA1-B263-4E57-A02A-07A83DFE0A22@continuum.io> Message-ID: <7C646920-158F-433C-A4F2-3CEB4FF548B1@continuum.io> On May 11, 2012, at 2:13 AM, Fernando Perez wrote: > On Thu, May 10, 2012 at 11:44 PM, Scott Sinclair > wrote: >> That's pretty much how things already work. The documentation is in >> the main source tree and built docs end up at http://docs.scipy.org. >> NEPs live at https://github.com/numpy/numpy/tree/master/doc/neps, but >> don't get published outside of the source tree and there's no >> "preferred" place for discussion documents. > > No, b/c that means that for someone to be able to push to a NEP, > they'd have to get commit rights to the main numpy source code repo. > The whole point of what I'm suggesting is to isolate the NEP repo so > that commit rights can be given for it with minimal thought, whenever > pretty much anyone says they're going to work on a NEP. > > Obviously today anyone can do that and submit a PR against the main > repo, but that raises the PR review burden for said repo. And that > burden is something that we should strive to keep as low as possible, > so those key people (the team with commit rights to the main repo) can > focus their limited resources on reviewing code PRs. > > I'm simply suggesting a way to spread the load as much as possible, so > that the team with commit rights on the main repo isn't a bottleneck > on other tasks. This is a good idea. I think. I like the thought of a separate NEP and docs repo. -Travis > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Fri May 11 10:59:16 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 11 May 2012 09:59:16 -0500 Subject: [Numpy-discussion] NA-mask interactions with existing C code In-Reply-To: <4FACA76E.3070902@astro.uio.no> References: <4FAC457F.3090405@astro.uio.no> <4FAC5A16.6090407@astro.uio.no> <4FACA76E.3070902@astro.uio.no> Message-ID: <538BA150-5DAA-4CBE-9238-772E6D231A04@continuum.io> >> >> See my post in the other thread for my proposal to add a flag that allows users to switch between the Python side default being ndarray's or ndmasked, but they are different types at the C-level. The proposal so far does not specify whether or not ndarray or ndmasked is a subclass of the other. 
Given the history of numpy.ma and the fact that it makes sense on the C-level, I would lean toward ndmasked being a sub-class of ndarray --- thus a C-user would have to do a PyArray_CheckExact to ensure they are getting a base Python Array Object --- which they would have to do anyway because numpy.ma arrays also pass PyArray_Check. > > Making it a subclass means existing Cython code is not catered for, as > PyObject_TypeCheck is used. That is good to know. My understanding, though, is that in this case Cython code will already be passing numpy.ma (which is a sub-class of ndarray) as an array and be ignoring its mask already. If that is the case, then this wouldn't fundamentally change anything for you. My proposal improves the situation for a few groups: 1) C-API Users that are not used to dealing with masks at all in downstream code 2) C Users who use an exact check (PyArray_CheckExact or "O!" in PyArg_ParseTuple). 3) C Users who don't want all arrays to now have masks attached > > Is there a advantage for users by making it a subclass? Nobody is saying > you couldn't 'inherit' the struct (make the ndmask struct be castable to > a PyArrayObject*) even if that is not declared in the Python type object. I don't really think there is an advantage of making it a subclass --- except numpy.ma is already a subclass and it might make it easier to write it's API on top of ndmasked. Definitely the first part of the ndmasked struct could be castable to a PyArrayObject (I would want to add room in that case to the PyArrayObject for at least one additional AuxData pointer that could hold label information). -Travis > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ben.root at ou.edu Fri May 11 11:01:50 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 11 May 2012 11:01:50 -0400 Subject: [Numpy-discussion] stable sort for structured dtypes? Message-ID: Hello all, I need to sort a structured array in a stable manner. I am also sorting only by one of the keys, so I don't think lexsort() is stable in that respect. np.sort() allows for choosing 'mergesort', but it appears to not be implemented for structured arrays. Am I going to have to create a new plain array out of the one column I want to sort by, and run np.artsort() with the mergesort in order to get around this? Or is there something more straightforward that I am missing? Thanks, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From talljimbo at gmail.com Fri May 11 11:14:50 2012 From: talljimbo at gmail.com (Jim Bosch) Date: Fri, 11 May 2012 11:14:50 -0400 Subject: [Numpy-discussion] NA-mask interactions with existing C code In-Reply-To: References: <4FAC457F.3090405@astro.uio.no> <4FAC5A16.6090407@astro.uio.no> Message-ID: <4FAD2CEA.8020506@gmail.com> On 05/11/2012 01:36 AM, Travis Oliphant wrote: >>> >>> I guess this mixture of Python-API and C-API is different from the way >>> the API tries to protect incorrect access. From the Python API, it. >>> should let everything through, because it's for Python code to use. From >>> the C API, it should default to not letting things through, because >>> special NA-mask aware code needs to be written. I'm not sure if there is >>> a reasonable approach here which works for everything. >> >> Does that mean you consider changing ob_type for masked arrays >> unreasonable? They can still use the same object struct... 
>> >>> >>> But in general, I will often be lazy and just do >>> >>> def f(np.ndarray arr): >>> c_func(np.PyArray_DATA(arr)) >>> >>> It's an exception if you don't provide an array -- so who cares. (I >>> guess the odds of somebody feeding a masked array to code like that, >>> which doesn't try to be friendly, is relatively smaller though.) >>> >>> >>> This code would already fail with non-contiguous strides or byte-swapped >>> data, so the additional NA mask case seems to fit in an already-failing >>> category. >> >> Honestly! I hope you did't think I provided a full-fledged example? >> Perhaps you'd like to point out to me that "c_func" is a bad name for a >> function as well? >> >> One would of course check that things are contiguous (or pass on the >> strides), check the dtype and dispatch to different C functions in each >> case, etc. >> >> But that isn't the point. Scientific code most of the time does fall in >> the "already-failing" category. That doesn't mean it doesn't count. >> Let's focus on the number of code lines written and developer hours that >> will be spent cleaning up the mess -- not the "validity" of the code in >> question. >> >>> >>> >>> If you know the datatype, you can really do >>> >>> def f(np.ndarray[double] arr): >>> c_func(&arr[0]) >>> >>> which works with PEP 3118. But I use PyArray_DATA out of habit (and >>> since it works in the cases without dtype). >>> >>> Frankly, I don't expect any Cython code to do the right thing here; >>> calling PyArray_FromAny is much more typing. And really, nobody ever >>> questioned that if we had an actual ndarray instance, we'd be allowed to >>> call PyArray_DATA. >>> >>> I don't know how much Cython code is out there in the wild for which >>> this is a problem. Either way, it would cause something of a reeducation >>> challenge for Cython users. >>> >>> >>> Since this style of coding already has known problems, do you think the >>> case with NA-masks deserves more attention here? What will happen is. >>> access to array element data without consideration of the mask, which >>> seems similar in nature to accessing array data with the wrong stride or >>> byte order. >> >> I don't agree with the premise of that paragraph. There's no reason to >> assume that just because code doesn't call FromAny, it has problems. >> (And I'll continue to assume that whatever array is returned from >> "np.ascontiguousarray is really contiguous...) >> >> Whether it requires attention or not is a different issue though. I'm >> not sure. I think other people should weigh in on that -- I mostly write >> code for my own consumption. >> >> One should at least check pandas, scikits-image, scikits-learn, mpi4py, >> petsc4py, and so on. And ask on the Cython users list. Hopefully it will >> usually be PEP 3118. But now I need to turn in. >> >> Travis, would such a survey be likely to affect the outcome of your >> decision in any way? Or should we just leave this for now? >> > > This dialog gets at the heart of the matter, I think. The NEP seems to want NumPy to have a "better" API that always protects downstream users from understanding what is actually under the covers. It would prefer to push NumPy in the direction of an array object that is fundamentally more opaque. However, the world NumPy lives in is decidedly not opaque. There has been significant education and shared understanding of what a NumPy array actually *is* (a strided view of memory of a particular "dtype"). 
This shared understanding has even been pushed into Python as the buffer protocol. It is very common for extension modules to go directly to the data they want by using this understanding. > > This is very different from the traditional "shield your users" from how things are actually done view of most object APIs. It was actually intentional. I'm not saying that different choices could not have been made or that some amount of shielding should never be contemplated. I'm just saying that NumPy has been used as a nice bridge between the world of scientific computing codes that have chunks of memory allocated for processing and high-level code. Part of the reason for this bridge has been the simple object model. > > I just don't think the NEP fully appreciates just how fundamental of a shift this is in the wider NumPy community and it is not something that can be done immediately or without careful attention. > Just chiming in as another regular user of the C-API that strongly shares this view. NumPy arrays are useful to my project precisely because they are simply blocks of shared, strided memory, and they work so well for us because we know exactly how they're represented under the hood and we can map them straightforwardly to C and C++. We also don't use PyArray_FromAny or its brethren to get arrays in C - by and large, we consider that too "magical"; we'd prefer to fail when the array we are given isn't exactly what we were expecting, rather than create arrays from nested sequences or do anything else that involves an implicit deep-copy of the data. That said, I think we could pretty easily adapt to this change just by switching a lot of PyArray_Check calls to PyArray_CheckExact, but I think our usage of NumPy is another data point that says attaching masks to all arrays or making masked arrays a subclass of regular arrays is a step in the wrong direction design-wise. Jim From njs at pobox.com Fri May 11 11:24:24 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 11 May 2012 16:24:24 +0100 Subject: [Numpy-discussion] Anyone have a cached copy of NA-discussion-status? Message-ID: Hi all, I'm an idiot and seem to have accidentally deleted the NA-discussion-status web page. I do have a query into support at github, but, does anyone happen to have a local copy of the content, perhaps in their browser cache? Frustratedly yrs, -- Nathaniel From gael.varoquaux at normalesup.org Fri May 11 11:32:02 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 11 May 2012 17:32:02 +0200 Subject: [Numpy-discussion] NA-mask interactions with existing C code In-Reply-To: <538BA150-5DAA-4CBE-9238-772E6D231A04@continuum.io> References: <4FAC457F.3090405@astro.uio.no> <4FAC5A16.6090407@astro.uio.no> <4FACA76E.3070902@astro.uio.no> <538BA150-5DAA-4CBE-9238-772E6D231A04@continuum.io> Message-ID: <20120511153200.GC27019@phare.normalesup.org> On Fri, May 11, 2012 at 09:59:16AM -0500, Travis Oliphant wrote: > > Is there a advantage for users by making it a subclass? Nobody is saying > > you couldn't 'inherit' the struct (make the ndmask struct be castable to > > a PyArrayObject*) even if that is not declared in the Python type object. > I don't really think there is an advantage of making it a subclass --- except numpy.ma is already a subclass and it might make it easier to write it's API on top of ndmasked. Technically, could this be a usecase for the 'mixin' pattern? 
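A minimal Python-level sketch of what the mixin route could look like (the names NAMixin, count_missing and the _mask attribute are invented purely for illustration):

import numpy as np

class NAMixin(object):
    # Hypothetical mixin holding the mask-handling behaviour,
    # independent of whichever array base class it is combined with.
    def count_missing(self):
        mask = getattr(self, '_mask', None)
        return 0 if mask is None else int(np.asarray(mask).sum())

class MaskedNDArray(NAMixin, np.ndarray):
    pass

a = np.arange(4).view(MaskedNDArray)
a._mask = np.array([False, True, False, False])
print(a.count_missing())          # -> 1
print(isinstance(a, np.ndarray))  # -> True, so isinstance/PyArray_Check-style tests still pass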
Ga?l From njs at pobox.com Fri May 11 11:42:35 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 11 May 2012 16:42:35 +0100 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) Message-ID: I've been trying to sort through the changes that landed in master from the missingdata branch to figure out how to separate out changes related to NA support from those that aren't, and noticed that one of them should probably be flagged to the list. Traditionally, arr.diagonal() and np.diagonal(arr) return a *copy* of the diagonal. Now, they return a view onto the original array. On the one hand, this seems like it's clearly the way it should have been since the beginning -- I'd expect .diagonal() to be a cheap operation, like .transpose() and .reshape(). But, it's a potential compatibility break if there is code out there that assumes diagonal() returns a copy and can be scribbled on without affecting the original array: # 1.6: >>> a = np.ones((2, 2)) >>> d = a.diagonal() >>> d[0] = 3 >>> a array([[ 1., 1.], [ 1., 1.]]) # current master/1.7: >>> a = np.ones((2, 2)) >>> d = a.diagonal() >>> d[0] = 3 >>> a array([[ 3., 1.], [ 1., 1.]]) This is dangerous, obviously, and tricky to handle, since there's no clear way to detect it and give a DeprecationWarning. One option might be to keep the new behavior, but mark the returned view as not WRITEABLE, and then flip to WRITEABLE=True in 1.8. Going from read-only to writeable would be a compatible change, so that way we end up on the behaviour we want eventually (in 1.8), and have only one backwards compatibility break (1.6 -> 1.7), but that break is clean and obvious. -- Nathaniel From nouiz at nouiz.org Fri May 11 11:54:29 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 11 May 2012 11:54:29 -0400 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: In Theano we use a view, but that is not relevant as it is the compiler that tell what is inplace. So this is invisible to the user. What about a parameter to diagonal() that default to return a view not writable as you said. The user can then choose what it want and this don't break the inferface. You suggest that in 1.7 we return a view that is not writable, but tell this is an interface change. I don't see an interface change here as this change nothing for the user(if the writable flag is respected. Can we rely on this?). The user will need to update their code in 1.8 if we return a writable view. So I think the interface change is in 1.8. Why not change the interface in numpy 2.0? Otherwise, we have a high risk of people just updating numpy without checking the release note and have bad result. The parameter to diagonal will allow people to have a view. Fred On Fri, May 11, 2012 at 11:42 AM, Nathaniel Smith wrote: > I've been trying to sort through the changes that landed in master > from the missingdata branch to figure out how to separate out changes > related to NA support from those that aren't, and noticed that one of > them should probably be flagged to the list. Traditionally, > arr.diagonal() and np.diagonal(arr) return a *copy* of the diagonal. > Now, they return a view onto the original array. > > On the one hand, this seems like it's clearly the way it should have > been since the beginning -- I'd expect .diagonal() to be a cheap > operation, like .transpose() and .reshape(). 
But, it's a potential > compatibility break if there is code out there that assumes diagonal() > returns a copy and can be scribbled on without affecting the original > array: > > # 1.6: >>>> a = np.ones((2, 2)) >>>> d = a.diagonal() >>>> d[0] = 3 >>>> a > array([[ 1., ?1.], > ? ? ? [ 1., ?1.]]) > > # current master/1.7: >>>> a = np.ones((2, 2)) >>>> d = a.diagonal() >>>> d[0] = 3 >>>> a > array([[ 3., ?1.], > ? ? ? [ 1., ?1.]]) > > This is dangerous, obviously, and tricky to handle, since there's no > clear way to detect it and give a DeprecationWarning. > > One option might be to keep the new behavior, but mark the returned > view as not WRITEABLE, and then flip to WRITEABLE=True in 1.8. Going > from read-only to writeable would be a compatible change, so that way > we end up on the behaviour we want eventually (in 1.8), and have only > one backwards compatibility break (1.6 -> 1.7), but that break is > clean and obvious. > > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Fri May 11 12:00:06 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 11 May 2012 10:00:06 -0600 Subject: [Numpy-discussion] stable sort for structured dtypes? In-Reply-To: References: Message-ID: On Fri, May 11, 2012 at 9:01 AM, Benjamin Root wrote: > Hello all, > > I need to sort a structured array in a stable manner. I am also sorting > only by one of the keys, so I don't think lexsort() is stable in that > respect. np.sort() allows for choosing 'mergesort', but it appears to not > be implemented for structured arrays. Am I going to have to create a new > plain array out of the one column I want to sort by, and run np.artsort() > with the mergesort in order to get around this? Or is there something more > straightforward that I am missing? > Lexsort is just a sequence of indirect merge sorts, so using it to sort on a single column is the same as calling argsort(..., kind='mergesort'). Mergesort (and heapsort) need to be extended to object arrays and arrays with specified comparison functions. I think that would be an interesting project for someone, I've been intending to do it myself but haven't got around to it. But as to your current problem, you probably need to have the keys in a plain old array. They also need to be in a contiguous array, but the sort methods take care of that by making contiguous copies when needed. Adding a step parameter to the sorts is another small project for someone. There is an interesting trade off there involving cache vs copy time vs memory usage. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Fri May 11 12:36:42 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 11 May 2012 12:36:42 -0400 Subject: [Numpy-discussion] stable sort for structured dtypes? In-Reply-To: References: Message-ID: On Fri, May 11, 2012 at 12:00 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Fri, May 11, 2012 at 9:01 AM, Benjamin Root wrote: > >> Hello all, >> >> I need to sort a structured array in a stable manner. I am also sorting >> only by one of the keys, so I don't think lexsort() is stable in that >> respect. np.sort() allows for choosing 'mergesort', but it appears to not >> be implemented for structured arrays. 
Am I going to have to create a new >> plain array out of the one column I want to sort by, and run np.artsort() >> with the mergesort in order to get around this? Or is there something more >> straightforward that I am missing? >> > > Lexsort is just a sequence of indirect merge sorts, so using it to sort on > a single column is the same as calling argsort(..., kind='mergesort'). > > Mergesort (and heapsort) need to be extended to object arrays and arrays > with specified comparison functions. I think that would be an interesting > project for someone, I've been intending to do it myself but haven't got > around to it. > > But as to your current problem, you probably need to have the keys in a > plain old array. They also need to be in a contiguous array, but the sort > methods take care of that by making contiguous copies when needed. Adding a > step parameter to the sorts is another small project for someone. There is > an interesting trade off there involving cache vs copy time vs memory usage. > > Chuck > > Ok, that clears it up for me. I ended up just doing an argsort(np.array(d['vtime']), kind=...) and use the indices as a guide. My purpose didn't require a resorted array anyway, so this will do for now. Thanks, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From n.becker at amolf.nl Fri May 11 12:46:12 2012 From: n.becker at amolf.nl (Nils Becker) Date: Fri, 11 May 2012 18:46:12 +0200 Subject: [Numpy-discussion] histogram2d and histogramdd return counts as floats while histogram returns ints Message-ID: <4FAD4254.50903@amolf.nl> hi, is this intended? np.histogramdd([[1,2],[3,4]],bins=2) (array([[ 1., 0.], [ 0., 1.]]), [array([ 1. , 1.5, 2. ]), array([ 3. , 3.5, 4. ])]) np.histogram2d([1,2],[3,4],bins=2) (array([[ 1., 0.], [ 0., 1.]]), array([ 1. , 1.5, 2. ]), array([ 3. , 3.5, 4. ])) np.histogram([1,2],bins=2) (array([1, 1]), array([ 1. , 1.5, 2. ])) n. From pav at iki.fi Fri May 11 15:18:14 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 11 May 2012 21:18:14 +0200 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: 11.05.2012 17:54, Fr?d?ric Bastien kirjoitti: > In Theano we use a view, but that is not relevant as it is the > compiler that tell what is inplace. So this is invisible to the user. > > What about a parameter to diagonal() that default to return a view not > writable as you said. The user can then choose what it want and this > don't break the inferface. [clip] Agreed, it seems this is the appropriate way to go on here `diagonal(copy=True)`. A more obscure alternative would be to add a separate method that returns a view. I don't think changing the default behavior in a later release is a good idea. It's a sort of an API wart, but IMHO better that than subtle code breakage. Pauli From travis at continuum.io Fri May 11 15:47:37 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 11 May 2012 14:47:37 -0500 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: <78756A10-6DFE-42B4-A649-62C3B52CA1BC@continuum.io> On May 11, 2012, at 2:18 PM, Pauli Virtanen wrote: > 11.05.2012 17:54, Fr?d?ric Bastien kirjoitti: >> In Theano we use a view, but that is not relevant as it is the >> compiler that tell what is inplace. So this is invisible to the user. >> >> What about a parameter to diagonal() that default to return a view not >> writable as you said. 
The user can then choose what it want and this >> don't break the inferface. > [clip] > > Agreed, it seems this is the appropriate way to go on here > `diagonal(copy=True)`. A more obscure alternative would be to add a > separate method that returns a view. > > I don't think changing the default behavior in a later release is a good > idea. It's a sort of an API wart, but IMHO better that than subtle code > breakage. Yes, I think this is true. We should add the copy keyword, but not change the API. -Travis > > Pauli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mwwiebe at gmail.com Fri May 11 16:10:51 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 11 May 2012 15:10:51 -0500 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <4FACF458.6070200@astro.uio.no> References: <4FACF458.6070200@astro.uio.no> Message-ID: On Fri, May 11, 2012 at 6:13 AM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > (NumPy devs: I know, I get too many ideas. But this time I *really* > believe in it, I think this is going to be *huge*. And if Mark F. likes > it it's not going to be without manpower; and as his mentor I'd pitch in > too here and there.) > > (Mark F.: I believe this is *very* relevant to your GSoC. I certainly > don't want to micro-manage your GSoC, just have your take.) > > Travis, thank you very much for those good words in the "NA-mask > interactions..." thread. It put most of my concerns away. If anybody is > leaning towards for opaqueness because of its OOP purity, I want to > refer to C++ and its walled-garden of ideological purity -- it has, > what, 3-4 different OOP array libraries, neither of which is able to > out-compete the other. Meanwhile the rest of the world happily > cooperates using pointers, strides, CSR and CSC. > > Now, there are limits to what you can do with strides and pointers. > Noone's denying the need for more. In my mind that's an API where you > can do fetch_block and put_block of cache-sized, N-dimensional blocks on > an array; but it might be something slightly different. > > Here's what I'm asking: DO NOT simply keep extending ndarray and the > NumPy C API to deal with this issue. > > What we need is duck-typing/polymorphism at the C level. If you keep > extending ndarray and the NumPy C API, what we'll have is a one-to-many > relationship: One provider of array technology, multiple consumers (with > hooks, I'm sure, but all implementations of the hook concept in the > NumPy world I've seen so far are a total disaster!). > There is similar intent behind an idea I raised last summer here: http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html I stopped doing anything on it when considering the scope of it with the linear algebra functions included like Pauli suggested. Nathaniel did an experimental implementation of some parts of the idea in Python here: https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py#L107 > What I think we need instead is something like PEP 3118 for the > "abstract" array that is only available block-wise with getters and > setters. On the Cython list we've decided that what we want for CEP 1000 > (for boxing callbacks etc.) is to extend PyTypeObject with our own > fields; we could create CEP 1001 to solve this issue and make any Python > object an exporter of "block-getter/setter-arrays" (better name needed). 
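One way to narrow this down might be to double-check which BLAS/LAPACK the build picked up and to bisect on the matrix size to see where np.dot starts to hang. A minimal sketch (nothing here is ACML-specific, and the sizes are arbitrary):

import time
import numpy as np

np.show_config()   # the BLAS/LAPACK numpy was configured against at build time

for n in (500, 1000, 2000, 3000, 4000, 5000):
    a = np.random.randn(n, n)
    t0 = time.time()
    np.dot(a, a)
    print("n=%d: %.2f s" % (n, time.time() - t0))

Running ldd on the numpy extension modules (for example the lapack_lite.so mentioned above) also shows which libraries are actually loaded at run time, which may differ from what setup.py reported.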
> > What would be exported is (of course) a simple vtable: > > typedef struct { > int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t > *lower_right, ...); > ... > } block_getter_setter_array_vtable; > > Let's please discuss the details *after* the fundamentals. But the > reason I put void* there instead of PyObject* is that I hope this could > be used beyond the Python world (say, Python<->Julia); the void* would > be handed to you at the time you receive the vtable (however we handle > that). > > I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you > could embed the block-transposition that's needed for efficient "arr + > arr.T" at this level. > > Imagine being able to do this in Cython: > > a[...] = b + c * d > > and have that essentially compile to the numexpr blocked approach, *but* > where b, c, and d can have whatever type that exports CEP 1001? So c > could be a "diagonal" array which uses O(n) storage to export O(n^2) > elements, for instance, and the unrolled Cython code never needs to know. > > As far as NumPy goes, something along these lines should hopefully mean > that new C code being written doesn't rely so much on what exactly goes > into "ndarray" and what goes into other classes; so that we don't get > the same problem again that we do now with code that doesn't use PEP 3118. > This general idea is very good. I think PEP 3118 captures a lot of the essence of the ndarray, but there's a lot of potential generality that it doesn't handle, such as the "diagonal" array or pluggable dtypes. -Mark > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri May 11 16:12:23 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 11 May 2012 15:12:23 -0500 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Fri, May 11, 2012 at 2:18 PM, Pauli Virtanen wrote: > 11.05.2012 17:54, Fr?d?ric Bastien kirjoitti: > > In Theano we use a view, but that is not relevant as it is the > > compiler that tell what is inplace. So this is invisible to the user. > > > > What about a parameter to diagonal() that default to return a view not > > writable as you said. The user can then choose what it want and this > > don't break the inferface. > [clip] > > Agreed, it seems this is the appropriate way to go on here > `diagonal(copy=True)`. A more obscure alternative would be to add a > separate method that returns a view. > This looks like the best way to deal with it, yes. Cheers, Mark > > I don't think changing the default behavior in a later release is a good > idea. It's a sort of an API wart, but IMHO better that than subtle code > breakage. > > Pauli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mwwiebe at gmail.com Fri May 11 16:17:55 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 11 May 2012 15:17:55 -0500 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> Message-ID: On Fri, May 11, 2012 at 8:37 AM, mark florisson wrote: > On 11 May 2012 12:13, Dag Sverre Seljebotn > wrote: > > (NumPy devs: I know, I get too many ideas. But this time I *really* > believe > > in it, I think this is going to be *huge*. And if Mark F. likes it it's > not > > going to be without manpower; and as his mentor I'd pitch in too here and > > there.) > > > > (Mark F.: I believe this is *very* relevant to your GSoC. I certainly > don't > > want to micro-manage your GSoC, just have your take.) > > > > Travis, thank you very much for those good words in the "NA-mask > > interactions..." thread. It put most of my concerns away. If anybody is > > leaning towards for opaqueness because of its OOP purity, I want to > refer to > > C++ and its walled-garden of ideological purity -- it has, what, 3-4 > > different OOP array libraries, neither of which is able to out-compete > the > > other. Meanwhile the rest of the world happily cooperates using pointers, > > strides, CSR and CSC. > > > > Now, there are limits to what you can do with strides and pointers. > Noone's > > denying the need for more. In my mind that's an API where you can do > > fetch_block and put_block of cache-sized, N-dimensional blocks on an > array; > > but it might be something slightly different. > > > > Here's what I'm asking: DO NOT simply keep extending ndarray and the > NumPy C > > API to deal with this issue. > > > > What we need is duck-typing/polymorphism at the C level. If you keep > > extending ndarray and the NumPy C API, what we'll have is a one-to-many > > relationship: One provider of array technology, multiple consumers (with > > hooks, I'm sure, but all implementations of the hook concept in the NumPy > > world I've seen so far are a total disaster!). > > > > What I think we need instead is something like PEP 3118 for the > "abstract" > > array that is only available block-wise with getters and setters. On the > > Cython list we've decided that what we want for CEP 1000 (for boxing > > callbacks etc.) is to extend PyTypeObject with our own fields; we could > > create CEP 1001 to solve this issue and make any Python object an > exporter > > of "block-getter/setter-arrays" (better name needed). > > > > What would be exported is (of course) a simple vtable: > > > > typedef struct { > > int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, > > ...); > > ... > > } block_getter_setter_array_vtable; > > > > Let's please discuss the details *after* the fundamentals. But the > reason I > > put void* there instead of PyObject* is that I hope this could be used > > beyond the Python world (say, Python<->Julia); the void* would be handed > to > > you at the time you receive the vtable (however we handle that). > > I suppose it would also be useful to have some way of predicting the > output format polymorphically for the caller. E.g. dense * > block_diagonal results in block diagonal, but dense + block_diagonal > results in dense, etc. It might be useful for the caller to know > whether it needs to allocate a sparse, dense or block-structured > array. Or maybe the polymorphic function could even do the allocation. > This needs to happen recursively of course, to avoid intermediate > temporaries. 
The compiler could easily handle that, and so could numpy > when it gets lazy evaluation. > > I think if the heavy lifting of allocating output arrays and exporting > these arrays work in numpy, then support in Cython could use that (I > can already hear certain people object to more complicated array stuff > in Cython :). Even better here would be an external project that each > our projects could use (I still think the nditer sorting functionality > of arrays should be numpy-agnostic and externally available). > It might be nice to expose something which gives an nditer-style looping primitive through the CEP 1001 mechanism. I could imagine a pure C version of this and an LLVM bitcode version which could inline into numba or other LLVM producing systems. -Mark > > > I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you > > could embed the block-transposition that's needed for efficient "arr + > > arr.T" at this level. > > > > Imagine being able to do this in Cython: > > > > a[...] = b + c * d > > > > and have that essentially compile to the numexpr blocked approach, *but* > > where b, c, and d can have whatever type that exports CEP 1001? So c > could > > be a "diagonal" array which uses O(n) storage to export O(n^2) elements, > for > > instance, and the unrolled Cython code never needs to know. > > > > As far as NumPy goes, something along these lines should hopefully mean > that > > new C code being written doesn't rely so much on what exactly goes into > > "ndarray" and what goes into other classes; so that we don't get the same > > problem again that we do now with code that doesn't use PEP 3118. > > > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri May 11 16:25:55 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 11 May 2012 15:25:55 -0500 Subject: [Numpy-discussion] Missing data wrap-up and request for comments In-Reply-To: References: <4FAAC918.8050705@astro.uio.no> Message-ID: On Thu, May 10, 2012 at 10:28 PM, Matthew Brett wrote: > Hi, > > On Thu, May 10, 2012 at 2:43 AM, Nathaniel Smith wrote: > > Hi Matthew, > > > > On Thu, May 10, 2012 at 12:01 AM, Matthew Brett > wrote: > >>> The third proposal is certainly the best one from Cython's perspective; > >>> and I imagine for those writing C extensions against the C API too. > >>> Having PyType_Check fail for ndmasked is a very good way of having code > >>> fail that is not written to take masks into account. > >> > >> Mark, Nathaniel - can you comment how your chosen approaches would > >> interact with extension code? > >> > >> I'm guessing the bitpattern dtypes would be expected to cause > >> extension code to choke if the type is not supported? > > > > That's pretty much how I'm imagining it, yes. Right now if you have, > > say, a Cython function like > > > > cdef f(np.ndarray[double] a): > > ... > > > > and you do f(np.zeros(10, dtype=int)), then it will error out, because > > that function doesn't know how to handle ints, only doubles. The same > > would apply for, say, a NA-enabled integer. In general there are > > almost arbitrarily many dtypes that could get passed into any function > > (including user-defined ones, etc.), so C code already has to check > > dtypes for correctness. 
> > > > Second order issues: > > - There is certainly C code out there that just assumes that it will > > only be passed an array with certain dtype (and ndim, memory layout, > > etc...). If you write such C code then it's your job to make sure that > > you only pass it the kinds of arrays that it expects, just like now > > :-). > > > > - We may want to do some sort of special-casing of handling for > > floating point NA dtypes that use an NaN as the "magic" bitpattern, > > since many algorithms *will* work with these unchanged, and it might > > be frustrating to have to wait for every extension module to be > > updated just to allow for this case explicitly before using them. OTOH > > you can easily work around this. Like say my_qr is a legacy C function > > that will in fact propagate NaNs correctly, so float NA dtypes would > > Just Work -- except, it errors out at the start because it doesn't > > recognize the dtype. How annoying. We *could* have some special hack > > you can use to force it to work anyway (by like making the "is this > > the dtype I expect?" routine lie.) But you can also just do: > > > > def my_qr_wrapper(arr): > > if arr.dtype is a NA float dtype with NaN magic value: > > result = my_qr(arr.view(arr.dtype.base_dtype)) > > return result.view(arr.dtype) > > else: > > return my_qr(arr) > > > > and hey presto, now it will correctly pass through NAs. So perhaps > > it's not worth bothering with special hacks. > > > > - Of course if your extension function does want to handle NAs > > generically, then there will be a simple C api for checking for them, > > setting them, etc. Numpy needs such an API internally anyway! > > Thanks for this. > > Mark - in view of the discussions about Cython and extension code - > could you say what you see as disadvantages to the ndmasked subclass > proposal? > The biggest difficulty looks to me like how to work with both of them reasonably from the C API. The idea of ndarray and ndmasked having different independent TypeObjects, but still working through the same API calls feels a little disconcerting. Maybe this is a reasonable compromise, though, it would be nice to see the idea fleshed out a bit more with some examples of how the code would work from the C level. Cheers, Mark > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjhnson at gmail.com Fri May 11 16:26:04 2012 From: tjhnson at gmail.com (T J) Date: Fri, 11 May 2012 13:26:04 -0700 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Fri, May 11, 2012 at 1:12 PM, Mark Wiebe wrote: > On Fri, May 11, 2012 at 2:18 PM, Pauli Virtanen wrote: > >> 11.05.2012 17:54, Fr?d?ric Bastien kirjoitti: >> > In Theano we use a view, but that is not relevant as it is the >> > compiler that tell what is inplace. So this is invisible to the user. >> > >> > What about a parameter to diagonal() that default to return a view not >> > writable as you said. The user can then choose what it want and this >> > don't break the inferface. >> [clip] >> >> Agreed, it seems this is the appropriate way to go on here >> `diagonal(copy=True)`. A more obscure alternative would be to add a >> separate method that returns a view. >> > > This looks like the best way to deal with it, yes. 
> > Cheers, > Mark > > >> >> I don't think changing the default behavior in a later release is a good >> idea. It's a sort of an API wart, but IMHO better that than subtle code >> breakage. >> >> >> copy=True seems fine, but is this the final plan? What about long term, should diag() eventually be brought in line with transpose() and reshape() so that it is a view by default? Changing default behavior is certainly not something that should be done all the time, but it *can* be done if deprecated appropriately. A more consistent API is better than one with warts (if this particular issue is actually seen as a wart). -------------- next part -------------- An HTML attachment was scrubbed... URL: From normanshelley at yahoo.com Fri May 11 18:01:26 2012 From: normanshelley at yahoo.com (Norman Shelley) Date: Fri, 11 May 2012 15:01:26 -0700 (PDT) Subject: [Numpy-discussion] Issue with numpy.random.multivariate_normal Linux RHEL4 np version: 1.6.1 Message-ID: <1336773686.91896.YahooMailClassic@web161606.mail.bf1.yahoo.com> Running on Linux RHEL4 numpy.random.multivariate_normal seems to work well for 25 mean values but when I go to 26 mean values (and their corresponding covariance values) it produces garbage. Any ideas? $ python2.7 good.py np version: 1.6.1 determinate(covariances): 42854852.1649 0 [? 3.45322570e-02?? 2.96621269e+00?? 4.03121985e-01?? 2.23905867e+00 ?? 8.95234686e+02?? 2.51444274e-01?? 1.95929594e+00?? 4.14068151e-01 ?? 1.24467318e+02?? 8.74062462e+01?? 8.47893386e+02?? 5.58966531e+01 ?? 1.04803128e+03?? 1.10222739e+03?? 9.31516255e-01?? 9.80601657e+01 ?? 1.04971389e+03?? 4.81104935e+01?? 1.04972224e+03?? 1.44024490e+01 ?? 1.09776504e+03?? 1.02173823e+00?? 8.98081303e+02?? 5.01189359e+15 ?? 9.97924350e+02] 1 [? 3.52017795e-02?? 2.89531850e+00?? 4.03060783e-01?? 2.18453650e+00 ?? 8.99337485e+02?? 2.32277972e-01?? 1.93606615e+00?? 6.14486545e-01 ?? 1.09725829e+02?? 7.79127704e+01?? 8.48863835e+02?? 5.70064466e+01 ?? 1.04712290e+03?? 1.09834919e+03?? 9.28635008e-01?? 9.56070004e+01 ?? 1.05169272e+03?? 5.00364617e+01?? 1.04814358e+03?? 1.55657866e+01 ?? 1.09711078e+03?? 9.87177142e-01?? 9.00413630e+02?? 4.93629091e+15 ?? 1.00100062e+03] $ python2.7? bad.py 1.6.1 determinate(covariances): 4285.48521649 0 [ -4.01024135e+05?? 1.95103283e+05? -4.72636041e+05? -2.34189276e+05 ?? 6.00475453e+05?? 4.29933967e+05? -6.58712410e+05?? 1.45095592e+05 ?? 1.35199771e+05? -3.78301725e+05? -1.77898239e+05? -4.78687011e+04 ?? 4.52627371e+05?? 7.28941427e+05?? 1.53255198e+05? -1.78141398e+05 ? -1.10133033e+05?? 2.48546775e+05?? 2.50429017e+05? -1.05514181e+04 ?? 8.46088978e+05? -1.77665193e+05? -3.42631589e+05?? 5.02218802e+15 ? -4.50126852e+05? -4.72033829e+05] 1 [ -3.25327274e+04? -2.80841748e+05?? 1.09817858e+05? -6.51376666e+05 ?? 2.82390935e+05? -3.53187271e+04? -2.33180214e+04?? 5.38896672e+05 ?? 1.23272353e+06? -1.07704663e+06? -5.67714536e+05?? 7.97030970e+05 ? -3.69409577e+05?? 6.78682877e+04?? 1.66741123e+05? -1.09666730e+05 ? -6.81887014e+05? -2.30865221e+05? -2.22374827e+03? -2.49963173e+05 ? -6.45069676e+05? -2.77671168e+05?? 3.74262895e+05?? 5.05584505e+15 ?? 3.56182413e+05? 
-7.23940183e+05] good.py import numpy as np print 'np version:', np.__version__ means=[0.035, 2.9, 0.4, 2.2, 900, 0.25, 2.0, 0.5, 120, 84, 850, 57.5, 1050, 1100, 0.9, 100, 1050, 50, 1050, 15, 1100, 1.0, 900, 5000000000000000.0, 1000] covariances=[[2.209e-07, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0008410000000000001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.9160000000000002e-05, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.00048399999999999995, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 6.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0064, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0225, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 100, 0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 9, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.8899999999999997, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.7556, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0009, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0, 0.0, 0.0, 0.0, 0.0, 0.0, 9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.7556, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0001, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.5e+27, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999]] print 'determinate(covariances):', np.linalg.det(covariances) data = np.random.multivariate_normal(means, covariances, 2) for i, row in enumerate(data): ??? print i, row bad.py import numpy as np print np.__version__ means=[0.035, 2.9, 0.4, 2.2, 900, 0.25, 2.0, 0.5, 120, 84, 850, 57.5, 1050, 1100, 0.9, 100, 1050, 50, 1050, 15, 1100, 1.0, 900, 5000000000000000.0, 1000, 0.215] covariances=[[2.209e-07, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0008410000000000001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.9160000000000002e-05, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.00048399999999999995, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 6.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0064, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0225, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 100, 0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 9, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.8899999999999997, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.7556, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0009, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0, 0, 0.0, 0.0, 0.0, 0.0, 0.0, 9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.7556, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0001, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.5e+27, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.889999999999999, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0001]] print 'determinate(covariances):', np.linalg.det(covariances) data = np.random.multivariate_normal(means, covariances, 2) for i, row in enumerate(data): ??? print i, row -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas_unterthiner at web.de Fri May 11 18:54:47 2012 From: thomas_unterthiner at web.de (Thomas Unterthiner) Date: Sat, 12 May 2012 00:54:47 +0200 Subject: [Numpy-discussion] Problems when using ACML with numpy Message-ID: <4FAD98B7.7070101@web.de> Hi there! I'm having troubles getting numpy to work with ACML. I'm running Ubuntu 12.04 on an x86-64 system and use acml 5.1.0. On my first try, I installed numpy/scipy from the official ubuntu repository, then just changed the symlink to the blas/lapack libraries of my system to use acml. i.e. I did: ln -s /usr/lib/liblapack.so.3gf /opt/acml5.1.0/gfortran64_fma4/lib/libacml.so ln -s /usr/lib/libblas.so.3gf /opt/acml5.1.0/gfortran64_fma4/lib/libacml.so This method worked perfectly fine for Octave. However with numpy the following code will always run into what seems to be an endless loop: import numpy as np import time a = np.random.randn(5000, 5000) t0 = time.clock() b = np.dot(a, a) t1 = time.clock() print t1 - t0 The process will have 100% CPU usage and will not show any activity under strace. A gdb backtrace looks as follows: (gdb) bt #0 0x00007fdcc000e524 in ?? () from /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so #1 0x00007fdcc008bcb9 in ?? () from /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so #2 0x00007fdcc00a304d in ?? () from /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so #3 0x000000000042a485 in PyEval_EvalFrameEx () #4 0x00000000004317f2 in PyEval_EvalCodeEx () #5 0x000000000054bd50 in PyRun_InteractiveOneFlags () #6 0x000000000054c045 in PyRun_InteractiveLoopFlags () #7 0x000000000054ce9f in Py_Main () #8 0x00007fdcc0e7976d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6 Curiously enough, when changing the matrix-dimensions from 5000x5000 to 500x500, all is good and the code executes in 0.32 seconds. (As a reference: the 5000x5000 matrix multiply takes 65 seconds in octave with ATLAS and 10 seconds with ACML, so the problem is not that the matrix-multiply just takes too long). I then tried compiling numpy-1.6.1 myself, doing: tar xzf numpy-1.6.1.tar.gz cd numpy-1.6.1/ export CFLAGS="-O3 -march=native" export CXXFLAGS="-O3 -march=native" export FFLAGS="-O3 -march=native" export FCFLAGS="-O3 -march=native" export LDFLAGS="-O3" export BLAS=/opt/acml5.1.0/gfortran64_fma4/lib/libacml.so export LAPACK=/opt/acml5.1.0/gfortran64_fma4/lib/libacml.so export ATLAS=None python setup.py build This worked (apart from a missing '-shared' flag when linking lapack_lite.so, which I then had to link by hand by adding the flag), however the error persisted. The same with numpy-1.6.2rc1. Fromt here I don't know how to proceed. 
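(For what it's worth, the build configuration that numpy itself reports can be checked with the snippet below -- just a sketch of the kind of check I mean; I can post its full output if that would help.)

import numpy as np
print 'numpy version:', np.__version__
# show_config() lists the BLAS/LAPACK libraries numpy was built against
np.show_config()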
Any help would be greatly appreciated :) Cheers Thomas From njs at pobox.com Fri May 11 19:39:26 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 12 May 2012 00:39:26 +0100 Subject: [Numpy-discussion] Masking through generator arrays In-Reply-To: References: <4FAB3BE3.1030405@astro.uio.no> <4FAB69EA.9070306@astro.uio.no> <4FAB8C8B.9090107@astro.uio.no> Message-ID: On Thu, May 10, 2012 at 7:23 PM, Chris Barker wrote: > That is one of my concerns about the "bit pattern" idea -- we've then > created a new binary type that no other standard software understands > -- that looks like a a lot of work to me to deal with, or even worse, > ripe for weird, non-obvious errors in code that access that good-old > char*. Numpy supports a number of unusual binary data types, e.g. halfs and datetimes, that aren't well supported by other standard software. As Travis points out, no-one forces you to use them :-). > So I'm happier with a mask implementation -- more memory, yes, but it > seems more robust an easy to deal with with outside code. Let's say we have a no-frills C function that we want to call, and it's defined to use a mask: void do_calcs(double * data, char * mask, int size); To call this function from Cython, then in the mask NAs world we do something like: a = np.ascontiguousarray(a) do_calcs(PyArray_DATA(a), PyArray_MASK(a), a.size) OTOH in the bitpattern NA world, we do something like: a = np.ascontiguousarray(a) mask = np.isNA(a) do_calcs(PyArray_DATA(a), PyArray_DATA(mask), a.size) Of course there are various extra complexities that can come in here depending on what you want to do if there are no NAs possible, whether do_calcs can take a NULL mask pointer, if you're writing in C instead of Cython then you need to use the C equivalent functions, etc. But IMHO there's no fundamental reason why bitpatterns have to be much more complex to deal with in outside code than masks, assuming a properly helpful API. What can't be papered over at the API level are the questions like, do you want to be able to "un-assign" NA to reveal what used to be there before? That needs masks, for better or worse. But I may well be missing something... does that address your concern, or is there more to it? -- Nathaniel From njs at pobox.com Fri May 11 19:54:11 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 12 May 2012 00:54:11 +0100 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Fri, May 11, 2012 at 9:26 PM, T J wrote: > On Fri, May 11, 2012 at 1:12 PM, Mark Wiebe wrote: >> >> On Fri, May 11, 2012 at 2:18 PM, Pauli Virtanen wrote: >>> >>> 11.05.2012 17:54, Fr?d?ric Bastien kirjoitti: >>> > In Theano we use a view, but that is not relevant as it is the >>> > compiler that tell what is inplace. So this is invisible to the user. >>> > >>> > What about a parameter to diagonal() that default to return a view not >>> > writable as you said. The user can then choose what it want and this >>> > don't break the inferface. >>> [clip] >>> >>> Agreed, it seems this is the appropriate way to go on here >>> `diagonal(copy=True)`. A more obscure alternative would be to add a >>> separate method that returns a view. >> >> >> This looks like the best way to deal with it, yes. >> >> Cheers, >> Mark >> >>> >>> >>> I don't think changing the default behavior in a later release is a good >>> idea. It's a sort of an API wart, but IMHO better that than subtle code >>> breakage. 
>>> >>> > > copy=True seems fine, but is this the final plan? ? What about long term, > should diag() eventually be brought in line with transpose() and reshape() > so that it is a view by default? ?Changing default behavior is certainly not > something that should be done all the time, but it *can* be done if > deprecated appropriately. ?A more consistent API is better than one with > warts (if this particular issue is actually seen as a wart). This is my question as well. Adding copy=True default argument is certainly a fine solution for 1.7, but IMHO in the long run it would be better for diagonal() to return a view by default. (Aside from it seeming generally more "numpythonic", I see from auditing the code I have lying around my homedir that it would generally be a free speed win, and having to remember to type a.diagonal(copy=False) all the time in order to get full performance seems a bit annoying.) I mean, I'm all for conservatism in these things, which is why I raised the issue in the first place :-). But it also seems like there should be *some* mechanism for getting there from here (assuming others agree we want to). There's been grumblings about trying to do more evolving of numpy in-place instead of putting everything off to the legendary 2.0, right? Unfortunately just putting a deprecation warning on everyone who calls diagonal() without an explicit copy= argument seems like it would be *really* obnoxious, though. If necessary we could get more creative... add a special-purpose ndarray flag like WARN_ON_WRITE so that code which writes to the returned array continues to work like now, but also triggers a deprecation warning? I dunno. -- Nathaniel From jdgleeson at mac.com Sat May 12 03:05:00 2012 From: jdgleeson at mac.com (John Gleeson) Date: Sat, 12 May 2012 01:05:00 -0600 Subject: [Numpy-discussion] Issue with numpy.random.multivariate_normal Linux RHEL4 np version: 1.6.1 In-Reply-To: <1336773686.91896.YahooMailClassic@web161606.mail.bf1.yahoo.com> References: <1336773686.91896.YahooMailClassic@web161606.mail.bf1.yahoo.com> Message-ID: On 2012-05-11, at 4:01 PM, Norman Shelley wrote: > Running on Linux RHEL4 > > numpy.random.multivariate_normal seems to work well for 25 mean > values but when I go to 26 mean values (and their corresponding > covariance values) it produces garbage. > Any ideas? The implementation of multivariate_normal in numpy uses Singular Value Decomposition, and your covariance matrices are very ill-conditioned. That is true, however, for both the good case and the bad case, so it really doesn't explain why the SVD method suddenly fails when the dimension changes from 25 to 26. I hope to look into this when I get more time. I created a new version of multivariate_normal (see below) that has a method parameter. Setting method='chol' chooses Cholesky decomposition instead of SVD. It gives nice results for your size 26 case. 
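(To put a number on "ill-conditioned": a quick diagnostic along the lines below -- only a sketch, with `covariances` standing for the 26x26 matrix from your bad.py -- shows the spread of scales involved and lets you compare how well each factorization reproduces the input. If the SVD residual blows up only for the 26x26 case, that would point at the SVD step itself rather than the conditioning per se.)

import numpy as np
cov = np.array(covariances)   # the 26x26 covariance matrix from bad.py
print 'condition number:', np.linalg.cond(cov)
# SVD reconstruction: cov should come back as u * diag(s) * v
u, s, v = np.linalg.svd(cov)
print 'svd residual:', np.abs(np.dot(u * s, v) - cov).max()
# Cholesky reconstruction: cov should come back as L * L.T
L = np.linalg.cholesky(cov)
print 'chol residual:', np.abs(np.dot(L, L.T) - cov).max()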
data = multivariate_normal(means, covariances, 2, method='chol')
print 'using chol:'
for i, row in enumerate(data):
    print i, row

def multivariate_normal(mean, cov, size=None, method='svd'):
    mean = np.array(mean)
    cov = np.array(cov)
    if size is None:
        shape = []
    else:
        shape = size
    if len(mean.shape) != 1:
        raise ValueError("mean must be 1 dimensional")
    if (len(cov.shape) != 2) or (cov.shape[0] != cov.shape[1]):
        raise ValueError("cov must be 2 dimensional and square")
    if mean.shape[0] != cov.shape[0]:
        raise ValueError("mean and cov must have same length")
    # Compute shape of output
    if isinstance(shape, int):
        shape = [shape]
    final_shape = list(shape[:])
    final_shape.append(mean.shape[0])
    # Create a matrix of independent standard normally distributed random
    # numbers. The matrix has rows with the same length as mean and as
    # many rows are necessary to form a matrix of shape final_shape.
    x = np.random.standard_normal(np.multiply.reduce(final_shape))
    x.shape = (np.multiply.reduce(final_shape[0:len(final_shape)-1]),
               mean.shape[0])
    # Transform matrix of standard normals into matrix where each row
    # contains multivariate normals with the desired covariance.
    # Compute A such that dot(transpose(A),A) == cov.
    # Then the matrix products of the rows of x and A has the desired
    # covariance.
    if method == 'svd':
        (u,s,v) = np.linalg.svd(cov)
        x = np.dot(x*np.sqrt(s),v)
    elif method == 'chol':
        L = np.linalg.cholesky(cov)
        x = np.dot(x,L)
    else:
        raise ValueError("method must be 'svd' or 'chol'")
    # The rows of x now have the correct covariance but mean 0. Add
    # mean to each row. Then each row will have mean mean.
    np.add(mean,x,x)
    x.shape = tuple(final_shape)
    return x

From pav at iki.fi Sat May 12 07:19:09 2012
From: pav at iki.fi (Pauli Virtanen)
Date: Sat, 12 May 2012 13:19:09 +0200
Subject: [Numpy-discussion] Problems when using ACML with numpy
In-Reply-To: <4FAD98B7.7070101@web.de>
References: <4FAD98B7.7070101@web.de>
Message-ID: 

12.05.2012 00:54, Thomas Unterthiner kirjoitti:
[clip]
> The process will have 100% CPU usage and will not show any activity
> under strace. A gdb backtrace looks as follows:
>
> (gdb) bt
> #0 0x00007fdcc000e524 in ?? ()
> from /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so
[clip]

The backtrace looks like it does not use ACML. Does

from numpy.core._dotblas import dot

work?

--
Pauli Virtanen

From ralf.gommers at googlemail.com Sat May 12 09:31:28 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sat, 12 May 2012 15:31:28 +0200
Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)
In-Reply-To: 
References: 
Message-ID: 

On Sat, May 12, 2012 at 1:54 AM, Nathaniel Smith wrote:
> On Fri, May 11, 2012 at 9:26 PM, T J wrote:
> > On Fri, May 11, 2012 at 1:12 PM, Mark Wiebe wrote:
> >>
> >> On Fri, May 11, 2012 at 2:18 PM, Pauli Virtanen wrote:
> >>>
> >>> 11.05.2012 17:54, Frédéric Bastien kirjoitti:
> >>> > In Theano we use a view, but that is not relevant as it is the
> >>> > compiler that tell what is inplace. So this is invisible to the user.
> >>> >
> >>> > What about a parameter to diagonal() that default to return a view not
> >>> > writable as you said. The user can then choose what it want and this
> >>> > don't break the inferface.
> >>> [clip]
> >>>
> >>> Agreed, it seems this is the appropriate way to go on here
> >>> `diagonal(copy=True)`. A more obscure alternative would be to add a
> >>> separate method that returns a view.
> >>
> >>
> >> This looks like the best way to deal with it, yes.
> >> > >> Cheers, > >> Mark > >> > >>> > >>> > >>> I don't think changing the default behavior in a later release is a > good > >>> idea. It's a sort of an API wart, but IMHO better that than subtle code > >>> breakage. > >>> > >>> > > > > copy=True seems fine, but is this the final plan? What about long term, > > should diag() eventually be brought in line with transpose() and > reshape() > > so that it is a view by default? Changing default behavior is certainly > not > > something that should be done all the time, but it *can* be done if > > deprecated appropriately. A more consistent API is better than one with > > warts (if this particular issue is actually seen as a wart). > > This is my question as well. Adding copy=True default argument is > certainly a fine solution for 1.7, but IMHO in the long run it would > be better for diagonal() to return a view by default. If you want to get to a situation where the behavior is changed, adding a temporary new keyword is not a good solution in general. Because this forces you to make the change over 3 different releases: 1. introduce copy=True 2. change to copy=False 3. remove copy kw and step 3 is still breaking existing code, because people started using "copy=False". See the histogram new_behavior keyword as an example of this. (Aside from it > seeming generally more "numpythonic", I see from auditing the code I > have lying around my homedir that it would generally be a free speed > win, and having to remember to type a.diagonal(copy=False) all the > time in order to get full performance seems a bit annoying.) > > I mean, I'm all for conservatism in these things, which is why I > raised the issue in the first place :-). But it also seems like there > should be *some* mechanism for getting there from here (assuming > others agree we want to). There's been grumblings about trying to do > more evolving of numpy in-place instead of putting everything off to > the legendary 2.0, right? > > Unfortunately just putting a deprecation warning on everyone who calls > diagonal() without an explicit copy= argument seems like it would be > *really* obnoxious, though. If necessary we could get more creative... > add a special-purpose ndarray flag like WARN_ON_WRITE so that code > which writes to the returned array continues to work like now, but > also triggers a deprecation warning? I dunno. Something like this could be a solution. Otherwise, just living with the copy is imho much better than introducing a copy kw. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas_unterthiner at web.de Sat May 12 09:58:38 2012 From: thomas_unterthiner at web.de (Thomas Unterthiner) Date: Sat, 12 May 2012 15:58:38 +0200 Subject: [Numpy-discussion] Problems when using ACML with numpy In-Reply-To: References: Message-ID: <4FAE6C8E.7010501@web.de> On 05/12/2012 03:27 PM, numpy-discussion-request at scipy.org wrote: > 12.05.2012 00:54, Thomas Unterthiner kirjoitti: > [clip] >> > The process will have 100% CPU usage and will not show any activity >> > under strace. A gdb backtrace looks as follows: >> > >> > (gdb) bt >> > #0 0x00007fdcc000e524 in ?? () >> > from /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so > [clip] > > The backtrace looks like it does not use ACML. Does > > from numpy.core._dotblas import dot > > work? > Thanks for having a look at this. 
The following was tried with the numpy that comes from the Ubuntu repo and symlinked ACML: $ python Python 2.7.3 (default, Apr 20 2012, 22:39:59) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from numpy.core._dotblas import dot Traceback (most recent call last): File "", line 1, in ImportError: /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so: undefined symbol: cblas_cdotc_sub >>> Following up: $ ldd /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so linux-vdso.so.1 => (0x00007fff3de00000) libblas.so.3gf => /usr/lib/libblas.so.3gf (0x00007f10965f8000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1096238000) librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1096030000) libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f1095d18000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1095a18000) /lib64/ld-linux-x86-64.so.2 (0x00007f1098a88000) libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f10957f8000) libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f10955c0000) $ ls -lh /usr/lib/libblas.so.3gf lrwxrwxrwx 1 root root 32 May 11 22:27 /usr/lib/libblas.so.3gf -> /etc/alternatives/libblas.so.3gf $ ls -lh /etc/alternatives/libblas.so.3gf lrwxrwxrwx 1 root root 45 May 11 22:36 /etc/alternatives/libblas.so.3gf -> /opt/acml5.1.0/gfortran64_fma4/lib/libacml.so Cheers From matthieu.brucher at gmail.com Sat May 12 10:00:51 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sat, 12 May 2012 16:00:51 +0200 Subject: [Numpy-discussion] Problems when using ACML with numpy In-Reply-To: <4FAE6C8E.7010501@web.de> References: <4FAE6C8E.7010501@web.de> Message-ID: Does ACML now provide a CBLAS interface? Matthieu 2012/5/12 Thomas Unterthiner > > On 05/12/2012 03:27 PM, numpy-discussion-request at scipy.org wrote: > > 12.05.2012 00:54, Thomas Unterthiner kirjoitti: > > [clip] > >> > The process will have 100% CPU usage and will not show any activity > >> > under strace. A gdb backtrace looks as follows: > >> > > >> > (gdb) bt > >> > #0 0x00007fdcc000e524 in ?? () > >> > from /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so > > [clip] > > > > The backtrace looks like it does not use ACML. Does > > > > from numpy.core._dotblas import dot > > > > work? > > > > Thanks for having a look at this. The following was tried with the > numpy that comes from the Ubuntu repo and symlinked ACML: > > > $ python > Python 2.7.3 (default, Apr 20 2012, 22:39:59) > [GCC 4.6.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. 
> >>> from numpy.core._dotblas import dot > Traceback (most recent call last): > File "", line 1, in > ImportError: /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so: > undefined symbol: cblas_cdotc_sub > >>> > > > Following up: > > $ ldd /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so > linux-vdso.so.1 => (0x00007fff3de00000) > libblas.so.3gf => /usr/lib/libblas.so.3gf (0x00007f10965f8000) > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1096238000) > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1096030000) > libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 > (0x00007f1095d18000) > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1095a18000) > /lib64/ld-linux-x86-64.so.2 (0x00007f1098a88000) > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 > (0x00007f10957f8000) > libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 > (0x00007f10955c0000) > $ ls -lh /usr/lib/libblas.so.3gf > lrwxrwxrwx 1 root root 32 May 11 22:27 /usr/lib/libblas.so.3gf -> > /etc/alternatives/libblas.so.3gf > $ ls -lh /etc/alternatives/libblas.so.3gf > lrwxrwxrwx 1 root root 45 May 11 22:36 /etc/alternatives/libblas.so.3gf > -> /opt/acml5.1.0/gfortran64_fma4/lib/libacml.so > > > > Cheers > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas_unterthiner at web.de Sat May 12 11:30:39 2012 From: thomas_unterthiner at web.de (Thomas Unterthiner) Date: Sat, 12 May 2012 17:30:39 +0200 Subject: [Numpy-discussion] Problems when using ACML with numpy In-Reply-To: References: <4FAE6C8E.7010501@web.de> Message-ID: <4FAE821F.6000509@web.de> On 05/12/2012 04:00 PM, Matthieu Brucher wrote: > Does ACML now provide a CBLAS interface? > > Matthieu D'oh! Very good point, I wasn't aware that numpy needed a CLBAS interface. I now followed the steps outlined at the end of http://mail.scipy.org/pipermail/numpy-discussion/2006-February/018379.html (Basically: I built a libcblas.a for ACML, then tried to build numpy with that). However it didn't seem to work. The same 5000x5000 matrix-multiply is still spinning at 100% CPU usage. 
I attached to the process after I let it run for over 3 minutes, and the stacktrace looked like this: #0 DOUBLE_dot (ip1=, is1=8, ip2= [...gibberish...], is2=40000, op=0x7f8633086000 "", n=5000, __NPY_UNUSED_TAGGEDignore=0x23f40f0) at numpy/core/src/multiarray/arraytypes.c.src:3077 #1 0x00007f864dea1466 in PyArray_MatrixProduct2 (op1=, op2=, out=) at numpy/core/src/multiarray/multiarraymodule.c:847 #2 0x00007f864dea18ed in array_matrixproduct (__NPY_UNUSED_TAGGEDdummy=, args=, kwds=) at numpy/core/src/multiarray/multiarraymodule.c:2025 #3 0x000000000042a485 in PyEval_EvalFrameEx () #4 0x00000000004317f2 in PyEval_EvalCodeEx () #5 0x000000000054bd50 in PyRun_InteractiveOneFlags () #6 0x000000000054c045 in PyRun_InteractiveLoopFlags () #7 0x000000000054ce9f in Py_Main () #8 0x00007f864ec7976d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6 #9 0x000000000041b931 in _start () So either the process needed changed since that message was written in 2006, or I did something wrong (I am absolutely unfamiliar with the build-system used by numpy) or missed something :( Thomas > > 2012/5/12 Thomas Unterthiner > > > > On 05/12/2012 03:27 PM, numpy-discussion-request at scipy.org > wrote: > > 12.05.2012 00:54, Thomas Unterthiner kirjoitti: > > [clip] > >> > The process will have 100% CPU usage and will not show any > activity > >> > under strace. A gdb backtrace looks as follows: > >> > > >> > (gdb) bt > >> > #0 0x00007fdcc000e524 in ?? () > >> > from > /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so > > [clip] > > > > The backtrace looks like it does not use ACML. Does > > > > from numpy.core._dotblas import dot > > > > work? > > > > Thanks for having a look at this. The following was tried with the > numpy that comes from the Ubuntu repo and symlinked ACML: > > > $ python > Python 2.7.3 (default, Apr 20 2012, 22:39:59) > [GCC 4.6.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> from numpy.core._dotblas import dot > Traceback (most recent call last): > File "", line 1, in > ImportError: /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so: > undefined symbol: cblas_cdotc_sub > >>> > > > Following up: > > $ ldd /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so > linux-vdso.so.1 => (0x00007fff3de00000) > libblas.so.3gf => /usr/lib/libblas.so.3gf (0x00007f10965f8000) > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1096238000) > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 > (0x00007f1096030000) > libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 > (0x00007f1095d18000) > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1095a18000) > /lib64/ld-linux-x86-64.so.2 (0x00007f1098a88000) > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 > (0x00007f10957f8000) > libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 > (0x00007f10955c0000) > $ ls -lh /usr/lib/libblas.so.3gf > lrwxrwxrwx 1 root root 32 May 11 22:27 /usr/lib/libblas.so.3gf -> > /etc/alternatives/libblas.so.3gf > $ ls -lh /etc/alternatives/libblas.so.3gf > lrwxrwxrwx 1 root root 45 May 11 22:36 > /etc/alternatives/libblas.so.3gf > -> /opt/acml5.1.0/gfortran64_fma4/lib/libacml.so > > > > Cheers > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pav at iki.fi Sat May 12 11:34:18 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 12 May 2012 17:34:18 +0200 Subject: [Numpy-discussion] Problems when using ACML with numpy In-Reply-To: <4FAE821F.6000509@web.de> References: <4FAE6C8E.7010501@web.de> <4FAE821F.6000509@web.de> Message-ID: 12.05.2012 17:30, Thomas Unterthiner kirjoitti: [clip] > However it didn't seem to work. The same 5000x5000 matrix-multiply is > still spinning at 100% CPU usage. I attached to the process after I let > it run for over 3 minutes, and the stacktrace looked like this: > > #0 DOUBLE_dot (ip1=, is1=8, ip2= > [...gibberish...], is2=40000, op=0x7f8633086000 "", > n=5000, __NPY_UNUSED_TAGGEDignore=0x23f40f0) at > numpy/core/src/multiarray/arraytypes.c.src:3077 > #1 0x00007f864dea1466 in PyArray_MatrixProduct2 (op1=, > op2=, out=) > at numpy/core/src/multiarray/multiarraymodule.c:847 This is also not using ACML. You'll need to adjust things until the line from numpy.core._dotblas import dot works. CBLAS is indeed needed. -- Pauli Virtanen From ralf.gommers at googlemail.com Sat May 12 12:12:36 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 12 May 2012 18:12:36 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: <53503912-3DD7-4DF1-B679-8CCA385674EE@gmail.com> Message-ID: On Tue, May 8, 2012 at 3:23 PM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: > On 06.05.2012, at 8:16AM, Paul Anton Letnes wrote: > > > All tests for 1.6.2rc1 pass on > > Mac OS X 10.7.3 > > python 2.7.2 > > gcc 4.2 (Apple) > > Passing as well on 10.6 x86_64 and on 10.5.8 ppc with > python 2.5.6/2.6.6/2.7.2 Apple gcc 4.0.1, > but I am getting one failure on Lion (same with Python 2.5.6+2.6.7): > > Python version 2.7.3 (default, May 6 2012, 15:05:35) [GCC 4.2.1 > Compatible Apple Clang 3.0 (tags/Apple/clang-211.12)] > nose version 1.1.2 > ====================================================================== > FAIL: Test basic arithmetic function errors > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/sw/lib/python2.7/site-packages/numpy/testing/decorators.py", line > 215, in knownfailer > return f(*args, **kwargs) > File "/sw/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", > line 323, in test_floating_exceptions > lambda a,b:a*b, ft_tiny, ft_tiny) > File "/sw/lib/python2.7/site-packages/numpy/core/tests/test_numeric.py", > line 271, in assert_raises_fpe > "Type %s did not raise fpe error '%s'." % (ftype, fpeerr)) > File "/sw/lib/python2.7/site-packages/numpy/testing/utils.py", line 34, > in assert_ > raise AssertionError(msg) > AssertionError: Type did not raise fpe error ''. > "test_floating_exceptions" and "test_floating_exceptions_power" keep on failing on a number of different platform/compiler combinations. It's http://projects.scipy.org/numpy/ticket/1755. It's quite hard to find the issue. I propose to just mark these as knownfail unconditionally in both 1.6.x and master. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Sat May 12 12:13:24 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 12 May 2012 18:13:24 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sat, May 5, 2012 at 11:06 PM, Charles R Harris wrote: > > > On Sat, May 5, 2012 at 2:56 PM, Paul Anton Letnes < > paul.anton.letnes at gmail.com> wrote: > >> Hi, >> >> I'm getting a couple of errors when testing. System: >> Arch Linux (updated today) >> Python 3.2.3 >> gcc 4.7.0 >> (Anything else?) >> >> I think that this error: >> AssertionError: selectedrealkind(19): expected -1 but got 16 >> is due to the fact that newer versions of gfortran actually supports >> precision this high (quad precision). >> >> Cheers >> Paul >> >> >> python -c 'import numpy;numpy.test("full")' >> Running unit tests for numpy >> NumPy version 1.6.1 >> NumPy is installed in /usr/lib/python3.2/site-packages/numpy >> Python version 3.2.3 (default, Apr 23 2012, 23:35:30) [GCC 4.7.0 >> 20120414 (prerelease)] >> nose version 1.1.2 >> >> ....S.................................................................................................................................................................................................................................................................................S......................................................................................................................................................................................................................................................................................................................................................................................................................SSS...........................................................................................K............................................................................K.............................................................................................................K.................................................................................................K......................K..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................../usr/lib/python3.2/site-packages/numpy/lib/format.py:575: >> ResourceWarning: unclosed file <_io.BufferedReader >> name='/tmp/tmpmkxhkq'> >> mode=mode, offset=offset) >> >> 
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................/usr/lib/python3.2/subprocess.py:471: >> ResourceWarning: unclosed file <_io.FileIO name=3 mode='rb'> >> return Popen(*popenargs, **kwargs).wait() >> /usr/lib/python3.2/subprocess.py:471: ResourceWarning: unclosed file >> <_io.FileIO name=7 mode='rb'> >> return Popen(*popenargs, **kwargs).wait() >> >> ..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................................................................................................... >> ====================================================================== >> FAIL: test_kind.TestKind.test_all >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "/usr/lib/python3.2/site-packages/nose/case.py", line 198, in >> runTest >> self.test(*self.arg) >> File "/usr/lib/python3.2/site-packages/numpy/f2py/tests/test_kind.py", >> line 30, in test_all >> 'selectedrealkind(%s): expected %r but got %r' % (i, >> selected_real_kind(i), selectedrealkind(i))) >> File "/usr/lib/python3.2/site-packages/numpy/testing/utils.py", line >> 34, in assert_ >> raise AssertionError(msg) >> AssertionError: selectedrealkind(19): expected -1 but got 16 >> > > This should have been fixed. Hmm... 
> > >> >> ====================================================================== >> FAIL: test_pareto (test_random.TestRandomDist) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "/usr/lib/python3.2/site-packages/numpy/random/tests/test_random.py", >> line 313, in test_pareto >> np.testing.assert_array_almost_equal(actual, desired, decimal=15) >> File "/usr/lib/python3.2/site-packages/numpy/testing/utils.py", line >> 800, in assert_array_almost_equal >> header=('Arrays are not almost equal to %d decimals' % decimal)) >> File "/usr/lib/python3.2/site-packages/numpy/testing/utils.py", line >> 636, in assert_array_compare >> raise AssertionError(msg) >> AssertionError: >> Arrays are not almost equal to 15 decimals >> >> (mismatch 16.66666666666667%) >> x: array([[ 2.46852460e+03, 1.41286881e+03], >> [ 5.28287797e+07, 6.57720981e+07], >> [ 1.40840323e+02, 1.98390255e+05]]) >> y: array([[ 2.46852460e+03, 1.41286881e+03], >> [ 5.28287797e+07, 6.57720981e+07], >> [ 1.40840323e+02, 1.98390255e+05]]) >> >> > I can't think of anything that would affect this apart from the compiler > version. Perhaps the precision needs to be backed off a bit. > Paul, could you check the needed precision to make this test pass? Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matrixhasu at gmail.com Sat May 12 12:22:38 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Sat, 12 May 2012 18:22:38 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: Hello, On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate of > NumPy 1.6.2.? This is a maintenance release. Due to the delay of the NumPy > 1.7.0, this release contains far more fixes than a regular NumPy bugfix > release.? It also includes a number of documentation and build improvements. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ > > Please test this release and report any issues on the numpy-discussion > mailing list. ... > BLD:?? add support for the new X11 directory structure on Ubuntu & co. We've just discovered that this fix is not enough. Actually the new directories are due to the "multi-arch" feature of Debian systems, that allows to install libraries from other (foreign) architectures than the one the machine is (the classic example, i386 libraries on a amd64 host). the fix included to look up in additional directories is currently only for X11, while for example Debian has fftw3 that's multi-arch-ified and thus will fail to be detected. Could this fix be extended to include all other things that are checked? for reference the bug in Debian is [1]; there was also a patch[2] in previous versions, that was using gcc to get the multi-arch paths - you might use as a reference, or to implement something debian-systems-specific. [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=640940 [2] http://anonscm.debian.org/viewvc/python-modules/packages/numpy/trunk/debian/patches/50_search-multiarch-paths.patch?view=markup&pathrev=21168 It would be awesome is such support would end up in 1.6.2 . 
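To give an idea of the approach (only a rough sketch, not the actual patch [2]; the exact flag spelling and paths may differ between gcc versions):

import subprocess
# Ask gcc for the Debian/Ubuntu multiarch triplet (e.g. 'x86_64-linux-gnu')
# and derive the extra library directories that would need to be searched.
try:
    triplet = subprocess.check_output(['gcc', '--print-multiarch']).strip()
except (OSError, subprocess.CalledProcessError):
    triplet = ''
extra_lib_dirs = ['/usr/lib/' + triplet, '/lib/' + triplet] if triplet else []
print extra_lib_dirs

Something in numpy.distutils (system_info.py, say) could then append those directories to its default search paths.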
Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From thomas_unterthiner at web.de Sat May 12 13:35:01 2012 From: thomas_unterthiner at web.de (Thomas Unterthiner) Date: Sat, 12 May 2012 19:35:01 +0200 Subject: [Numpy-discussion] Problems when using ACML with numpy In-Reply-To: References: <4FAE6C8E.7010501@web.de> <4FAE821F.6000509@web.de> Message-ID: <4FAE9F45.7040807@web.de> On 05/12/2012 05:34 PM, Pauli Virtanen wrote: > 12.05.2012 17:30, Thomas Unterthiner kirjoitti: > [clip] >> However it didn't seem to work. The same 5000x5000 matrix-multiply is >> still spinning at 100% CPU usage. I attached to the process after I let >> it run for over 3 minutes, and the stacktrace looked like this: >> >> #0 DOUBLE_dot (ip1=, is1=8, ip2= >> [...gibberish...], is2=40000, op=0x7f8633086000 "", >> n=5000, __NPY_UNUSED_TAGGEDignore=0x23f40f0) at >> numpy/core/src/multiarray/arraytypes.c.src:3077 >> #1 0x00007f864dea1466 in PyArray_MatrixProduct2 (op1=, >> op2=, out=) >> at numpy/core/src/multiarray/multiarraymodule.c:847 > This is also not using ACML. You'll need to adjust things until the line > > from numpy.core._dotblas import dot > > works. CBLAS is indeed needed. > It seems like I just can't get it to work. Here is what I did: I downloaded cblas from netlib, and changed the Makefile.LINUX to read: BLLIB = /opt/acml5.1.0/gfortran64_fma4/lib/libacml.so CBLIB = ../lib/cblas.a CFLAGS = -O3 -DADD_ -march=native -flto -fPIC FFLAGS = -O3 -march=native -flto RANLIB = ranlib Then renamed Makefile.LINUX to Makefile.in, and built cblas. I copied the resulting libcblas.a to /opt/acml5.1.0/gfortran64_fma4/lib/ . I also copied the files in $CBLAS_DIR/include to /opt/acml5.1.0/gfortran64_fma4/include (not sure if that was needed). I then modified my .bashrc by adding the line export LD_LIBRARY_PATH=/opt/acml5.1.0/gfortran64_fma4/lib I started a new shell afterwards to make sure the variable was set. Next, I downloaded the numpy-1.6.1 source distribution. I then created the following 'site.cfg': [blas] blas_libs = cblas,acml library_dirs = /opt/acml5.1.0/gfortran64_fma4/lib include_dirs = /opt/acml5.1.0/gfortran64_fma4/include [lapack] language = f77 lapack_libs = acml library_dirs = /opt/acml5.1.0/gfortran64_fma4/lib include_dirs = /opt/acml5.1.0/gfortran64_fma4/include I'm not sure where 'site.cfg' is supposed to go, so I placed one copy in the root-directory of the package, one in ./numpy/ and one in ./numpy/distutils/ I then did 'sudo python setup.py install --prefix=/usr/local' which built and installed without a hitch. (Sidenote: in other tries, I noticed that if I export CFLAGS and LDFLAGS before calling this, there will be an error, is that normal? How am I supposed to set CFLAGS when building?) Anyways, when I don't export any environment variables and just do 'sudo python setup.py install --prefix=/usr/local', evertying works smoothly. Amongst other things, I get the following output while compiling: $ sudo python setup.py install --prefix=/usr/local 2>&1 [...] /media/scratch/software/numpy/numpy-1.6.1/numpy/distutils/system_info.py:1414: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. 
warnings.warn(AtlasNotFoundError.__doc__) blas_info: FOUND: libraries = ['cblas', 'acml'] library_dirs = ['/opt/acml5.1.0/gfortran64_fma4/lib'] language = f77 FOUND: libraries = ['cblas', 'acml'] library_dirs = ['/opt/acml5.1.0/gfortran64_fma4/lib'] define_macros = [('NO_ATLAS_INFO', 1)] language = f77 [...] lapack_info: FOUND: libraries = ['acml'] library_dirs = ['/opt/acml5.1.0/gfortran64_fma4/lib'] language = f77 FOUND: libraries = ['acml', 'cblas', 'acml'] library_dirs = ['/opt/acml5.1.0/gfortran64_fma4/lib'] define_macros = [('NO_ATLAS_INFO', 1)] language = f77 [...] 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h'] building extension "numpy.core._dotblas" sources building extension "numpy.core.umath_tests" sources conv_template:> build/src.linux-x86_64-2.7/numpy/core/src/umath/umath_tests.c [...] /usr/bin/gfortran -Wall -Wall -shared build/temp.linux-x86_64-2.7/numpy/linalg/lapack_litemodule.o build/temp.linux-x86_64-2.7/numpy/linalg/python_xerbla.o -L/opt/acml5.1.0/gfortran64_fma4/lib -Lbuild/temp.linux-x86_64-2.7 -lacml -lcblas -lacml -lgfortran -o build/lib.linux-x86_64-2.7/numpy/linalg/lapack_lite.so The output contains no other mention of anything BLAS-related. Notice the absense of any calls to gcc/gfortran after the 'building extension "numpy.core._dotblas" sources' message. After the build is done, the directory ./build/lib.linux-x86_64-2.7/numpy/core/ does not contain any "dotblas" files, and neither does /usr/local/lib/python2.7/dist-packages/numpy/core/ afterwards. Thus it is no surprise that the following still fails: tom at blucomp:~$ python Python 2.7.3 (default, Apr 20 2012, 22:39:59) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from numpy.core._dotblas import dot Traceback (most recent call last): File "", line 1, in ImportError: No module named _dotblas From ralf.gommers at googlemail.com Sat May 12 15:17:29 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 12 May 2012 21:17:29 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sat, May 12, 2012 at 6:22 PM, Sandro Tosi wrote: > Hello, > > On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers > wrote: > > Hi, > > > > I'm pleased to announce the availability of the first release candidate > of > > NumPy 1.6.2. This is a maintenance release. Due to the delay of the > NumPy > > 1.7.0, this release contains far more fixes than a regular NumPy bugfix > > release. It also includes a number of documentation and build > improvements. > > > > Sources and binary installers can be found at > > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ > > > > Please test this release and report any issues on the numpy-discussion > > mailing list. > ... > > BLD: add support for the new X11 directory structure on Ubuntu & co. > > We've just discovered that this fix is not enough. Actually the new > directories are due to the "multi-arch" feature of Debian systems, > that allows to install libraries from other (foreign) architectures > than the one the machine is (the classic example, i386 libraries on a > amd64 host). > > the fix included to look up in additional directories is currently > only for X11, while for example Debian has fftw3 that's > multi-arch-ified and thus will fail to be detected. > > Could this fix be extended to include all other things that are > checked? 
for reference the bug in Debian is [1]; there was also a > patch[2] in previous versions, that was using gcc to get the > multi-arch paths - you might use as a reference, or to implement > something debian-systems-specific. > > [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=640940 > [2] > http://anonscm.debian.org/viewvc/python-modules/packages/numpy/trunk/debian/patches/50_search-multiarch-paths.patch?view=markup&pathrev=21168 > > It would be awesome is such support would end up in 1.6.2 . > Hardcoding some more paths to check in distutils/system_info.py should be OK, also for 1.6.2 (will require a new RC). The --print-multiarch thing looks very questionable. As far as I can tell, it's a Debian specific gcc patch, only available in gcc 4.6 and up. Ubuntu before 11.10 release also doesn't have it. Therefore I don't think use of --print-multiarch is appropriate for numpy for now, and certainly not a change I'd like to make to distutils right before a release. If anyone with access to a Debian/Ubuntu system could come up with a patch which adds the right paths to system_info.py, that would be great. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat May 12 15:50:59 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 12 May 2012 21:50:59 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sun, May 6, 2012 at 12:12 AM, Charles R Harris wrote: > > > On Sat, May 5, 2012 at 2:56 PM, Paul Anton Letnes < > paul.anton.letnes at gmail.com> wrote: > >> Hi, >> >> I'm getting a couple of errors when testing. System: >> Arch Linux (updated today) >> Python 3.2.3 >> gcc 4.7.0 >> (Anything else?) >> >> I think that this error: >> AssertionError: selectedrealkind(19): expected -1 but got 16 >> is due to the fact that newer versions of gfortran actually supports >> precision this high (quad precision). >> >> > Yes, but it should be fixed. I can't duplicate this here with a fresh > checkout of the branch. > This failure makes no sense to me. Error comes from this code: 'selectedrealkind(%s): expected %r but got %r' % (i, selected_real_kind(i), selectedrealkind(i))) So "selected_real_kind(19)" returns -1. selected_real_kind is the function numpy.f2py.crackfortran._selected_real_kind_func, which is defined as: def _selected_real_kind_func(p, r=0, radix=0): #XXX: This should be processor dependent # This is only good for 0 <= p <= 20 if p < 7: return 4 if p < 16: return 8 if platform.machine().lower().startswith('power'): if p <= 20: return 16 else: if p < 19: return 10 elif p <= 20: return 16 return -1 For p=19 this function should always return 16. So the result from compiling foo.f90 is fine, but the test is broken in a very strange way. Paul, is the failure reproducible on your machine? If so, can you try to debug it? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Sat May 12 16:38:18 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 12 May 2012 22:38:18 +0200 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> Message-ID: <4FAECA3A.8090604@astro.uio.no> On 05/11/2012 10:10 PM, Mark Wiebe wrote: > On Fri, May 11, 2012 at 6:13 AM, Dag Sverre Seljebotn > > wrote: > > (NumPy devs: I know, I get too many ideas. 
But this time I *really* > believe in it, I think this is going to be *huge*. And if Mark F. likes > it it's not going to be without manpower; and as his mentor I'd pitch in > too here and there.) > > (Mark F.: I believe this is *very* relevant to your GSoC. I certainly > don't want to micro-manage your GSoC, just have your take.) > > Travis, thank you very much for those good words in the "NA-mask > interactions..." thread. It put most of my concerns away. If anybody is > leaning towards for opaqueness because of its OOP purity, I want to > refer to C++ and its walled-garden of ideological purity -- it has, > what, 3-4 different OOP array libraries, neither of which is able to > out-compete the other. Meanwhile the rest of the world happily > cooperates using pointers, strides, CSR and CSC. > > Now, there are limits to what you can do with strides and pointers. > Noone's denying the need for more. In my mind that's an API where you > can do fetch_block and put_block of cache-sized, N-dimensional blocks on > an array; but it might be something slightly different. > > Here's what I'm asking: DO NOT simply keep extending ndarray and the > NumPy C API to deal with this issue. > > What we need is duck-typing/polymorphism at the C level. If you keep > extending ndarray and the NumPy C API, what we'll have is a one-to-many > relationship: One provider of array technology, multiple consumers (with > hooks, I'm sure, but all implementations of the hook concept in the > NumPy world I've seen so far are a total disaster!). > > > There is similar intent behind an idea I raised last summer here: > > http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html > > I stopped doing anything on it when considering the scope of it with the > linear algebra functions included like Pauli suggested. Nathaniel did an > experimental implementation of some parts of the idea in Python here: Hmm. Ufuncs. I'll think more about that, and read the thread again, to see if I can discover the connection to my proposal above. (Re your thread, I always wished for something based on multiple dispatch rather than a priority battle to resolve NumPy operators; you really only have a partial ordering between libraries wanting to interact with ndarrays, not a total ordering) Re linear algebra, I have, since I use it in my PhD, been working on and off the past 4 months on a library for polymorphic linear algebra. There's something analogue to the array and ufunc distinction of NumPy, but the "ufunc" part is designed very differently (and based on multiple dispatch). Anyway, what I was talking about was basically a new C interface to arrays that should be more tolerant for implementations that need to be opaque (like compressed arrays, database connections, my own custom sparse format...). If that somehow relates with ufuncs, fine with me :-) > > https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py#L107 > > What I think we need instead is something like PEP 3118 for the > "abstract" array that is only available block-wise with getters and > setters. On the Cython list we've decided that what we want for CEP 1000 > (for boxing callbacks etc.) is to extend PyTypeObject with our own > fields; we could create CEP 1001 to solve this issue and make any Python > object an exporter of "block-getter/setter-arrays" (better name needed). > > What would be exported is (of course) a simple vtable: > > typedef struct { > int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t > *lower_right, ...); > ... 
> } block_getter_setter_array_vtable; > > Let's please discuss the details *after* the fundamentals. But the > reason I put void* there instead of PyObject* is that I hope this could > be used beyond the Python world (say, Python<->Julia); the void* would > be handed to you at the time you receive the vtable (however we handle > that). > > I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you > could embed the block-transposition that's needed for efficient "arr + > arr.T" at this level. > > Imagine being able to do this in Cython: > > a[...] = b + c * d > > and have that essentially compile to the numexpr blocked approach, *but* > where b, c, and d can have whatever type that exports CEP 1001? So c > could be a "diagonal" array which uses O(n) storage to export O(n^2) > elements, for instance, and the unrolled Cython code never needs to > know. > > As far as NumPy goes, something along these lines should hopefully mean > that new C code being written doesn't rely so much on what exactly goes > into "ndarray" and what goes into other classes; so that we don't get > the same problem again that we do now with code that doesn't use PEP > 3118. > > > This general idea is very good. I think PEP 3118 captures a lot of the > essence of the ndarray, but there's a lot of potential generality that > it doesn't handle, such as the "diagonal" array or pluggable dtypes. I think the long-term generality is a lot bigger than that: - Compressed arrays - Interfaces to HDF files - Distributed-memory arrays - Blocked arrays - Semi-sparse and sparse (diagonal, but also triangular, symmetric, repeating, ...) - Lazy evaluation: "generating_multiply(mydata, zero_mask)" While what me and Mark F. cares about is computational efficiency for current arrays, this generality is almost unavoidable. In fact -- from ideas Travis have posted to this list earlier + continuum.io, I assume this wider scope is something you and Travis must necessarily have thought a lot about. Anyway, I agree with Mark F. that right design is probably a new, low-level, (very small!) C library with no Python dependencies that just provides some APIs to try to standardize this "how to communicate array data" at a more basic level than NumPy (and much smaller and different scope than the various "distill NumPy to a C core" things that's been talked about the past years, something I have zero interest in). If NumPy devs are interested in this discussion on a detailed level, please say so; me and Mark F might go to Skype (or even meet in person) to get higher bandwidth than ML, and if more people should be invited then it's good to know. Dag From d.s.seljebotn at astro.uio.no Sat May 12 17:35:30 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 12 May 2012 23:35:30 +0200 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> Message-ID: <4FAED7A2.8050205@astro.uio.no> On 05/11/2012 03:25 PM, mark florisson wrote: > On 11 May 2012 12:13, Dag Sverre Seljebotn wrote: >> (NumPy devs: I know, I get too many ideas. But this time I *really* believe >> in it, I think this is going to be *huge*. And if Mark F. likes it it's not >> going to be without manpower; and as his mentor I'd pitch in too here and >> there.) >> >> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly don't >> want to micro-manage your GSoC, just have your take.) 
>> >> Travis, thank you very much for those good words in the "NA-mask >> interactions..." thread. It put most of my concerns away. If anybody is >> leaning towards for opaqueness because of its OOP purity, I want to refer to >> C++ and its walled-garden of ideological purity -- it has, what, 3-4 >> different OOP array libraries, neither of which is able to out-compete the >> other. Meanwhile the rest of the world happily cooperates using pointers, >> strides, CSR and CSC. BTW, this seems like pure hyperbole out of context...the backstory is that I've talked to people who do sparse linear algebra in C++ using Boost, iterators, etc. rather than something that could conceivably be exported with a binary interface; primarily out of a wish to be more elegant than 'legacy' programming languages that have to resort to things like CSR. Sure, that's elegant modern C++, but at what cost? >> >> Now, there are limits to what you can do with strides and pointers. Noone's >> denying the need for more. In my mind that's an API where you can do >> fetch_block and put_block of cache-sized, N-dimensional blocks on an array; >> but it might be something slightly different. >> >> Here's what I'm asking: DO NOT simply keep extending ndarray and the NumPy C >> API to deal with this issue. >> >> What we need is duck-typing/polymorphism at the C level. If you keep >> extending ndarray and the NumPy C API, what we'll have is a one-to-many >> relationship: One provider of array technology, multiple consumers (with >> hooks, I'm sure, but all implementations of the hook concept in the NumPy >> world I've seen so far are a total disaster!). >> >> What I think we need instead is something like PEP 3118 for the "abstract" >> array that is only available block-wise with getters and setters. On the >> Cython list we've decided that what we want for CEP 1000 (for boxing >> callbacks etc.) is to extend PyTypeObject with our own fields; we could >> create CEP 1001 to solve this issue and make any Python object an exporter >> of "block-getter/setter-arrays" (better name needed). >> >> What would be exported is (of course) a simple vtable: >> >> typedef struct { >> int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, >> ...); >> ... >> } block_getter_setter_array_vtable; > > Interesting idea, I think I like it. So I suppose these blocks could > even be used in a sparse context, where it returns one-sized blocks > for un-surrounded elements. This means returned blocks can always be > variably sized, correct? So will each block be associated with an > index space? Otherwise, how do we know the element-wise > correspondence? Will there be a default element for missing > (unreturned) values? > > Am I correct in assuming elements in returned blocks always return > elements in element-wise order? In case you manually have to block > other parts of an array (lets say it's stored in Fortran order), then > the variable sized block nature may be complicating the tiling > pattern. - My hunch was that you'd always get something C-contiguous and the exporter would copy/transpose. But perhaps one wouldn't loose much by making the block strided and have the consumer do a copy; isn't that mostly about which side of the boundary to call the copying utility? - I think iterators over blocks are necesarry as well - Not sure about variable-sized blocks. 
The array could already be stored in cache-sized blocks, and the numerical loops could be optimized for some block sizes, so there must certainly be some handshaking mechanism; hopefully it can be made rather elegant. See next point for one ingredient. - What must be established is whether a) one should always copy, b) one can copy if one wants to at a negligible cost, or c) computational core reaching into the buffer of an exporter is better. This would trickle down into lots of other decisions I feel (like if you always go for a rectangular block, or whether getting entire rows like with the NumPy iteration API is sometimes more valuable). In addition to the Fortran compilers, one thing to study is GotoBLAS2/OpenBLAS, which has code for getting a "high-quality memory region", since that apparently made an impact. Not sure how much Intel platforms though; other platforms have slower TLBs, in fact I think on non-Intel platforms, TLB is what affected block size, not the cache size. > >> Let's please discuss the details *after* the fundamentals. But the reason I >> put void* there instead of PyObject* is that I hope this could be used >> beyond the Python world (say, Python<->Julia); the void* would be handed to >> you at the time you receive the vtable (however we handle that). > > Yes, we should definitely not stick to objects here. In fact, if we can get this done right, I'd have uses for such an API in my pure C/C++ libraries that I want to be usable from Fortran without a Python dependency... > >> I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you >> could embed the block-transposition that's needed for efficient "arr + >> arr.T" at this level. >> >> Imagine being able to do this in Cython: >> >> a[...] = b + c * d >> >> and have that essentially compile to the numexpr blocked approach, *but* >> where b, c, and d can have whatever type that exports CEP 1001? So c could >> be a "diagonal" array which uses O(n) storage to export O(n^2) elements, for >> instance, and the unrolled Cython code never needs to know. > > I assume random accesses will happen through some kind of > B-tree/sparse index like mechanism? Here I don't follow. Accesses happens polymorphically, in whatever way the array in question wants to. I already wrote some examples in the email to Mark W.; on the computational side, there's certainly a lot of different sparse formats, regular C/Fortan, blocked formats (more efficient linear algebra), volume-filling fractals (for spatial locality in N dimensions, probably not that helpful in single-node contexts though)... Dag From d.s.seljebotn at astro.uio.no Sat May 12 17:43:29 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 12 May 2012 23:43:29 +0200 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <4FAED7A2.8050205@astro.uio.no> References: <4FACF458.6070200@astro.uio.no> <4FAED7A2.8050205@astro.uio.no> Message-ID: <4FAED981.7020502@astro.uio.no> On 05/12/2012 11:35 PM, Dag Sverre Seljebotn wrote: > On 05/11/2012 03:25 PM, mark florisson wrote: >> On 11 May 2012 12:13, Dag Sverre Seljebotn wrote: >>> (NumPy devs: I know, I get too many ideas. But this time I *really* believe >>> in it, I think this is going to be *huge*. And if Mark F. likes it it's not >>> going to be without manpower; and as his mentor I'd pitch in too here and >>> there.) >>> >>> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly don't >>> want to micro-manage your GSoC, just have your take.) 
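As a toy Python analogue of that polymorphic access (the class and method names below are invented for illustration; the actual proposal is a C-level vtable, not a Python protocol), here is a "diagonal" exporter that stores O(n) data but can hand out any requested block of the logical n-by-n matrix, together with a consumer that walks two exporters block by block:

    import numpy as np

    class DiagonalArray(object):
        # O(n) storage; any rectangular block is materialized on request.
        def __init__(self, diag):
            self.diag = np.asarray(diag, dtype=float)
            n = self.diag.shape[0]
            self.shape = (n, n)

        def get_block(self, upper_left, lower_right):
            # Half-open (row, col) corners, like the ssize_t* pairs in the vtable.
            r0, c0 = upper_left
            r1, c1 = lower_right
            block = np.zeros((r1 - r0, c1 - c0))
            i = np.arange(max(r0, c0), min(r1, c1))  # diagonal entries inside the block
            block[i - r0, i - c0] = self.diag[i]
            return block

    def blocked_add(a, b, out, blocksize=2):
        # The consumer never sees more than one block at a time.
        n = out.shape[0]
        for r0 in range(0, n, blocksize):
            for c0 in range(0, n, blocksize):
                r1, c1 = min(r0 + blocksize, n), min(c0 + blocksize, n)
                out[r0:r1, c0:c1] = (a.get_block((r0, c0), (r1, c1)) +
                                     b.get_block((r0, c0), (r1, c1)))

    a = DiagonalArray([1.0, 2.0, 3.0, 4.0])
    b = DiagonalArray([10.0, 20.0, 30.0, 40.0])
    out = np.empty((4, 4))
    blocked_add(a, b, out)
    print(out)  # diagonal is [11, 22, 33, 44], zeros elsewhere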
>>> >>> Travis, thank you very much for those good words in the "NA-mask >>> interactions..." thread. It put most of my concerns away. If anybody is >>> leaning towards for opaqueness because of its OOP purity, I want to refer to >>> C++ and its walled-garden of ideological purity -- it has, what, 3-4 >>> different OOP array libraries, neither of which is able to out-compete the >>> other. Meanwhile the rest of the world happily cooperates using pointers, >>> strides, CSR and CSC. > > BTW, this seems like pure hyperbole out of context...the backstory is > that I've talked to people who do sparse linear algebra in C++ using > Boost, iterators, etc. rather than something that could conceivably be > exported with a binary interface; primarily out of a wish to be more > elegant than 'legacy' programming languages that have to resort to > things like CSR. Sure, that's elegant modern C++, but at what cost? > >>> >>> Now, there are limits to what you can do with strides and pointers. Noone's >>> denying the need for more. In my mind that's an API where you can do >>> fetch_block and put_block of cache-sized, N-dimensional blocks on an array; >>> but it might be something slightly different. >>> >>> Here's what I'm asking: DO NOT simply keep extending ndarray and the NumPy C >>> API to deal with this issue. >>> >>> What we need is duck-typing/polymorphism at the C level. If you keep >>> extending ndarray and the NumPy C API, what we'll have is a one-to-many >>> relationship: One provider of array technology, multiple consumers (with >>> hooks, I'm sure, but all implementations of the hook concept in the NumPy >>> world I've seen so far are a total disaster!). >>> >>> What I think we need instead is something like PEP 3118 for the "abstract" >>> array that is only available block-wise with getters and setters. On the >>> Cython list we've decided that what we want for CEP 1000 (for boxing >>> callbacks etc.) is to extend PyTypeObject with our own fields; we could >>> create CEP 1001 to solve this issue and make any Python object an exporter >>> of "block-getter/setter-arrays" (better name needed). >>> >>> What would be exported is (of course) a simple vtable: >>> >>> typedef struct { >>> int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, >>> ...); >>> ... >>> } block_getter_setter_array_vtable; >> >> Interesting idea, I think I like it. So I suppose these blocks could >> even be used in a sparse context, where it returns one-sized blocks >> for un-surrounded elements. This means returned blocks can always be >> variably sized, correct? So will each block be associated with an >> index space? Otherwise, how do we know the element-wise >> correspondence? Will there be a default element for missing >> (unreturned) values? >> >> Am I correct in assuming elements in returned blocks always return >> elements in element-wise order? In case you manually have to block >> other parts of an array (lets say it's stored in Fortran order), then >> the variable sized block nature may be complicating the tiling >> pattern. > > - My hunch was that you'd always get something C-contiguous and the > exporter would copy/transpose. But perhaps one wouldn't loose much by > making the block strided and have the consumer do a copy; isn't that > mostly about which side of the boundary to call the copying utility? > > - I think iterators over blocks are necesarry as well > > - Not sure about variable-sized blocks. 
The array could already be > stored in cache-sized blocks, and the numerical loops could be optimized > for some block sizes, so there must certainly be some handshaking > mechanism; hopefully it can be made rather elegant. See next point for > one ingredient. > > - What must be established is whether a) one should always copy, b) > one can copy if one wants to at a negligible cost, or c) computational > core reaching into the buffer of an exporter is better. This would > trickle down into lots of other decisions I feel (like if you always go > for a rectangular block, or whether getting entire rows like with the > NumPy iteration API is sometimes more valuable). > > In addition to the Fortran compilers, one thing to study is > GotoBLAS2/OpenBLAS, which has code for getting a "high-quality memory > region", since that apparently made an impact. Not sure how much Intel > platforms though; other platforms have slower TLBs, in fact I think on > non-Intel platforms, TLB is what affected block size, not the cache size. > > >> >>> Let's please discuss the details *after* the fundamentals. But the reason I >>> put void* there instead of PyObject* is that I hope this could be used >>> beyond the Python world (say, Python<->Julia); the void* would be handed to >>> you at the time you receive the vtable (however we handle that). >> >> Yes, we should definitely not stick to objects here. > > In fact, if we can get this done right, I'd have uses for such an API in > my pure C/C++ libraries that I want to be usable from Fortran without a > Python dependency... > >> >>> I think this would fit neatly in Mark F.'s GSoC (Mark F.?), because you >>> could embed the block-transposition that's needed for efficient "arr + >>> arr.T" at this level. >>> >>> Imagine being able to do this in Cython: >>> >>> a[...] = b + c * d >>> >>> and have that essentially compile to the numexpr blocked approach, *but* >>> where b, c, and d can have whatever type that exports CEP 1001? So c could >>> be a "diagonal" array which uses O(n) storage to export O(n^2) elements, for >>> instance, and the unrolled Cython code never needs to know. >> >> I assume random accesses will happen through some kind of >> B-tree/sparse index like mechanism? > > Here I don't follow. Accesses happens polymorphically, in whatever way > the array in question wants to. I already wrote some examples in the > email to Mark W.; on the computational side, there's certainly a lot of > different sparse formats, regular C/Fortan, blocked formats (more > efficient linear algebra), volume-filling fractals (for spatial locality > in N dimensions, probably not that helpful in single-node contexts > though)... An example that's actually non-contrived and interesting is H-matrices and H^2-matrices ("hierarchical matrices"). Basically it's a way of compressing matrices so that you can get approximate inverses, matrix-multiply etc. in O(log(n)^p n), with p <=2. Briefly, you first construct a quad-tree of your matrix; then you approximate the blocks that can be approximated using low-rank approximations (SVDs or similar). And for certain matrices that appears to do wonders. (This is mostly used for matrices that produce O(n^2) elements non-trivially from O(n) parameters). (http://hlib.org is a starting point, but it's just theorem upon theorem, mostly in a PDE setting). 
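To illustrate just the low-rank ingredient of that construction (a toy sketch with an arbitrary smooth kernel and tolerance, nothing resembling real hlib code): a block coming from a smooth interaction can be replaced by two thin factors obtained from a truncated SVD.

    import numpy as np

    def compress_block(block, tol=1e-12):
        # Keep only singular values above tol * s[0]; return thin factors
        # so that np.dot(left, right) approximates the original block.
        U, s, Vt = np.linalg.svd(block, full_matrices=False)
        k = int(np.sum(s > tol * s[0]))
        return U[:, :k] * s[:k], Vt[:k, :]

    x = np.linspace(1.0, 2.0, 200)
    block = 1.0 / (x[:, None] + x[None, :])   # smooth kernel -> numerically low rank
    left, right = compress_block(block)
    print(left.shape, right.shape)                   # far fewer numbers than the dense 200*200 block
    print(abs(np.dot(left, right) - block).max())    # tiny reconstruction error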
Dag From d.s.seljebotn at astro.uio.no Sat May 12 17:55:36 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 12 May 2012 23:55:36 +0200 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> Message-ID: <4FAEDC58.8050502@astro.uio.no> On 05/11/2012 03:37 PM, mark florisson wrote: > On 11 May 2012 12:13, Dag Sverre Seljebotn wrote: >> (NumPy devs: I know, I get too many ideas. But this time I *really* believe >> in it, I think this is going to be *huge*. And if Mark F. likes it it's not >> going to be without manpower; and as his mentor I'd pitch in too here and >> there.) >> >> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly don't >> want to micro-manage your GSoC, just have your take.) >> >> Travis, thank you very much for those good words in the "NA-mask >> interactions..." thread. It put most of my concerns away. If anybody is >> leaning towards for opaqueness because of its OOP purity, I want to refer to >> C++ and its walled-garden of ideological purity -- it has, what, 3-4 >> different OOP array libraries, neither of which is able to out-compete the >> other. Meanwhile the rest of the world happily cooperates using pointers, >> strides, CSR and CSC. >> >> Now, there are limits to what you can do with strides and pointers. Noone's >> denying the need for more. In my mind that's an API where you can do >> fetch_block and put_block of cache-sized, N-dimensional blocks on an array; >> but it might be something slightly different. >> >> Here's what I'm asking: DO NOT simply keep extending ndarray and the NumPy C >> API to deal with this issue. >> >> What we need is duck-typing/polymorphism at the C level. If you keep >> extending ndarray and the NumPy C API, what we'll have is a one-to-many >> relationship: One provider of array technology, multiple consumers (with >> hooks, I'm sure, but all implementations of the hook concept in the NumPy >> world I've seen so far are a total disaster!). >> >> What I think we need instead is something like PEP 3118 for the "abstract" >> array that is only available block-wise with getters and setters. On the >> Cython list we've decided that what we want for CEP 1000 (for boxing >> callbacks etc.) is to extend PyTypeObject with our own fields; we could >> create CEP 1001 to solve this issue and make any Python object an exporter >> of "block-getter/setter-arrays" (better name needed). >> >> What would be exported is (of course) a simple vtable: >> >> typedef struct { >> int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, >> ...); >> ... >> } block_getter_setter_array_vtable; >> >> Let's please discuss the details *after* the fundamentals. But the reason I >> put void* there instead of PyObject* is that I hope this could be used >> beyond the Python world (say, Python<->Julia); the void* would be handed to >> you at the time you receive the vtable (however we handle that). > > I suppose it would also be useful to have some way of predicting the > output format polymorphically for the caller. E.g. dense * > block_diagonal results in block diagonal, but dense + block_diagonal > results in dense, etc. It might be useful for the caller to know > whether it needs to allocate a sparse, dense or block-structured > array. Or maybe the polymorphic function could even do the allocation. > This needs to happen recursively of course, to avoid intermediate > temporaries. 
The compiler could easily handle that, and so could numpy > when it gets lazy evaluation. Ah. But that depends too on the computation to be performed too; a) elementwise, b) axis-wise reductions, c) linear algebra... In my oomatrix code (please don't look at it, it's shameful) I do this using multiple dispatch. I'd rather ignore this for as long as we can, only implementing "a[:] = ..." -- I can't see how decisions here would trickle down to the API that's used in the kernel, it's more like a pre-phase, and better treated orthogonally. > I think if the heavy lifting of allocating output arrays and exporting > these arrays work in numpy, then support in Cython could use that (I > can already hear certain people object to more complicated array stuff > in Cython :). Even better here would be an external project that each > our projects could use (I still think the nditer sorting functionality > of arrays should be numpy-agnostic and externally available). I agree with the separate project idea. It's trivial for NumPy to incorporate that as one of its methods for exporting arrays, and I don't think it makes sense to either build it into Cython, or outright depend on NumPy. Here's what I'd like (working title: NumBridge?). - Mission: Be the "double* + shape + strides" in a world where that is no longer enough, by providing tight, focused APIs/ABIs that are usable across C/Fortran/Python. I basically want something I can quickly acquire from a NumPy array, then pass it into my C code without dragging along all the cruft that I don't need. - Written in pure C + specs, usable without Python - PEP 3118 "done right", basically semi-standardize the internal Cython memoryview ABI and get something that's passable on stack - Get block get/put API - Iterator APIs - Utility code for exporters and clients (iteration code, axis reordering, etc.) Is the scope of that insane, or is it at least worth a shot to see how bad it is? Beyond figuring out a small subset that can be done first, and whether performance considerations must be taken or not, there's two complicating factors: Pluggable dtypes, memory management. Perhaps you could come to Oslo for a couple of days to brainstorm... Dag From charlesr.harris at gmail.com Sat May 12 18:27:14 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 12 May 2012 16:27:14 -0600 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <4FAEDC58.8050502@astro.uio.no> References: <4FACF458.6070200@astro.uio.no> <4FAEDC58.8050502@astro.uio.no> Message-ID: On Sat, May 12, 2012 at 3:55 PM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > On 05/11/2012 03:37 PM, mark florisson wrote: > > On 11 May 2012 12:13, Dag Sverre Seljebotn > wrote: > >> (NumPy devs: I know, I get too many ideas. But this time I *really* > believe > >> in it, I think this is going to be *huge*. And if Mark F. likes it it's > not > >> going to be without manpower; and as his mentor I'd pitch in too here > and > >> there.) > >> > >> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly > don't > >> want to micro-manage your GSoC, just have your take.) > >> > >> Travis, thank you very much for those good words in the "NA-mask > >> interactions..." thread. It put most of my concerns away. 
If anybody is > >> leaning towards for opaqueness because of its OOP purity, I want to > refer to > >> C++ and its walled-garden of ideological purity -- it has, what, 3-4 > >> different OOP array libraries, neither of which is able to out-compete > the > >> other. Meanwhile the rest of the world happily cooperates using > pointers, > >> strides, CSR and CSC. > >> > >> Now, there are limits to what you can do with strides and pointers. > Noone's > >> denying the need for more. In my mind that's an API where you can do > >> fetch_block and put_block of cache-sized, N-dimensional blocks on an > array; > >> but it might be something slightly different. > >> > >> Here's what I'm asking: DO NOT simply keep extending ndarray and the > NumPy C > >> API to deal with this issue. > >> > >> What we need is duck-typing/polymorphism at the C level. If you keep > >> extending ndarray and the NumPy C API, what we'll have is a one-to-many > >> relationship: One provider of array technology, multiple consumers (with > >> hooks, I'm sure, but all implementations of the hook concept in the > NumPy > >> world I've seen so far are a total disaster!). > >> > >> What I think we need instead is something like PEP 3118 for the > "abstract" > >> array that is only available block-wise with getters and setters. On the > >> Cython list we've decided that what we want for CEP 1000 (for boxing > >> callbacks etc.) is to extend PyTypeObject with our own fields; we could > >> create CEP 1001 to solve this issue and make any Python object an > exporter > >> of "block-getter/setter-arrays" (better name needed). > >> > >> What would be exported is (of course) a simple vtable: > >> > >> typedef struct { > >> int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t > *lower_right, > >> ...); > >> ... > >> } block_getter_setter_array_vtable; > >> > >> Let's please discuss the details *after* the fundamentals. But the > reason I > >> put void* there instead of PyObject* is that I hope this could be used > >> beyond the Python world (say, Python<->Julia); the void* would be > handed to > >> you at the time you receive the vtable (however we handle that). > > > > I suppose it would also be useful to have some way of predicting the > > output format polymorphically for the caller. E.g. dense * > > block_diagonal results in block diagonal, but dense + block_diagonal > > results in dense, etc. It might be useful for the caller to know > > whether it needs to allocate a sparse, dense or block-structured > > array. Or maybe the polymorphic function could even do the allocation. > > This needs to happen recursively of course, to avoid intermediate > > temporaries. The compiler could easily handle that, and so could numpy > > when it gets lazy evaluation. > > Ah. But that depends too on the computation to be performed too; a) > elementwise, b) axis-wise reductions, c) linear algebra... > > In my oomatrix code (please don't look at it, it's shameful) I do this > using multiple dispatch. > > I'd rather ignore this for as long as we can, only implementing "a[:] = > ..." -- I can't see how decisions here would trickle down to the API > that's used in the kernel, it's more like a pre-phase, and better > treated orthogonally. > > > I think if the heavy lifting of allocating output arrays and exporting > > these arrays work in numpy, then support in Cython could use that (I > > can already hear certain people object to more complicated array stuff > > in Cython :). 
Even better here would be an external project that each > > our projects could use (I still think the nditer sorting functionality > > of arrays should be numpy-agnostic and externally available). > > I agree with the separate project idea. It's trivial for NumPy to > incorporate that as one of its methods for exporting arrays, and I don't > think it makes sense to either build it into Cython, or outright depend > on NumPy. > > Here's what I'd like (working title: NumBridge?). > > - Mission: Be the "double* + shape + strides" in a world where that is > no longer enough, by providing tight, focused APIs/ABIs that are usable > across C/Fortran/Python. > > I basically want something I can quickly acquire from a NumPy array, > then pass it into my C code without dragging along all the cruft that I > don't need. > > - Written in pure C + specs, usable without Python > > - PEP 3118 "done right", basically semi-standardize the internal > Cython memoryview ABI and get something that's passable on stack > > - Get block get/put API > > - Iterator APIs > > - Utility code for exporters and clients (iteration code, axis > reordering, etc.) > > Is the scope of that insane, or is it at least worth a shot to see how > bad it is? Beyond figuring out a small subset that can be done first, > and whether performance considerations must be taken or not, there's two > complicating factors: Pluggable dtypes, memory management. Perhaps you > could come to Oslo for a couple of days to brainstorm... > > There have been musings on this list along those lines with the idea that numpy/ufuncs would be built on top of that base, so it isn't crazy ;) Perhaps it is time to take a more serious look at it. Especially if there is help to get it implemented and made available through tools such as Cython. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sat May 12 20:30:23 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 12 May 2012 19:30:23 -0500 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <4FAECA3A.8090604@astro.uio.no> References: <4FACF458.6070200@astro.uio.no> <4FAECA3A.8090604@astro.uio.no> Message-ID: <1C373464-83AA-4F73-8260-70ACD75006DB@continuum.io> > > I think the long-term generality is a lot bigger than that: > > - Compressed arrays > - Interfaces to HDF files > - Distributed-memory arrays > - Blocked arrays > - Semi-sparse and sparse (diagonal, but also triangular, symmetric, > repeating, ...) > - Lazy evaluation: "generating_multiply(mydata, zero_mask)" > > While what me and Mark F. cares about is computational efficiency for > current arrays, this generality is almost unavoidable. > > In fact -- from ideas Travis have posted to this list earlier + > continuum.io, I assume this wider scope is something you and Travis must > necessarily have thought a lot about. > > Anyway, I agree with Mark F. that right design is probably a new, > low-level, (very small!) C library with no Python dependencies that just > provides some APIs to try to standardize this "how to communicate array > data" at a more basic level than NumPy (and much smaller and different > scope than the various "distill NumPy to a C core" things that's been > talked about the past years, something I have zero interest in). 
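For comparison, the level at which this already works today is the PEP 3118 buffer protocol -- a pointer plus shape and strides exchanged between libraries that know nothing about each other's types; the proposal above is essentially about what to export once that description is no longer enough. A minimal illustration using only the standard protocol (nothing new):

    import numpy as np

    a = np.arange(12.0).reshape(3, 4)

    # Any PEP 3118 consumer can ask for the raw description...
    m = memoryview(a)
    print(m.format, m.shape, m.strides)   # 'd', (3, 4), (32, 8)

    # ...and any exporter can be re-wrapped as an ndarray without a copy,
    # so writes are visible on both sides.
    b = np.asarray(m)
    b[0, 0] = 99.0
    print(a[0, 0])                        # 99.0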
> > If NumPy devs are interested in this discussion on a detailed level, > please say so; me and Mark F might go to Skype (or even meet in person) > to get higher bandwidth than ML, and if more people should be invited > then it's good to know. > I, for one, am very interested in this discussion. This is very much along the lines I have been thinking. To me it is much more important to solidify the concepts of the "interface" and what is essential about it than to create yet another library. I think to your general notion of a N-d block transfer API you would also need 1-d, 2-d and maybe 3-d specializations which take an additional "axis" argument to denote which sub-region is being described. But, this is probably enough. I am not sure what the specific relationship is between your thoughts and the email thread Mark referenced, but I do know that there is a deep connection to the *concept* of ufuncs which are currently the core abstraction for iterating over low-level calculations. You want the ability to create more powerful iteration constructs (like broadcasting and generalized ufuncs and windowed kernel funcs) while only having to define a single calculation of the kernel. A more generalized ufunc notion coupled with an improved low-level interface concept and you could have a system for doing anything that is independent of NumPy and NumPy would just be one of many array concepts that could co-exist and share development resources. Your thoughts are definitely the future. We are currently building such a thing. We would like it to be open source. We are currently preparing a proposal to DARPA as part of their XDATA proposal in order to help fund this. Email me offlist if you would like to be a part of this proposal. You don't have to be a U.S. citizen to participate in this. Thanks, -Travis > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sat May 12 22:01:11 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 12 May 2012 19:01:11 -0700 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <1C373464-83AA-4F73-8260-70ACD75006DB@continuum.io> References: <4FACF458.6070200@astro.uio.no> <4FAECA3A.8090604@astro.uio.no> <1C373464-83AA-4F73-8260-70ACD75006DB@continuum.io> Message-ID: Hi, On Sat, May 12, 2012 at 5:30 PM, Travis Oliphant wrote: > Your thoughts are definitely the future. ? We are currently building such a thing. ? We would like it to be open source. ? ?We are currently preparing a proposal to DARPA as part of their XDATA proposal in order to help fund this. ? ?Email me offlist if you would like to be a part of this proposal. ? ?You don't have to be a U.S. citizen to participate in this. What if this work that you are doing now is not open-source? What relationship will it have to numpy? It's difficult to have a productive discussion on the list if the main work is going on elsewhere, and we don't know what it is. 
See you, Matthew From travis at continuum.io Sat May 12 22:24:02 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 12 May 2012 21:24:02 -0500 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> <4FAECA3A.8090604@astro.uio.no> <1C373464-83AA-4F73-8260-70ACD75006DB@continuum.io> Message-ID: <6C602666-3ACC-450B-B2FD-9D0DB1BAF908@continuum.io> On May 12, 2012, at 9:01 PM, Matthew Brett wrote: > Hi, > > On Sat, May 12, 2012 at 5:30 PM, Travis Oliphant wrote: >> Your thoughts are definitely the future. We are currently building such a thing. We would like it to be open source. We are currently preparing a proposal to DARPA as part of their XDATA proposal in order to help fund this. Email me offlist if you would like to be a part of this proposal. You don't have to be a U.S. citizen to participate in this. > > What if this work that you are doing now is not open-source? What > relationship will it have to numpy? Anything DARPA funds will be open source if we are lucky enough to get them to support it our vision. I'm not sure the exact relationship to current NumPy at this point. That's why I suggested the need to discuss this on a different venue than this list. But, this is just me personally interested in something at this point. > > It's difficult to have a productive discussion on the list if the main > work is going on elsewhere, and we don't know what it is. I'm not sure how to make sense of this statement. The *main* work of *NumPy* is happening on this list. Nothing should be construed in what I have said to indicate otherwise. My statement that "we are currently building such a thing" obviously does not apply to NumPy. Of course work on other things might happen elsewhere and in other ways. At this point, all contributors to NumPy also work on other things as far as I know. Best, -Travis From travis at continuum.io Sat May 12 22:28:41 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 12 May 2012 21:28:41 -0500 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: Another approach would be to introduce a method: a.diag(copy=False) and leave a.diagonal() alone. Then, a.diagonal() could be deprecated over 2-3 releases. -Travis On May 12, 2012, at 8:31 AM, Ralf Gommers wrote: > > > On Sat, May 12, 2012 at 1:54 AM, Nathaniel Smith wrote: > On Fri, May 11, 2012 at 9:26 PM, T J wrote: > > On Fri, May 11, 2012 at 1:12 PM, Mark Wiebe wrote: > >> > >> On Fri, May 11, 2012 at 2:18 PM, Pauli Virtanen wrote: > >>> > >>> 11.05.2012 17:54, Fr?d?ric Bastien kirjoitti: > >>> > In Theano we use a view, but that is not relevant as it is the > >>> > compiler that tell what is inplace. So this is invisible to the user. > >>> > > >>> > What about a parameter to diagonal() that default to return a view not > >>> > writable as you said. The user can then choose what it want and this > >>> > don't break the inferface. > >>> [clip] > >>> > >>> Agreed, it seems this is the appropriate way to go on here > >>> `diagonal(copy=True)`. A more obscure alternative would be to add a > >>> separate method that returns a view. > >> > >> > >> This looks like the best way to deal with it, yes. > >> > >> Cheers, > >> Mark > >> > >>> > >>> > >>> I don't think changing the default behavior in a later release is a good > >>> idea. It's a sort of an API wart, but IMHO better that than subtle code > >>> breakage. 
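For anyone skimming the thread, the behavioral difference under discussion is easy to demonstrate with NumPy as it stands (diagonal() currently returns a copy); the as_strided construction below is only used to emulate what a view-returning diagonal would behave like, and a.diag(copy=False) itself is just the proposal above, not an existing method:

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    a = np.arange(9.0).reshape(3, 3)

    d_copy = a.diagonal()      # current behavior: an independent copy
    d_copy[:] = -1
    print(a[0, 0])             # 0.0 -- the original array is untouched

    # What a diagonal *view* means: same memory, stride of (n+1) elements.
    d_view = as_strided(a, shape=(3,), strides=((3 + 1) * a.itemsize,))
    d_view[:] = -1
    print(a[0, 0])             # -1.0 -- writing through the view changes a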
> >>> > >>> > > > > copy=True seems fine, but is this the final plan? What about long term, > > should diag() eventually be brought in line with transpose() and reshape() > > so that it is a view by default? Changing default behavior is certainly not > > something that should be done all the time, but it *can* be done if > > deprecated appropriately. A more consistent API is better than one with > > warts (if this particular issue is actually seen as a wart). > > This is my question as well. Adding copy=True default argument is > certainly a fine solution for 1.7, but IMHO in the long run it would > be better for diagonal() to return a view by default. > > If you want to get to a situation where the behavior is changed, adding a temporary new keyword is not a good solution in general. Because this forces you to make the change over 3 different releases: > 1. introduce copy=True > 2. change to copy=False > 3. remove copy kw > and step 3 is still breaking existing code, because people started using "copy=False". > > See the histogram new_behavior keyword as an example of this. > > (Aside from it > seeming generally more "numpythonic", I see from auditing the code I > have lying around my homedir that it would generally be a free speed > win, and having to remember to type a.diagonal(copy=False) all the > time in order to get full performance seems a bit annoying.) > > I mean, I'm all for conservatism in these things, which is why I > raised the issue in the first place :-). But it also seems like there > should be *some* mechanism for getting there from here (assuming > others agree we want to). There's been grumblings about trying to do > more evolving of numpy in-place instead of putting everything off to > the legendary 2.0, right? > > Unfortunately just putting a deprecation warning on everyone who calls > diagonal() without an explicit copy= argument seems like it would be > *really* obnoxious, though. If necessary we could get more creative... > add a special-purpose ndarray flag like WARN_ON_WRITE so that code > which writes to the returned array continues to work like now, but > also triggers a deprecation warning? I dunno. > > Something like this could be a solution. Otherwise, just living with the copy is imho much better than introducing a copy kw. > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sat May 12 22:39:59 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 12 May 2012 19:39:59 -0700 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <6C602666-3ACC-450B-B2FD-9D0DB1BAF908@continuum.io> References: <4FACF458.6070200@astro.uio.no> <4FAECA3A.8090604@astro.uio.no> <1C373464-83AA-4F73-8260-70ACD75006DB@continuum.io> <6C602666-3ACC-450B-B2FD-9D0DB1BAF908@continuum.io> Message-ID: Hi, On Sat, May 12, 2012 at 7:24 PM, Travis Oliphant wrote: > > On May 12, 2012, at 9:01 PM, Matthew Brett wrote: > >> Hi, >> >> On Sat, May 12, 2012 at 5:30 PM, Travis Oliphant wrote: >>> Your thoughts are definitely the future. ? We are currently building such a thing. ? We would like it to be open source. ? ?We are currently preparing a proposal to DARPA as part of their XDATA proposal in order to help fund this. ? ?Email me offlist if you would like to be a part of this proposal. ? 
?You don't have to be a U.S. citizen to participate in this. >> >> What if this work that you are doing now is not open-source? ?What >> relationship will it have to numpy? > > Anything DARPA funds will be open source if we are lucky enough to get them to support it our vision. ? I'm not sure the exact relationship to current NumPy at this point. ? ?That's why I suggested the need to discuss this on a different venue than this list. ? ?But, this is just me personally interested in something at this point. > >> >> It's difficult to have a productive discussion on the list if the main >> work is going on elsewhere, and we don't know what it is. > > I'm not sure how to make sense of this statement. ? ?The *main* work of *NumPy* is happening on this list. ? Nothing should be construed in what I have said to indicate otherwise. ? My statement that "we are currently building such a thing" obviously does not apply to NumPy. > > Of course work on other things might happen elsewhere and in other ways. ? ?At this point, all contributors to NumPy also work on other things as far as I know. In your email above you said "Your thoughts are definitely the future. ? We are currently building such a thing. ? We would like it to be open source." I assume you meant, the future of numpy. Did you mean something else? Will there be a 'numpy-pro'? If so, how would that relate to the future of numpy? See you, Matthew From orion at cora.nwra.com Sat May 12 22:47:03 2012 From: orion at cora.nwra.com (Orion Poplawski) Date: Sat, 12 May 2012 20:47:03 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: <4FAF20A7.1010004@cora.nwra.com> On 05/05/2012 12:15 PM, Ralf Gommers wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate > of NumPy 1.6.2. This is a maintenance release. Due to the delay of the > NumPy 1.7.0, this release contains far more fixes than a regular NumPy > bugfix release. It also includes a number of documentation and build > improvements. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ > > Please test this release and report any issues on the numpy-discussion > mailing list. Built fine in Fedora Rawhide (F18). Ran 3210 tests in 17.036s OK (KNOWNFAIL=3, SKIP=1) Running unit tests for numpy NumPy version 1.6.2rc1 NumPy is installed in /builddir/build/BUILDROOT/numpy-1.6.2-0.1.rc1.fc18.x86_64/usr/lib64/python2.7/site-packages/numpy Python version 2.7.3 (default, Apr 30 2012, 20:31:33) [GCC 4.7.0 20120416 (Red Hat 4.7.0-2)] nose version 1.1.2 -- Orion Poplawski Technical Manager 303-415-9701 x222 NWRA/CoRA Division FAX: 303-415-9702 3380 Mitchell Lane orion at cora.nwra.com Boulder, CO 80301 http://www.cora.nwra.com From ben.root at ou.edu Sat May 12 22:50:25 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 12 May 2012 22:50:25 -0400 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Saturday, May 12, 2012, Travis Oliphant wrote: > Another approach would be to introduce a method: > > a.diag(copy=False) > > and leave a.diagonal() alone. Then, a.diagonal() could be deprecated over > 2-3 releases. > > -Travis > +1 Ben Root > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Sat May 12 23:24:30 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 12 May 2012 22:24:30 -0500 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> <4FAECA3A.8090604@astro.uio.no> <1C373464-83AA-4F73-8260-70ACD75006DB@continuum.io> <6C602666-3ACC-450B-B2FD-9D0DB1BAF908@continuum.io> Message-ID: <10A8D402-6709-4436-A795-551D2C297751@continuum.io> On May 12, 2012, at 9:39 PM, Matthew Brett wrote: > Hi, > > On Sat, May 12, 2012 at 7:24 PM, Travis Oliphant wrote: >> >> On May 12, 2012, at 9:01 PM, Matthew Brett wrote: >> >>> Hi, >>> >>> On Sat, May 12, 2012 at 5:30 PM, Travis Oliphant wrote: >>>> Your thoughts are definitely the future. We are currently building such a thing. We would like it to be open source. We are currently preparing a proposal to DARPA as part of their XDATA proposal in order to help fund this. Email me offlist if you would like to be a part of this proposal. You don't have to be a U.S. citizen to participate in this. >>> >>> What if this work that you are doing now is not open-source? What >>> relationship will it have to numpy? >> >> Anything DARPA funds will be open source if we are lucky enough to get them to support it our vision. I'm not sure the exact relationship to current NumPy at this point. That's why I suggested the need to discuss this on a different venue than this list. But, this is just me personally interested in something at this point. >> >>> >>> It's difficult to have a productive discussion on the list if the main >>> work is going on elsewhere, and we don't know what it is. >> >> I'm not sure how to make sense of this statement. The *main* work of *NumPy* is happening on this list. Nothing should be construed in what I have said to indicate otherwise. My statement that "we are currently building such a thing" obviously does not apply to NumPy. >> >> Of course work on other things might happen elsewhere and in other ways. At this point, all contributors to NumPy also work on other things as far as I know. > > In your email above you said "Your thoughts are definitely the future. > We are currently building such a thing. We would like it to be > open source." I assume you meant, the future of numpy. Did you mean > something else? Will there be a 'numpy-pro'? If so, how would that > relate to the future of numpy? > I think I've clarified that I meant something else in that context. Any statement about any rumored "NumPy-Pro" is unrelated and completely off-topic (and inappropriate to discuss on this list). Of course there could be such a thing. Several people have produced things like that before including Enthought (with it's MKL-linked NumPy) and Interactive SuperComputing (starp). I think it would be great for NumPy if there were a lot of such offerings. My preference for such proprietary products is that they eventually become open source which should be pretty clear to anyone paying attention to the way I sold my book "Guide to NumPy" --- which I might add is currently freely available and became a large primary source for the NumPy Documentation project that Joe Harrington spearheaded. But such discussions are really not appropriate for this list. I would really hope that we can keep the NumPy discussion list on topic to NumPy. 
Obviously there will be the occasional "announcement" that describe related but extra-list topics (like Dag posts about Cython and others post about SymPy or SciPy or NumFOCUS, or, conferences, or whatever). But they should not really create threads of discussion. A great place for such general discussions (and perhaps discussions about "governance" or "process") is probably a NumFOCUS mailing list. NumFOCUS is trying to promote the Open Source tools generally and is in very early stages of organization with the by-laws not even written yet. I suspect that the board of NumFOCUS would be excited for people to get involved and help organize and participate in forums and mailings lists that discuss how to help the open source scientific stack for Python progress. The best way to get involved there is to go to www.numfocus.org and email info at numfocus.org Best, -Travis From seb.haase at gmail.com Sun May 13 03:46:17 2012 From: seb.haase at gmail.com (Sebastian Haase) Date: Sun, 13 May 2012 09:46:17 +0200 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: +1 On Sun, May 13, 2012 at 4:28 AM, Travis Oliphant wrote: > Another approach would be to introduce a method: > > a.diag(copy=False) > > and leave a.diagonal() alone. ?Then, a.diagonal() could be deprecated over > 2-3 releases. > > -Travis > > > On May 12, 2012, at 8:31 AM, Ralf Gommers wrote: > > > > On Sat, May 12, 2012 at 1:54 AM, Nathaniel Smith wrote: >> >> On Fri, May 11, 2012 at 9:26 PM, T J wrote: >> > On Fri, May 11, 2012 at 1:12 PM, Mark Wiebe wrote: >> >> >> >> On Fri, May 11, 2012 at 2:18 PM, Pauli Virtanen wrote: >> >>> >> >>> 11.05.2012 17:54, Fr?d?ric Bastien kirjoitti: >> >>> > In Theano we use a view, but that is not relevant as it is the >> >>> > compiler that tell what is inplace. So this is invisible to the >> >>> > user. >> >>> > >> >>> > What about a parameter to diagonal() that default to return a view >> >>> > not >> >>> > writable as you said. The user can then choose what it want and this >> >>> > don't break the inferface. >> >>> [clip] >> >>> >> >>> Agreed, it seems this is the appropriate way to go on here >> >>> `diagonal(copy=True)`. A more obscure alternative would be to add a >> >>> separate method that returns a view. >> >> >> >> >> >> This looks like the best way to deal with it, yes. >> >> >> >> Cheers, >> >> Mark >> >> >> >>> >> >>> >> >>> I don't think changing the default behavior in a later release is a >> >>> good >> >>> idea. It's a sort of an API wart, but IMHO better that than subtle >> >>> code >> >>> breakage. >> >>> >> >>> >> > >> > copy=True seems fine, but is this the final plan? ? What about long >> > term, >> > should diag() eventually be brought in line with transpose() and >> > reshape() >> > so that it is a view by default? ?Changing default behavior is certainly >> > not >> > something that should be done all the time, but it *can* be done if >> > deprecated appropriately. ?A more consistent API is better than one with >> > warts (if this particular issue is actually seen as a wart). >> >> This is my question as well. Adding copy=True default argument is >> certainly a fine solution for 1.7, but IMHO in the long run it would >> be better for diagonal() to return a view by default. > > > If you want to get to a situation where the behavior is changed, adding a > temporary new keyword is not a good solution in general. Because this forces > you to make the change over 3 different releases: > ? 1. 
introduce copy=True > ? 2. change to copy=False > ? 3. remove copy kw > and step 3 is still breaking existing code, because people started using > "copy=False". > > See the histogram new_behavior keyword as an example of this. > >> (Aside from it >> seeming generally more "numpythonic", I see from auditing the code I >> have lying around my homedir that it would generally be a free speed >> win, and having to remember to type a.diagonal(copy=False) all the >> time in order to get full performance seems a bit annoying.) >> >> I mean, I'm all for conservatism in these things, which is why I >> raised the issue in the first place :-). But it also seems like there >> should be *some* mechanism for getting there from here (assuming >> others agree we want to). There's been grumblings about trying to do >> more evolving of numpy in-place instead of putting everything off to >> the legendary 2.0, right? >> >> Unfortunately just putting a deprecation warning on everyone who calls >> diagonal() without an explicit copy= argument seems like it would be >> *really* obnoxious, though. If necessary we could get more creative... >> add a special-purpose ndarray flag like WARN_ON_WRITE so that code >> which writes to the returned array continues to work like now, but >> also triggers a deprecation warning? I dunno. > > > Something like this could be a solution. Otherwise, just living with the > copy is imho much better than introducing a copy kw. > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Sun May 13 04:11:18 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 13 May 2012 09:11:18 +0100 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Sun, May 13, 2012 at 3:28 AM, Travis Oliphant wrote: > Another approach would be to introduce a method: > > a.diag(copy=False) > > and leave a.diagonal() alone. ?Then, a.diagonal() could be deprecated over > 2-3 releases. This would be a good idea if we didn't already have both np.diagonal(a) (which is an alias for a.diagonal()) *and* np.diag(a), which do different things. And the new a.diag() would be different from the existing np.diag(a)... -- Nathaniel From paul.anton.letnes at gmail.com Sun May 13 07:14:33 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 13 May 2012 13:14:33 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sat, May 12, 2012 at 9:50 PM, Ralf Gommers wrote: > > > On Sun, May 6, 2012 at 12:12 AM, Charles R Harris > wrote: >> >> >> >> On Sat, May 5, 2012 at 2:56 PM, Paul Anton Letnes >> wrote: >>> >>> Hi, >>> >>> I'm getting a couple of errors when testing. System: >>> Arch Linux (updated today) >>> Python 3.2.3 >>> gcc 4.7.0 >>> (Anything else?) >>> >>> I think that this error: >>> AssertionError: selectedrealkind(19): expected -1 but got 16 >>> is due to the fact that newer versions of gfortran actually supports >>> precision this high (quad precision). >>> >> >> Yes, but it should be fixed. I can't duplicate this here with a fresh >> checkout of the branch. > > > This failure makes no sense to me. > > Error comes from this code: > > ??? 
'selectedrealkind(%s): expected %r but got %r' % (i, > selected_real_kind(i), selectedrealkind(i))) > > So "selected_real_kind(19)" returns -1. > > selected_real_kind is the function > numpy.f2py.crackfortran._selected_real_kind_func, which is defined as: >
> def _selected_real_kind_func(p, r=0, radix=0):
>     #XXX: This should be processor dependent
>     # This is only good for 0 <= p <= 20
>     if p < 7: return 4
>     if p < 16: return 8
>     if platform.machine().lower().startswith('power'):
>         if p <= 20:
>             return 16
>     else:
>         if p < 19:
>             return 10
>         elif p <= 20:
>             return 16
>     return -1
>
> For p=19 this function should always return 16. So the result from compiling > foo.f90 is fine, but the test is broken in a very strange way. > > Paul, is the failure reproducible on your machine? If so, can you try to > debug it? > > Ralf Hi Ralf. The Arch numpy (1.6.1) for python 2.7, installed via pacman (the package manager) has this problem. After installation of numpy 1.6.2rc1 in a virtualenv, the test passes. Maybe the bug was fixed in the RC, and I screwed up which numpy version I tested? I'm sorry that I can't find out - I just built a new machine, and the old one is lying around the livingroom in pieces. Was that particular bit of code changed between 1.6.1 and 1.6.2rc1? Paul From dprasad830 at yahoo.com Sun May 13 12:48:32 2012 From: dprasad830 at yahoo.com (Dinesh Prasad) Date: Sun, 13 May 2012 09:48:32 -0700 (PDT) Subject: [Numpy-discussion] Correlation code from "NumPy 1.5 Beginner's Guide" Message-ID: <1336927712.88558.YahooMailNeo@web161201.mail.bf1.yahoo.com> Hello. I am new to the list thanks for accepting my question. I am trying to run the attached code, directly from the book in the title. It simply calculates correlation of returns of the stock listed in the spreadsheets. could it be that the numPY library is not being recognized on my system for some reason? I installed the 32 bit version of python/numpy. This is to preserve compatibility with another application that will trigger the code. I am on Windows 7 64 bit Home Edition, however. Thanks for any suggestions. -Dinesh -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: correlation.py URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: BHP.csv Type: application/vnd.ms-excel Size: 1481 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: VALE.csv Type: application/vnd.ms-excel Size: 1549 bytes Desc: not available URL: From scopatz at gmail.com Sun May 13 19:28:30 2012 From: scopatz at gmail.com (Anthony Scopatz) Date: Sun, 13 May 2012 18:28:30 -0500 Subject: [Numpy-discussion] fromstring() is slow, no really! Message-ID: Hello All, This week, while doing some optimization, I found that np.fromstring() is significantly slower than many alternatives out there. This function basically does two things: (1) it splits the string and (2) it converts the data to the desired type. There isn't much we can do about the conversion/casting so what I mean is that the *string splitting implementation is slow*. To simplify the discussion, I will just talk about string to 1d float64 arrays. I have also issued pull request #279 [1] to numpy with some sample code. Timings can be seen in the ipython notebook here.
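(Since the notebook link does not survive in the plain-text archive, here is a minimal standalone sketch of the comparison described in this message; the timings are machine-dependent and are not the original numbers.)

    import timeit
    import numpy as np

    s = "100.0 " * 100000

    def use_fromstring():
        return np.fromstring(s, dtype=float, sep=" ")

    def use_split():
        return np.array(s.split(), dtype=float)

    assert np.array_equal(use_fromstring(), use_split())

    for func in (use_fromstring, use_split):
        best = min(timeit.repeat(func, number=10, repeat=3))
        print("%s: %.1f ms per call" % (func.__name__, 1000.0 * best / 10))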
It turns out that using str.split() and np.array() are 20 - 35% faster, which was non-intuitive to me. That is to say: rawdata = s.split() data = np.array(rawdata, dtype=float) is faster than data = np.fromstring(s, sep=" ", dtype=float) The next thing to try, naturally, was Cython. This did not change the timings much for these two strategies. However, being in Cython allows us to call atof() directly. My implementation is based on a previous thread on this topic [2]. However, in the example in [2], the string was hard coded, contained only one data value, and did not need to be split. Thus they saw a dramatic 10x speed boost. To deal with the more realistic case, I first just continued to use str.split(). This took 35 - 50% less time than np.fromstring(). Finally, using the strtok() function in the C standard library to call atof() while we tokenize the string further reduces the speed 50 - 60% of the baseline np.fromstring() time. Timings ------------ In [1]: import fromstr In [2]: s = "100.0 " * 100000 In [3]: timeit fromstr.fromstring(s) 10 loops, best of 3: 20.7 ms per loop In [4]: timeit fromstr.split_and_array(s) 100 loops, best of 3: 16.1 ms per loop In [6]: timeit fromstr.split_atof(s) 100 loops, best of 3: 13.5 ms per loop In [7]: timeit fromstr.token_atof(s) 100 loops, best of 3: 8.35 ms per loop Possible Explanation ---------------------------------- Numpy's fromstring() function may be found here [3]. However, this code is a bit hard to follow but it uses the array_from_text() function [4]. On the other hand str.split() [5] uses a macro function SPLIT_ADD(). The difference between these is that I believe that str.split() over-allocates the size of the list in a more aggressive way than array_from_text(). This leads to fewer resizes and thus fewer memory copies. This would also explain why the tokenize implementation is the fastest since this pre-allocates the maximum possible array size and then slices it down. No resizes are present in this function, though it requires more memory up front. Summary (tl;dr) ------------------------ The np.fromstring() is slow in the mechanism it chooses to split strings by. This is likely due to how many resize operations it must perform. While it need not be the* *fastest* *thing out there, it should probably be at least as fast at Python string splitting. No pull-request 'fixing' this issue was provided because I wanted to see what people thought and if / which option is worth pursuing. Be Well Anthony [1] https://github.com/numpy/numpy/pull/279 [2] http://comments.gmane.org/gmane.comp.python.numeric.general/41504 [3] https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L3699 [4] https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L3418 [5] http://svn.python.org/view/python/tags/r271/Objects/stringlib/split.h?view=markup -------------- next part -------------- An HTML attachment was scrubbed... URL: From scopatz at gmail.com Sun May 13 19:34:32 2012 From: scopatz at gmail.com (Anthony Scopatz) Date: Sun, 13 May 2012 18:34:32 -0500 Subject: [Numpy-discussion] fromstring() is slow, no really! In-Reply-To: References: Message-ID: And I forgot to attach the relevant code (though it is also in my fork)... On Sun, May 13, 2012 at 6:28 PM, Anthony Scopatz wrote: > Hello All, > > This week, while doing some optimization, I found that np.fromstring() > is significantly slower than many alternatives out there. 
This function > basically does two things: (1) it splits the string and (2) it converts the > data to the desired type. > > There isn't much we can do about the conversion/casting so what I > mean is that the *string splitting implementation is slow*. > > To simplify the discussion, I will just talk about string to 1d float64 > arrays. > I have also issued pull request #279 [1] to numpy with some sample code. > Timings can be seen in the ipython notebook here. > > It turns out that using str.split() and np.array() are 20 - 35% faster, > which > was non-intuitive to me. That is to say: > > rawdata = s.split() > data = np.array(rawdata, dtype=float) > > > is faster than > > data = np.fromstring(s, sep=" ", dtype=float) > > > The next thing to try, naturally, was Cython. This did not change the > timings much for these two strategies. However, being in Cython > allows us to call atof() directly. My implementation is based on a > previous > thread on this topic [2]. However, in the example in [2], the string was > hard coded, contained only one data value, and did not need to be split. > Thus they saw a dramatic 10x speed boost. To deal with the more > realistic case, I first just continued to use str.split(). This took 35 - > 50% > less time than np.fromstring(). > > Finally, using the strtok() function in the C standard library to call > atof() > while we tokenize the string further reduces the speed 50 - 60% of the > baseline np.fromstring() time. > > Timings > ------------ > In [1]: import fromstr > > In [2]: s = "100.0 " * 100000 > > In [3]: timeit fromstr.fromstring(s) > 10 loops, best of 3: 20.7 ms per loop > > In [4]: timeit fromstr.split_and_array(s) > 100 loops, best of 3: 16.1 ms per loop > > In [6]: timeit fromstr.split_atof(s) > 100 loops, best of 3: 13.5 ms per loop > > In [7]: timeit fromstr.token_atof(s) > 100 loops, best of 3: 8.35 ms per loop > > Possible Explanation > ---------------------------------- > Numpy's fromstring() function may be found here [3]. However, this code > is a bit hard to follow but it uses the array_from_text() function [4]. > On the > other hand str.split() [5] uses a macro function SPLIT_ADD(). The > difference > between these is that I believe that str.split() over-allocates the size > of the > list in a more aggressive way than array_from_text(). This leads to fewer > resizes and thus fewer memory copies. > > This would also explain why the tokenize implementation is the fastest > since > this pre-allocates the maximum possible array size and then slices it > down. > No resizes are present in this function, though it requires more memory up > front. > > Summary (tl;dr) > ------------------------ > The np.fromstring() is slow in the mechanism it chooses to split strings > by. > This is likely due to how many resize operations it must perform. While > it > need not be the* *fastest* *thing out there, it should probably be at > least as > fast at Python string splitting. > > No pull-request 'fixing' this issue was provided because I wanted to see > what people thought and if / which option is worth pursuing. 
> > Be Well > Anthony > > [1] https://github.com/numpy/numpy/pull/279 > [2] http://comments.gmane.org/gmane.comp.python.numeric.general/41504 > [3] > https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L3699 > [4] > https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L3418 > [5] > http://svn.python.org/view/python/tags/r271/Objects/stringlib/split.h?view=markup > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fromstr.pyx Type: application/octet-stream Size: 1530 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: setup.py Type: application/octet-stream Size: 251 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fromstr.ipynb Type: application/octet-stream Size: 3436 bytes Desc: not available URL: From n.becker at amolf.nl Mon May 14 04:13:03 2012 From: n.becker at amolf.nl (Nils Becker) Date: Mon, 14 May 2012 10:13:03 +0200 Subject: [Numpy-discussion] histogram2d and histogramdd return counts as floats while histogram returns ints In-Reply-To: <4FAD4254.50903@amolf.nl> References: <4FAD4254.50903@amolf.nl> Message-ID: <4FB0BE8F.80308@amolf.nl> > is this intended? > > np.histogramdd([[1,2],[3,4]],bins=2) > > (array([[ 1., 0.], > [ 0., 1.]]), > [array([ 1. , 1.5, 2. ]), array([ 3. , 3.5, 4. ])]) > > np.histogram2d([1,2],[3,4],bins=2) > > (array([[ 1., 0.], > [ 0., 1.]]), > array([ 1. , 1.5, 2. ]), > array([ 3. , 3.5, 4. ])) > > np.histogram([1,2],bins=2) > > (array([1, 1]), array([ 1. , 1.5, 2. ])) maybe i should have been more explicit. what i meant to say is that 1. the counts in a histogram are integers. whenever no normalization is used i would expect that i get an integer array when i call a histogram function. 2. now it might be intended that the data type is always the same so that float has to be used to accomodate the normalized histograms. 3. in any case, the 1d histogram function handles this differently from the 2d and dd ones. this seems inconsistent and might be considered a bug. n. From markflorisson88 at gmail.com Mon May 14 12:31:47 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 14 May 2012 17:31:47 +0100 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <4FAEDC58.8050502@astro.uio.no> References: <4FACF458.6070200@astro.uio.no> <4FAEDC58.8050502@astro.uio.no> Message-ID: On 12 May 2012 22:55, Dag Sverre Seljebotn wrote: > On 05/11/2012 03:37 PM, mark florisson wrote: >> >> On 11 May 2012 12:13, Dag Sverre Seljebotn >> ?wrote: >>> >>> (NumPy devs: I know, I get too many ideas. But this time I *really* >>> believe >>> in it, I think this is going to be *huge*. And if Mark F. likes it it's >>> not >>> going to be without manpower; and as his mentor I'd pitch in too here and >>> there.) >>> >>> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly >>> don't >>> want to micro-manage your GSoC, just have your take.) >>> >>> Travis, thank you very much for those good words in the "NA-mask >>> interactions..." thread. It put most of my concerns away. 
If anybody is >>> leaning towards for opaqueness because of its OOP purity, I want to refer >>> to >>> C++ and its walled-garden of ideological purity -- it has, what, 3-4 >>> different OOP array libraries, neither of which is able to out-compete >>> the >>> other. Meanwhile the rest of the world happily cooperates using pointers, >>> strides, CSR and CSC. >>> >>> Now, there are limits to what you can do with strides and pointers. >>> Noone's >>> denying the need for more. In my mind that's an API where you can do >>> fetch_block and put_block of cache-sized, N-dimensional blocks on an >>> array; >>> but it might be something slightly different. >>> >>> Here's what I'm asking: DO NOT simply keep extending ndarray and the >>> NumPy C >>> API to deal with this issue. >>> >>> What we need is duck-typing/polymorphism at the C level. If you keep >>> extending ndarray and the NumPy C API, what we'll have is a one-to-many >>> relationship: One provider of array technology, multiple consumers (with >>> hooks, I'm sure, but all implementations of the hook concept in the NumPy >>> world I've seen so far are a total disaster!). >>> >>> What I think we need instead is something like PEP 3118 for the >>> "abstract" >>> array that is only available block-wise with getters and setters. On the >>> Cython list we've decided that what we want for CEP 1000 (for boxing >>> callbacks etc.) is to extend PyTypeObject with our own fields; we could >>> create CEP 1001 to solve this issue and make any Python object an >>> exporter >>> of "block-getter/setter-arrays" (better name needed). >>> >>> What would be exported is (of course) a simple vtable: >>> >>> typedef struct { >>> ? ?int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, >>> ...); >>> ? ?... >>> } block_getter_setter_array_vtable; >>> >>> Let's please discuss the details *after* the fundamentals. But the reason >>> I >>> put void* there instead of PyObject* is that I hope this could be used >>> beyond the Python world (say, Python<->Julia); the void* would be handed >>> to >>> you at the time you receive the vtable (however we handle that). >> >> >> I suppose it would also be useful to have some way of predicting the >> output format polymorphically for the caller. E.g. dense * >> block_diagonal results in block diagonal, but dense + block_diagonal >> results in dense, etc. It might be useful for the caller to know >> whether it needs to allocate a sparse, dense or block-structured >> array. Or maybe the polymorphic function could even do the allocation. >> This needs to happen recursively of course, to avoid intermediate >> temporaries. The compiler could easily handle that, and so could numpy >> when it gets lazy evaluation. > > > Ah. But that depends too on the computation to be performed too; a) > elementwise, b) axis-wise reductions, c) linear algebra... > > In my oomatrix code (please don't look at it, it's shameful) I do this using > multiple dispatch. > > I'd rather ignore this for as long as we can, only implementing "a[:] = ..." > -- I can't see how decisions here would trickle down to the API that's used > in the kernel, it's more like a pre-phase, and better treated orthogonally. > > >> I think if the heavy lifting of allocating output arrays and exporting >> these arrays work in numpy, then support in Cython could use that (I >> can already hear certain people object to more complicated array stuff >> in Cython :). 
Even better here would be an external project that each >> our projects could use (I still think the nditer sorting functionality >> of arrays should be numpy-agnostic and externally available). > > > I agree with the separate project idea. It's trivial for NumPy to > incorporate that as one of its methods for exporting arrays, and I don't > think it makes sense to either build it into Cython, or outright depend on > NumPy. > > Here's what I'd like (working title: NumBridge?). > > ?- Mission: Be the "double* + shape + strides" in a world where that is no > longer enough, by providing tight, focused APIs/ABIs that are usable across > C/Fortran/Python. > > I basically want something I can quickly acquire from a NumPy array, then > pass it into my C code without dragging along all the cruft that I don't > need. > > ?- Written in pure C + specs, usable without Python > > ?- PEP 3118 "done right", basically semi-standardize the internal Cython > memoryview ABI and get something that's passable on stack > > ?- Get block get/put API > > ?- Iterator APIs > > ?- Utility code for exporters and clients (iteration code, axis reordering, > etc.) > > Is the scope of that insane, or is it at least worth a shot to see how bad > it is? Beyond figuring out a small subset that can be done first, and > whether performance considerations must be taken or not, there's two > complicating factors: Pluggable dtypes, memory management. Perhaps you could > come to Oslo for a couple of days to brainstorm... > > Dag The blocks are a good idea, but I think fairly complicated for complicated matrix layouts. It would be nice to have something that is reasonably efficient for at least most of the array storage mechanisms. I'm going to do a little brain dump below, let's see if anything is useful :) What if we basically take the CSR format and make it a little simpler, easier to handle, and better suited for other layouts. Basically, keep shape/strides for dense arrays, and for blocked arrays just "extend" your number of dimensions, i.e. a 2D blocked array becomes a 4D array, something like this: >>> a = np.arange(4).repeat(4).reshape(4, 4); >>> a array([[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]) >>> a.shape = (2, 2, 2, 2) >>> itemsize = a.dtype.itemsize >>> a.strides = (8 * itemsize, 2 * itemsize, 4 * itemsize, 1 * itemsize) >>> a array([[[[0, 0], [1, 1]], [[0, 0], [1, 1]]], [[[2, 2], [3, 3]], [[2, 2], [3, 3]]]]) >>> print a.flatten() [0 0 1 1 0 0 1 1 2 2 3 3 2 2 3 3] Now, given some buffer flag (PyBUF_Sparse or something), use basically suboffsets with indirect dimensions, where only ever the last dimension is a row of contiguous memory (the entire thing may be contiguous, but the point is that you don't care). This row may - be variable sized - have either a single "column start" (e.g. diagonal, band/tri- diagonal, block diagonal, etc), OR - a list of column indices, the length of the row (like in the CSR format) The length of each innermost row is then determined by looking at, in order: - the extent as specified in the shape list - if -1, and some flag is set, the extent is determined like CSR, i.e. 
(uintptr_t) row[i + 1] - (uintptr_t) row[i]
       -> maybe instead of pointers indices are better, for serialization, GPUs, etc
  - otherwise, use either a function pointer or perhaps a list of extents

All these details will obviously be abstracted, allowing for easy iteration, but it can also be used by ufuncs operating on contiguous rows (often, since the actual storage is contiguous and stored in a 1D array, some flag could indicate contiguity as an optimization for unary ufuncs and flat iteration). Tiled nditer-ation could also work here without too big a hassle. When you slice, you add to the suboffset and manipulate the single extent or entire list of extents in that dimension, and the flag to determine the length using the pointer subtraction should be cleared (this should probably all happen through vtable functions).

An exporter would also be free to use different malloced pointers, allowing variable sized array support with append/pop functionality in multiple dimensions (if there are no active buffer views).

Random access in the case where a column start is provided is still constant time, and done through a[i][j][k - colstart], unless you have discontiguous rows, in which case you are allowed a logarithmic search (if the size exceeds some threshold). I see scipy.sparse does a linear search, which is pretty slow in pathological cases:

from scipy import sparse
a = np.random.random((1, 4000000))
b = sparse.csr_matrix(a)
%timeit a[0, 1000]
%timeit b[0, 1000]

10000000 loops, best of 3: 190 ns per loop
10 loops, best of 3: 29.3 ms per loop

Now, depending on the density and operation, the caller may have some idea of how to allocate an output array. I'm not sure how to handle "insertions" of new elements, but I presume through vtable put/insert functions. I'm also not sure how to fit this in with linear algebra functionality, other than providing conversions of the view.

I think a disadvantage of this scheme is that you can't reorder your axes anymore, and many operations that are easy in the dense case are suddenly harder, e.g. this scheme does not allow you to go from a csr-like format into csc. But I think what this gives is reasonable generality to allow easy use in C/Fortran/Cython compiled code/numpy ufunc invocation, as well as allowing efficient-ish storage for various kinds of arrays.

Does this make any sense?

From rhattersley at gmail.com Mon May 14 15:24:09 2012
From: rhattersley at gmail.com (Richard Hattersley)
Date: Mon, 14 May 2012 20:24:09 +0100
Subject: [Numpy-discussion] Missing data wrap-up and request for comments
In-Reply-To: 
References: 
Message-ID: 

For what it's worth, I'd prefer ndmasked. As has been mentioned elsewhere, some algorithms can't really cope with missing data. I'd very much rather they fail than silently give incorrect results. Working in the climate prediction business (as with many other domains I'm sure), even the *potential* for incorrect results can be damaging.

On 11 May 2012 06:14, Travis Oliphant wrote: > > On May 10, 2012, at 12:21 AM, Charles R Harris wrote: > > > > On Wed, May 9, 2012 at 11:05 PM, Benjamin Root wrote: > >> >> >> On Wednesday, May 9, 2012, Nathaniel Smith wrote: >> >>> >>> >>> My only objection to this proposal is that committing to this approach >>> seems premature. The existing masked array objects act quite >>> differently from numpy.ma, so why do you believe that they're a good >>> foundation for numpy.ma, and why will users want to switch to their >>> semantics over numpy.ma's semantics?
These aren't rhetorical >>> questions, it seems like they must have concrete answers, but I don't >>> know what they are. >>> >> >> Based on the design decisions made in the original NEP, a re-made >> numpy.ma would have to lose _some_ features particularly, the ability to >> share masks. Save for that and some very obscure behaviors that are >> undocumented, it is possible to remake numpy.ma as a compatibility layer. >> >> That being said, I think that there are some fundamental questions that >> has concerned. If I recall, there were unresolved questions about behaviors >> surrounding assignments to elements of a view. >> >> I see the project as broken down like this: >> 1.) internal architecture (largely abi issues) >> 2.) external architecture (hooks throughout numpy to utilize the new >> features where possible such as where= argument) >> 3.) getter/setter semantics >> 4.) mathematical semantics >> >> At this moment, I think we have pieces of 2 and they are fairly >> non-controversial. It is 1 that I see as being the immediate hold-up here. >> 3 & 4 are non-trivial, but because they are mostly about interfaces, I >> think we can be willing to accept some very basic, fundamental, barebones >> components here in order to lay the groundwork for a more complete API >> later. >> >> To talk of Travis's proposal, doing nothing is no-go. Not moving forward >> would dishearten the community. Making a ndmasked type is very intriguing. >> I see it as a set towards eventually deprecating ndarray? Also, how would >> it behave with no.asarray() and no.asanyarray()? My other concern is a >> possible violation of DRY. How difficult would it be to maintain two >> ndarrays in parallel? >> >> As for the flag approach, this still doesn't solve the problem of legacy >> code (or did I misunderstand?) >> > > My understanding of the flag is to allow the code to stay in and get > reworked and experimented with while keeping it from contaminating > conventional use. > > The whole point of putting the code in was to experiment and adjust. The > rather bizarre idea that it needs to be perfect from the get go is > disheartening, and is seldom how new things get developed. Sure, there is a > plan up front, but there needs to be feedback and change. And in fact, I > haven't seen much feedback about the actual code, I don't even know that > the people complaining have tried using it to see where it hurts. I'd like > that sort of feedback. > > > I don't think anyone is saying it needs to be perfect from the get go. > What I am saying is that this is fundamental enough to downstream users > that this kind of thing is best done as a separate object. The flag could > still be used to make all Python-level array constructors build ndmasked > objects. > > But, this doesn't address the C-level story where there is quite a bit of > downstream use where people have used the NumPy array as just a pointer to > memory without considering that there might be a mask attached that should > be inspected as well. > > The NEP addresses this a little bit for those C or C++ consumers of the > ndarray in C who always use PyArray_FromAny which can fail if the array has > non-NULL mask contents. However, it is *not* true that all downstream > users use PyArray_FromAny. > > A large number of users just use something like PyArray_Check and then > PyArray_DATA to get the pointer to the data buffer and then go from there > thinking of their data as a strided memory chunk only (no extra mask). 
> The NEP fundamentally changes this simple invariant that has been in NumPy > and Numeric before it for a long, long time. > > I really don't see how we can do this in a 1.7 release. It has too many > unknown and I think unknowable downstream effects. But, I think we could > introduce another arrayobject that is the masked_array with a Python-level > flag that makes it the default array in Python. > > There are a few more subtleties, PyArray_Check by default will pass > sub-classes so if the new ndmask array were a sub-class then it would be > passed (just like current numpy.ma arrays and matrices would pass that > check today). However, there is a PyArray_CheckExact macro which could > be used to ensure the object was actually of PyArray_Type. There is also > the PyArg_ParseTuple command with "O!" that I have seen used many times to > ensure an exact NumPy array. > > -Travis > > > > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon May 14 15:47:25 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 14 May 2012 21:47:25 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Sun, May 13, 2012 at 1:14 PM, Paul Anton Letnes < paul.anton.letnes at gmail.com> wrote: > On Sat, May 12, 2012 at 9:50 PM, Ralf Gommers > wrote: > > > > > > On Sun, May 6, 2012 at 12:12 AM, Charles R Harris > > wrote: > >> > >> > >> > >> On Sat, May 5, 2012 at 2:56 PM, Paul Anton Letnes > >> wrote: > >>> > >>> Hi, > >>> > >>> I'm getting a couple of errors when testing. System: > >>> Arch Linux (updated today) > >>> Python 3.2.3 > >>> gcc 4.7.0 > >>> (Anything else?) > >>> > >>> I think that this error: > >>> AssertionError: selectedrealkind(19): expected -1 but got 16 > >>> is due to the fact that newer versions of gfortran actually supports > >>> precision this high (quad precision). > >>> > >> > >> Yes, but it should be fixed. I can't duplicate this here with a fresh > >> checkout of the branch. > > > > > > This failure makes no sense to me. > > > > Error comes from this code: > > > > 'selectedrealkind(%s): expected %r but got %r' % (i, > > selected_real_kind(i), selectedrealkind(i))) > > > > So "selected_real_kind(19)" returns -1. > > > > selected_real_kind is the function > > numpy.f2py.crackfortran._selected_real_kind_func, which is defined as: > > > > def _selected_real_kind_func(p, r=0, radix=0): > > #XXX: This should be processor dependent > > # This is only good for 0 <= p <= 20 > > if p < 7: return 4 > > if p < 16: return 8 > > if platform.machine().lower().startswith('power'): > > if p <= 20: > > return 16 > > else: > > if p < 19: > > return 10 > > elif p <= 20: > > return 16 > > return -1 > > > > For p=19 this function should always return 16. So the result from > compiling > > foo.f90 is fine, but the test is broken in a very strange way. > > > > Paul, is the failure reproducible on your machine? If so, can you try to > > debug it? > > > > Ralf > > Hi Ralf. > > The Arch numpy (1.6.1) for python 2.7, installed via pacman (the > package manager) has this problem. 
> > After installation of numpy 1.6.2rc1 in a virtualenv, the test passes. > Maybe the bug was fixed in the RC, and I screwed up which numpy > version I tested? I'm sorry that I can't find out - I just built a new > machine, and the old one is lying around the livingroom in pieces. Was > that particular bit of code changed between 1.6.1 and 1.6.2rc1? > It was actually, in https://github.com/numpy/numpy/commit/e7f2210e1. So you tested 1.6.1 by accident before, and it's working now? Problem solved in that case. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Mon May 14 16:36:09 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 14 May 2012 22:36:09 +0200 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> <4FAEDC58.8050502@astro.uio.no> Message-ID: <4FB16CB9.9020507@astro.uio.no> On 05/14/2012 06:31 PM, mark florisson wrote: > On 12 May 2012 22:55, Dag Sverre Seljebotn wrote: >> On 05/11/2012 03:37 PM, mark florisson wrote: >>> >>> On 11 May 2012 12:13, Dag Sverre Seljebotn >>> wrote: >>>> >>>> (NumPy devs: I know, I get too many ideas. But this time I *really* >>>> believe >>>> in it, I think this is going to be *huge*. And if Mark F. likes it it's >>>> not >>>> going to be without manpower; and as his mentor I'd pitch in too here and >>>> there.) >>>> >>>> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly >>>> don't >>>> want to micro-manage your GSoC, just have your take.) >>>> >>>> Travis, thank you very much for those good words in the "NA-mask >>>> interactions..." thread. It put most of my concerns away. If anybody is >>>> leaning towards for opaqueness because of its OOP purity, I want to refer >>>> to >>>> C++ and its walled-garden of ideological purity -- it has, what, 3-4 >>>> different OOP array libraries, neither of which is able to out-compete >>>> the >>>> other. Meanwhile the rest of the world happily cooperates using pointers, >>>> strides, CSR and CSC. >>>> >>>> Now, there are limits to what you can do with strides and pointers. >>>> Noone's >>>> denying the need for more. In my mind that's an API where you can do >>>> fetch_block and put_block of cache-sized, N-dimensional blocks on an >>>> array; >>>> but it might be something slightly different. >>>> >>>> Here's what I'm asking: DO NOT simply keep extending ndarray and the >>>> NumPy C >>>> API to deal with this issue. >>>> >>>> What we need is duck-typing/polymorphism at the C level. If you keep >>>> extending ndarray and the NumPy C API, what we'll have is a one-to-many >>>> relationship: One provider of array technology, multiple consumers (with >>>> hooks, I'm sure, but all implementations of the hook concept in the NumPy >>>> world I've seen so far are a total disaster!). >>>> >>>> What I think we need instead is something like PEP 3118 for the >>>> "abstract" >>>> array that is only available block-wise with getters and setters. On the >>>> Cython list we've decided that what we want for CEP 1000 (for boxing >>>> callbacks etc.) is to extend PyTypeObject with our own fields; we could >>>> create CEP 1001 to solve this issue and make any Python object an >>>> exporter >>>> of "block-getter/setter-arrays" (better name needed). >>>> >>>> What would be exported is (of course) a simple vtable: >>>> >>>> typedef struct { >>>> int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, >>>> ...); >>>> ... 
>>>> } block_getter_setter_array_vtable; >>>> >>>> Let's please discuss the details *after* the fundamentals. But the reason >>>> I >>>> put void* there instead of PyObject* is that I hope this could be used >>>> beyond the Python world (say, Python<->Julia); the void* would be handed >>>> to >>>> you at the time you receive the vtable (however we handle that). >>> >>> >>> I suppose it would also be useful to have some way of predicting the >>> output format polymorphically for the caller. E.g. dense * >>> block_diagonal results in block diagonal, but dense + block_diagonal >>> results in dense, etc. It might be useful for the caller to know >>> whether it needs to allocate a sparse, dense or block-structured >>> array. Or maybe the polymorphic function could even do the allocation. >>> This needs to happen recursively of course, to avoid intermediate >>> temporaries. The compiler could easily handle that, and so could numpy >>> when it gets lazy evaluation. >> >> >> Ah. But that depends too on the computation to be performed too; a) >> elementwise, b) axis-wise reductions, c) linear algebra... >> >> In my oomatrix code (please don't look at it, it's shameful) I do this using >> multiple dispatch. >> >> I'd rather ignore this for as long as we can, only implementing "a[:] = ..." >> -- I can't see how decisions here would trickle down to the API that's used >> in the kernel, it's more like a pre-phase, and better treated orthogonally. >> >> >>> I think if the heavy lifting of allocating output arrays and exporting >>> these arrays work in numpy, then support in Cython could use that (I >>> can already hear certain people object to more complicated array stuff >>> in Cython :). Even better here would be an external project that each >>> our projects could use (I still think the nditer sorting functionality >>> of arrays should be numpy-agnostic and externally available). >> >> >> I agree with the separate project idea. It's trivial for NumPy to >> incorporate that as one of its methods for exporting arrays, and I don't >> think it makes sense to either build it into Cython, or outright depend on >> NumPy. >> >> Here's what I'd like (working title: NumBridge?). >> >> - Mission: Be the "double* + shape + strides" in a world where that is no >> longer enough, by providing tight, focused APIs/ABIs that are usable across >> C/Fortran/Python. >> >> I basically want something I can quickly acquire from a NumPy array, then >> pass it into my C code without dragging along all the cruft that I don't >> need. >> >> - Written in pure C + specs, usable without Python >> >> - PEP 3118 "done right", basically semi-standardize the internal Cython >> memoryview ABI and get something that's passable on stack >> >> - Get block get/put API >> >> - Iterator APIs >> >> - Utility code for exporters and clients (iteration code, axis reordering, >> etc.) >> >> Is the scope of that insane, or is it at least worth a shot to see how bad >> it is? Beyond figuring out a small subset that can be done first, and >> whether performance considerations must be taken or not, there's two >> complicating factors: Pluggable dtypes, memory management. Perhaps you could >> come to Oslo for a couple of days to brainstorm... >> >> Dag > > The blocks are a good idea, but I think fairly complicated for > complicated matrix layouts. It would be nice to have something that is > reasonably efficient for at least most of the array storage > mechanisms. 
> I'm going to do a little brain dump below, let's see if anything is useful :) > > What if we basically take the CSR format and make it a little simpler, > easier to handle, and better suited for other layouts. Basically, keep > shape/strides for dense arrays, and for blocked arrays just "extend" > your number of dimensions, i.e. a 2D blocked array becomes a 4D array, > something like this: > >>>> a = np.arange(4).repeat(4).reshape(4, 4); >>>> a > array([[0, 0, 0, 0], > [1, 1, 1, 1], > [2, 2, 2, 2], > [3, 3, 3, 3]]) > >>>> a.shape = (2, 2, 2, 2) >>>> itemsize = a.dtype.itemsize >>>> a.strides = (8 * itemsize, 2 * itemsize, 4 * itemsize, 1 * itemsize) >>>> a > array([[[[0, 0], > [1, 1]], > > [[0, 0], > [1, 1]]], > > > [[[2, 2], > [3, 3]], > > [[2, 2], > [3, 3]]]]) > >>>> print a.flatten() > [0 0 1 1 0 0 1 1 2 2 3 3 2 2 3 3] > > Now, given some buffer flag (PyBUF_Sparse or something), use basically > suboffsets with indirect dimensions, where only ever the last > dimension is a row of contiguous memory (the entire thing may be > contiguous, but the point is that you don't care). This row may > > - be variable sized > - have either a single "column start" (e.g. diagonal, band/tri- > diagonal, block diagonal, etc), OR > - a list of column indices, the length of the row (like in the CSR format) > > The length of each innermost row is then determined by looking at, in order: > - the extent as specified in the shape list > - if -1, and some flag is set, the extent is determined like CSR, > i.e. (uintptr_t) row[i + 1] - (uintptr_t) row[i] > -> maybe instead of pointers indices are better, for > serialization, GPUs, etc > - otherwise, use either a function pointer or perhaps a list of extents > > All these details will obviously be abstracted, allowing for easy > iteration, but it can also be used by ufuncs operating on contiguous > rows (often, since the actual storage is contiguous and stored in a 1D > array, some flag could indicate contiguity as an optimization for > unary ufuncs and flat iteration). Tiled nditer-ation could also work > here without too big a hassle. > When you slice, you add to the suboffset and manipulate the single > extent or entire list of extents in that dimension, and the flag to > determine the length using the pointer subtraction should be cleared > (this should probably all happen through vtable functions). > > An exporter would also be free to use different malloced pointers, > allowing variable sized array support with append/pop functionality in > multiple dimensions (if there are no active buffer views). > > Random access in the case where a column start is provided is still > contant time, and done though a[i][j][k - colstart], unless you have > discontiguous rows, in which case you are allowed a logarithmic search > (if the size exceeds some threshold). I see scipy.sparse does a linear > search, which is pretty slow in pathological cases: > > from scipy import sparse > a = np.random.random((1, 4000000)) > b = sparse.csr_matrix(a) > %timeit a[0, 1000] > %timeit b[0, 1000] > > 10000000 loops, best of 3: 190 ns per loop > 10 loops, best of 3: 29.3 ms per loop Heh. That is *extremely* pathological though, nobody does that in real code :-) Here's an idea I had yesterday: To get an ND sparse array with good spatial locality (for local operations etc.), you could map the elements to a volume-filling fractal and then use a hash-map with linear probing. 
I bet it either doesn't work or has been done already :-) > > Now, depending on the density and operation, the caller may have some > idea of how to allocate an output array. I'm not sure how to handle > "insertions" of new elements, but I presume through vtable put/insert > functions. I'm also not sure how to fit this in with linear algebra > functionality, other than providing conversions of the view. > > I think a disadvantage of this scheme is that you can't reorder your > axes anymore, and many operations that are easy in the dense case are > suddenly harder, e.g. this scheme does not allow you to go from a > csr-like format into csc. But I think what this gives is reasonable > generality to allow easy use in C/Fortran/Cython compiled code/numpy > ufunc invocation, as well as allowing efficient-ish storage for > various kinds of arrays. > > Does this make any sense? I'll admit I didn't go through the finer details of your idea; let's deal with the below first and then I can re-read your post later. What I'm thinking is that this seems interesting, but perhaps it lowers the versatility so much that it's not really worth to consider, for the GSoC at least. If the impact isn't high enough, my hunch is that one may as well not bother and just do strided arrays. I actually believe that the *likely* outcome of this discussion is to stick to the original plan and focus on expressions with strided arrays. But let's see. I'm not sure if what brought me to the blocked case was really block-sparse arrays or diagonal arrays. Let's see...: 1) Memory conservation. The array would not be NOT element-sparse, it's just that you don't want to hold it all in memory at once. Examples: - "a_array *= np.random.normal(size=a_array.shape)". The right hand side can be generated on the fly (making it return the same data for each block every time is non-trivial but can be done). If a takes up a good chunk of RAM, there might not even be enough memory for the right-hand-side except for block-by-block. (That saves bandwidth too.) - You want to stream from one file to another file, neither of which will fit in RAM. Here we really don't care about speed, just code reuse...it's irritating to have to manually block in such a situation. - You want to play with a new fancy array format (a blocked format that's faster for some linear algebra, say). But then you need to call another C library that takes data+shape+stride+dtype. Then you need to make a copy, which you'd rather avoid -- so an alternative is to make that C library be based on the block API so that it can interfaces with your fancy new format (and anything else). 2) Bandwidth conservation. Numexpr, Theano, Numba and Cython-vectorized-expressions all already deal with this problem on one level. However, I also believe there's a higher level in many programs where blocks come into play. The organization of many scientific codes essentially reads "do A on 8 GB of data, then do B on 8 GB of data"; and it's going to be a *long* time before you have full-program analysis to block up that in every case; the tools above will be able to deal with some almost-trivial cases only. A block API could be used to write "pipeline programs". 
For instance, consider a_array[:, None] * f(x_array) and f is some rather complicated routine in Fortran that is NOT a ufunc -- it takes all of x_array as input and provides all the output "at once"; but at some point it needs to do the writing of the output, and if writing to an API it could do the multiplication with a_array while the block is in cache anyway and save a memory bus round-trip. (Provided the output isn't needed as scratch space.) Summing up: Vectorized expressions on contiguous (or strided) memory in Cython is pretty nice by itself; it would bring us up to the current state of the art of static compilation (in Fortran compilers). But my hunch is your sparse-block proposal doesn't add enough flexibility to be worth the pain. Many of the cases above can't be covered with it. If it's possible to identify a little nugget API that is forward-compatible with the above usecases (it wouldn't solve them perhaps, but allow them to be solved with some extra supporting code), then it might be worth to a) implement it in NumPy, b) support it for Cython vectorized expressions; and use that to support block-transposition. But I'm starting to feel that we can't really know how that little nugget API should really look until the above has been explored in a little more depth; and then we are easily talking a too large scope without tying it to a larger project (which can't really before the GSoC..). For instance, 2) suggests push/pull rather than put/get. Dag From d.s.seljebotn at astro.uio.no Mon May 14 16:39:50 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 14 May 2012 22:39:50 +0200 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <4FB16CB9.9020507@astro.uio.no> References: <4FACF458.6070200@astro.uio.no> <4FAEDC58.8050502@astro.uio.no> <4FB16CB9.9020507@astro.uio.no> Message-ID: <4FB16D96.4020307@astro.uio.no> On 05/14/2012 10:36 PM, Dag Sverre Seljebotn wrote: > On 05/14/2012 06:31 PM, mark florisson wrote: >> On 12 May 2012 22:55, Dag Sverre Seljebotn wrote: >>> On 05/11/2012 03:37 PM, mark florisson wrote: >>>> >>>> On 11 May 2012 12:13, Dag Sverre Seljebotn >>>> wrote: >>>>> >>>>> (NumPy devs: I know, I get too many ideas. But this time I *really* >>>>> believe >>>>> in it, I think this is going to be *huge*. And if Mark F. likes it it's >>>>> not >>>>> going to be without manpower; and as his mentor I'd pitch in too here and >>>>> there.) >>>>> >>>>> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly >>>>> don't >>>>> want to micro-manage your GSoC, just have your take.) >>>>> >>>>> Travis, thank you very much for those good words in the "NA-mask >>>>> interactions..." thread. It put most of my concerns away. If anybody is >>>>> leaning towards for opaqueness because of its OOP purity, I want to refer >>>>> to >>>>> C++ and its walled-garden of ideological purity -- it has, what, 3-4 >>>>> different OOP array libraries, neither of which is able to out-compete >>>>> the >>>>> other. Meanwhile the rest of the world happily cooperates using pointers, >>>>> strides, CSR and CSC. >>>>> >>>>> Now, there are limits to what you can do with strides and pointers. >>>>> Noone's >>>>> denying the need for more. In my mind that's an API where you can do >>>>> fetch_block and put_block of cache-sized, N-dimensional blocks on an >>>>> array; >>>>> but it might be something slightly different. 
>>>>> >>>>> Here's what I'm asking: DO NOT simply keep extending ndarray and the >>>>> NumPy C >>>>> API to deal with this issue. >>>>> >>>>> What we need is duck-typing/polymorphism at the C level. If you keep >>>>> extending ndarray and the NumPy C API, what we'll have is a one-to-many >>>>> relationship: One provider of array technology, multiple consumers (with >>>>> hooks, I'm sure, but all implementations of the hook concept in the NumPy >>>>> world I've seen so far are a total disaster!). >>>>> >>>>> What I think we need instead is something like PEP 3118 for the >>>>> "abstract" >>>>> array that is only available block-wise with getters and setters. On the >>>>> Cython list we've decided that what we want for CEP 1000 (for boxing >>>>> callbacks etc.) is to extend PyTypeObject with our own fields; we could >>>>> create CEP 1001 to solve this issue and make any Python object an >>>>> exporter >>>>> of "block-getter/setter-arrays" (better name needed). >>>>> >>>>> What would be exported is (of course) a simple vtable: >>>>> >>>>> typedef struct { >>>>> int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, >>>>> ...); >>>>> ... >>>>> } block_getter_setter_array_vtable; >>>>> >>>>> Let's please discuss the details *after* the fundamentals. But the reason >>>>> I >>>>> put void* there instead of PyObject* is that I hope this could be used >>>>> beyond the Python world (say, Python<->Julia); the void* would be handed >>>>> to >>>>> you at the time you receive the vtable (however we handle that). >>>> >>>> >>>> I suppose it would also be useful to have some way of predicting the >>>> output format polymorphically for the caller. E.g. dense * >>>> block_diagonal results in block diagonal, but dense + block_diagonal >>>> results in dense, etc. It might be useful for the caller to know >>>> whether it needs to allocate a sparse, dense or block-structured >>>> array. Or maybe the polymorphic function could even do the allocation. >>>> This needs to happen recursively of course, to avoid intermediate >>>> temporaries. The compiler could easily handle that, and so could numpy >>>> when it gets lazy evaluation. >>> >>> >>> Ah. But that depends too on the computation to be performed too; a) >>> elementwise, b) axis-wise reductions, c) linear algebra... >>> >>> In my oomatrix code (please don't look at it, it's shameful) I do this using >>> multiple dispatch. >>> >>> I'd rather ignore this for as long as we can, only implementing "a[:] = ..." >>> -- I can't see how decisions here would trickle down to the API that's used >>> in the kernel, it's more like a pre-phase, and better treated orthogonally. >>> >>> >>>> I think if the heavy lifting of allocating output arrays and exporting >>>> these arrays work in numpy, then support in Cython could use that (I >>>> can already hear certain people object to more complicated array stuff >>>> in Cython :). Even better here would be an external project that each >>>> our projects could use (I still think the nditer sorting functionality >>>> of arrays should be numpy-agnostic and externally available). >>> >>> >>> I agree with the separate project idea. It's trivial for NumPy to >>> incorporate that as one of its methods for exporting arrays, and I don't >>> think it makes sense to either build it into Cython, or outright depend on >>> NumPy. >>> >>> Here's what I'd like (working title: NumBridge?). 
>>> >>> - Mission: Be the "double* + shape + strides" in a world where that is no >>> longer enough, by providing tight, focused APIs/ABIs that are usable across >>> C/Fortran/Python. >>> >>> I basically want something I can quickly acquire from a NumPy array, then >>> pass it into my C code without dragging along all the cruft that I don't >>> need. >>> >>> - Written in pure C + specs, usable without Python >>> >>> - PEP 3118 "done right", basically semi-standardize the internal Cython >>> memoryview ABI and get something that's passable on stack >>> >>> - Get block get/put API >>> >>> - Iterator APIs >>> >>> - Utility code for exporters and clients (iteration code, axis reordering, >>> etc.) >>> >>> Is the scope of that insane, or is it at least worth a shot to see how bad >>> it is? Beyond figuring out a small subset that can be done first, and >>> whether performance considerations must be taken or not, there's two >>> complicating factors: Pluggable dtypes, memory management. Perhaps you could >>> come to Oslo for a couple of days to brainstorm... >>> >>> Dag >> >> The blocks are a good idea, but I think fairly complicated for >> complicated matrix layouts. It would be nice to have something that is >> reasonably efficient for at least most of the array storage >> mechanisms. >> I'm going to do a little brain dump below, let's see if anything is useful :) >> >> What if we basically take the CSR format and make it a little simpler, >> easier to handle, and better suited for other layouts. Basically, keep >> shape/strides for dense arrays, and for blocked arrays just "extend" >> your number of dimensions, i.e. a 2D blocked array becomes a 4D array, >> something like this: >> >>>>> a = np.arange(4).repeat(4).reshape(4, 4); >>>>> a >> array([[0, 0, 0, 0], >> [1, 1, 1, 1], >> [2, 2, 2, 2], >> [3, 3, 3, 3]]) >> >>>>> a.shape = (2, 2, 2, 2) >>>>> itemsize = a.dtype.itemsize >>>>> a.strides = (8 * itemsize, 2 * itemsize, 4 * itemsize, 1 * itemsize) >>>>> a >> array([[[[0, 0], >> [1, 1]], >> >> [[0, 0], >> [1, 1]]], >> >> >> [[[2, 2], >> [3, 3]], >> >> [[2, 2], >> [3, 3]]]]) >> >>>>> print a.flatten() >> [0 0 1 1 0 0 1 1 2 2 3 3 2 2 3 3] >> >> Now, given some buffer flag (PyBUF_Sparse or something), use basically >> suboffsets with indirect dimensions, where only ever the last >> dimension is a row of contiguous memory (the entire thing may be >> contiguous, but the point is that you don't care). This row may >> >> - be variable sized >> - have either a single "column start" (e.g. diagonal, band/tri- >> diagonal, block diagonal, etc), OR >> - a list of column indices, the length of the row (like in the CSR format) >> >> The length of each innermost row is then determined by looking at, in order: >> - the extent as specified in the shape list >> - if -1, and some flag is set, the extent is determined like CSR, >> i.e. (uintptr_t) row[i + 1] - (uintptr_t) row[i] >> -> maybe instead of pointers indices are better, for >> serialization, GPUs, etc >> - otherwise, use either a function pointer or perhaps a list of extents >> >> All these details will obviously be abstracted, allowing for easy >> iteration, but it can also be used by ufuncs operating on contiguous >> rows (often, since the actual storage is contiguous and stored in a 1D >> array, some flag could indicate contiguity as an optimization for >> unary ufuncs and flat iteration). Tiled nditer-ation could also work >> here without too big a hassle. 
>> When you slice, you add to the suboffset and manipulate the single >> extent or entire list of extents in that dimension, and the flag to >> determine the length using the pointer subtraction should be cleared >> (this should probably all happen through vtable functions). >> >> An exporter would also be free to use different malloced pointers, >> allowing variable sized array support with append/pop functionality in >> multiple dimensions (if there are no active buffer views). >> >> Random access in the case where a column start is provided is still >> contant time, and done though a[i][j][k - colstart], unless you have >> discontiguous rows, in which case you are allowed a logarithmic search >> (if the size exceeds some threshold). I see scipy.sparse does a linear >> search, which is pretty slow in pathological cases: >> >> from scipy import sparse >> a = np.random.random((1, 4000000)) >> b = sparse.csr_matrix(a) >> %timeit a[0, 1000] >> %timeit b[0, 1000] >> >> 10000000 loops, best of 3: 190 ns per loop >> 10 loops, best of 3: 29.3 ms per loop > > Heh. That is *extremely* pathological though, nobody does that in real > code :-) > > Here's an idea I had yesterday: To get an ND sparse array with good > spatial locality (for local operations etc.), you could map the elements > to a volume-filling fractal and then use a hash-map with linear probing. > I bet it either doesn't work or has been done already :-) > >> >> Now, depending on the density and operation, the caller may have some >> idea of how to allocate an output array. I'm not sure how to handle >> "insertions" of new elements, but I presume through vtable put/insert >> functions. I'm also not sure how to fit this in with linear algebra >> functionality, other than providing conversions of the view. >> >> I think a disadvantage of this scheme is that you can't reorder your >> axes anymore, and many operations that are easy in the dense case are >> suddenly harder, e.g. this scheme does not allow you to go from a >> csr-like format into csc. But I think what this gives is reasonable >> generality to allow easy use in C/Fortran/Cython compiled code/numpy >> ufunc invocation, as well as allowing efficient-ish storage for >> various kinds of arrays. >> >> Does this make any sense? > > I'll admit I didn't go through the finer details of your idea; let's > deal with the below first and then I can re-read your post later. > > What I'm thinking is that this seems interesting, but perhaps it lowers > the versatility so much that it's not really worth to consider, for the > GSoC at least. If the impact isn't high enough, my hunch is that one may > as well not bother and just do strided arrays. > > I actually believe that the *likely* outcome of this discussion is to > stick to the original plan and focus on expressions with strided arrays. > But let's see. > > I'm not sure if what brought me to the blocked case was really > block-sparse arrays or diagonal arrays. Let's see...: > > 1) Memory conservation. The array would not be NOT element-sparse, it's > just that you don't want to hold it all in memory at once. > > Examples: > > - "a_array *= np.random.normal(size=a_array.shape)". The right hand > side can be generated on the fly (making it return the same data for > each block every time is non-trivial but can be done). If a takes up a > good chunk of RAM, there might not even be enough memory for the > right-hand-side except for block-by-block. (That saves bandwidth too.) 
> > - You want to stream from one file to another file, neither of which > will fit in RAM. Here we really don't care about speed, just code > reuse...it's irritating to have to manually block in such a situation. Actually, s/file/database/g, just to avoid comments about np.memmap. Dag > > - You want to play with a new fancy array format (a blocked format > that's faster for some linear algebra, say). But then you need to call > another C library that takes data+shape+stride+dtype. Then you need to > make a copy, which you'd rather avoid -- so an alternative is to make > that C library be based on the block API so that it can interfaces with > your fancy new format (and anything else). > > 2) Bandwidth conservation. Numexpr, Theano, Numba and > Cython-vectorized-expressions all already deal with this problem on one > level. > > However, I also believe there's a higher level in many programs where > blocks come into play. The organization of many scientific codes > essentially reads "do A on 8 GB of data, then do B on 8 GB of data"; and > it's going to be a *long* time before you have full-program analysis to > block up that in every case; the tools above will be able to deal with > some almost-trivial cases only. > > A block API could be used to write "pipeline programs". For instance, > consider > > a_array[:, None] * f(x_array) > > and f is some rather complicated routine in Fortran that is NOT a ufunc > -- it takes all of x_array as input and provides all the output "at > once"; but at some point it needs to do the writing of the output, and > if writing to an API it could do the multiplication with a_array while > the block is in cache anyway and save a memory bus round-trip. (Provided > the output isn't needed as scratch space.) > > Summing up: > > Vectorized expressions on contiguous (or strided) memory in Cython is > pretty nice by itself; it would bring us up to the current state of the > art of static compilation (in Fortran compilers). > > But my hunch is your sparse-block proposal doesn't add enough > flexibility to be worth the pain. Many of the cases above can't be > covered with it. > > If it's possible to identify a little nugget API that is > forward-compatible with the above usecases (it wouldn't solve them > perhaps, but allow them to be solved with some extra supporting code), > then it might be worth to a) implement it in NumPy, b) support it for > Cython vectorized expressions; and use that to support block-transposition. > > But I'm starting to feel that we can't really know how that little > nugget API should really look until the above has been explored in a > little more depth; and then we are easily talking a too large scope > without tying it to a larger project (which can't really before the > GSoC..). For instance, 2) suggests push/pull rather than put/get. > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pmhobson at gmail.com Mon May 14 16:41:45 2012 From: pmhobson at gmail.com (Paul Hobson) Date: Mon, 14 May 2012 13:41:45 -0700 Subject: [Numpy-discussion] Correlation code from "NumPy 1.5 Beginner's Guide" In-Reply-To: <1336927712.88558.YahooMailNeo@web161201.mail.bf1.yahoo.com> References: <1336927712.88558.YahooMailNeo@web161201.mail.bf1.yahoo.com> Message-ID: On Sun, May 13, 2012 at 9:48 AM, Dinesh Prasad wrote: > Hello. I am new to the list thanks for accepting my question. 
> > I am trying to run the attached code, directly from the book in the title. > > It simply calculates correlation of returns of the stock listed in the > spreadsheets. could it be that the numPY library is not being recognized on > my system for some reason? > > I installed the 32 bit version of python/numpy. This is to preserve > compatibility with another application that will trigger the code. I am on > Windows 7 64 bit Home Edition, however. > > Thanks for any suggestions. > > -Dinesh Dinesh, What errors are you getting? I didn't run your code, but from the looks of things, you might want to set unpack=False as you read the data. In other words, instead of this: >>> bhp = numpy.loadtxt('BHP.csv', delimiter=',', usecols=(6,), unpack=True) use this: >>> bhp = numpy.loadtxt('BHP.csv', delimiter=',', usecols=(6,), unpack=False) The "unpack" keyword argument makes the np.loadtxt function try to dump each column into its own variable. It may support dumping the data to a single tuple, but I don't remember that being the case. -paul From cournape at gmail.com Mon May 14 16:54:09 2012 From: cournape at gmail.com (David Cournapeau) Date: Mon, 14 May 2012 21:54:09 +0100 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> <4FAEDC58.8050502@astro.uio.no> Message-ID: On Mon, May 14, 2012 at 5:31 PM, mark florisson wrote: > On 12 May 2012 22:55, Dag Sverre Seljebotn > wrote: > > On 05/11/2012 03:37 PM, mark florisson wrote: > >> > >> On 11 May 2012 12:13, Dag Sverre Seljebotn > >> wrote: > >>> > >>> (NumPy devs: I know, I get too many ideas. But this time I *really* > >>> believe > >>> in it, I think this is going to be *huge*. And if Mark F. likes it it's > >>> not > >>> going to be without manpower; and as his mentor I'd pitch in too here > and > >>> there.) > >>> > >>> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly > >>> don't > >>> want to micro-manage your GSoC, just have your take.) > >>> > >>> Travis, thank you very much for those good words in the "NA-mask > >>> interactions..." thread. It put most of my concerns away. If anybody is > >>> leaning towards for opaqueness because of its OOP purity, I want to > refer > >>> to > >>> C++ and its walled-garden of ideological purity -- it has, what, 3-4 > >>> different OOP array libraries, neither of which is able to out-compete > >>> the > >>> other. Meanwhile the rest of the world happily cooperates using > pointers, > >>> strides, CSR and CSC. > >>> > >>> Now, there are limits to what you can do with strides and pointers. > >>> Noone's > >>> denying the need for more. In my mind that's an API where you can do > >>> fetch_block and put_block of cache-sized, N-dimensional blocks on an > >>> array; > >>> but it might be something slightly different. > >>> > >>> Here's what I'm asking: DO NOT simply keep extending ndarray and the > >>> NumPy C > >>> API to deal with this issue. > >>> > >>> What we need is duck-typing/polymorphism at the C level. If you keep > >>> extending ndarray and the NumPy C API, what we'll have is a one-to-many > >>> relationship: One provider of array technology, multiple consumers > (with > >>> hooks, I'm sure, but all implementations of the hook concept in the > NumPy > >>> world I've seen so far are a total disaster!). > >>> > >>> What I think we need instead is something like PEP 3118 for the > >>> "abstract" > >>> array that is only available block-wise with getters and setters. 
On > the > >>> Cython list we've decided that what we want for CEP 1000 (for boxing > >>> callbacks etc.) is to extend PyTypeObject with our own fields; we could > >>> create CEP 1001 to solve this issue and make any Python object an > >>> exporter > >>> of "block-getter/setter-arrays" (better name needed). > >>> > >>> What would be exported is (of course) a simple vtable: > >>> > >>> typedef struct { > >>> int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t > *lower_right, > >>> ...); > >>> ... > >>> } block_getter_setter_array_vtable; > >>> > >>> Let's please discuss the details *after* the fundamentals. But the > reason > >>> I > >>> put void* there instead of PyObject* is that I hope this could be used > >>> beyond the Python world (say, Python<->Julia); the void* would be > handed > >>> to > >>> you at the time you receive the vtable (however we handle that). > >> > >> > >> I suppose it would also be useful to have some way of predicting the > >> output format polymorphically for the caller. E.g. dense * > >> block_diagonal results in block diagonal, but dense + block_diagonal > >> results in dense, etc. It might be useful for the caller to know > >> whether it needs to allocate a sparse, dense or block-structured > >> array. Or maybe the polymorphic function could even do the allocation. > >> This needs to happen recursively of course, to avoid intermediate > >> temporaries. The compiler could easily handle that, and so could numpy > >> when it gets lazy evaluation. > > > > > > Ah. But that depends too on the computation to be performed too; a) > > elementwise, b) axis-wise reductions, c) linear algebra... > > > > In my oomatrix code (please don't look at it, it's shameful) I do this > using > > multiple dispatch. > > > > I'd rather ignore this for as long as we can, only implementing "a[:] = > ..." > > -- I can't see how decisions here would trickle down to the API that's > used > > in the kernel, it's more like a pre-phase, and better treated > orthogonally. > > > > > >> I think if the heavy lifting of allocating output arrays and exporting > >> these arrays work in numpy, then support in Cython could use that (I > >> can already hear certain people object to more complicated array stuff > >> in Cython :). Even better here would be an external project that each > >> our projects could use (I still think the nditer sorting functionality > >> of arrays should be numpy-agnostic and externally available). > > > > > > I agree with the separate project idea. It's trivial for NumPy to > > incorporate that as one of its methods for exporting arrays, and I don't > > think it makes sense to either build it into Cython, or outright depend > on > > NumPy. > > > > Here's what I'd like (working title: NumBridge?). > > > > - Mission: Be the "double* + shape + strides" in a world where that is > no > > longer enough, by providing tight, focused APIs/ABIs that are usable > across > > C/Fortran/Python. > > > > I basically want something I can quickly acquire from a NumPy array, then > > pass it into my C code without dragging along all the cruft that I don't > > need. > > > > - Written in pure C + specs, usable without Python > > > > - PEP 3118 "done right", basically semi-standardize the internal Cython > > memoryview ABI and get something that's passable on stack > > > > - Get block get/put API > > > > - Iterator APIs > > > > - Utility code for exporters and clients (iteration code, axis > reordering, > > etc.) 
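To make the block get/put idea above a little more concrete, here is a rough Python-level sketch of the kind of interface being discussed -- the real proposal is a C-level vtable, and the names used here (get_block, put_block, blockshape) are purely illustrative, not an existing NumPy or Cython API:

import numpy as np

class BlockedArray:
    # Toy exporter: a dense ndarray exposed only through cache-sized blocks.
    def __init__(self, data, blockshape=(256, 256)):
        self.data = data
        self.shape = data.shape
        self.blockshape = blockshape

    def get_block(self, upper_left, lower_right):
        (r0, c0), (r1, c1) = upper_left, lower_right
        return self.data[r0:r1, c0:c1].copy()

    def put_block(self, upper_left, block):
        r0, c0 = upper_left
        self.data[r0:r0 + block.shape[0], c0:c0 + block.shape[1]] = block

def scale_inplace(arr, factor):
    # A consumer never touches arr.data directly; it sees one block at a time.
    br, bc = arr.blockshape
    for r in range(0, arr.shape[0], br):
        for c in range(0, arr.shape[1], bc):
            stop = (min(r + br, arr.shape[0]), min(c + bc, arr.shape[1]))
            block = arr.get_block((r, c), stop)
            arr.put_block((r, c), factor * block)

a = BlockedArray(np.arange(12.0).reshape(3, 4), blockshape=(2, 2))
scale_inplace(a, 2.0)

Any object that can answer get_block/put_block -- a memory-mapped file, a compressed store, a block-diagonal matrix -- could be driven by the same consumer loop, which is the duck-typing-at-the-C-level point being made here.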
> > > > Is the scope of that insane, or is it at least worth a shot to see how > bad > > it is? Beyond figuring out a small subset that can be done first, and > > whether performance considerations must be taken or not, there's two > > complicating factors: Pluggable dtypes, memory management. Perhaps you > could > > come to Oslo for a couple of days to brainstorm... > > > > Dag > > The blocks are a good idea, but I think fairly complicated for > complicated matrix layouts. It would be nice to have something that is > reasonably efficient for at least most of the array storage > mechanisms. > I'm going to do a little brain dump below, let's see if anything is useful > :) > > What if we basically take the CSR format and make it a little simpler, > easier to handle, and better suited for other layouts. Basically, keep > shape/strides for dense arrays, and for blocked arrays just "extend" > your number of dimensions, i.e. a 2D blocked array becomes a 4D array, > something like this: > > >>> a = np.arange(4).repeat(4).reshape(4, 4); > >>> a > array([[0, 0, 0, 0], > [1, 1, 1, 1], > [2, 2, 2, 2], > [3, 3, 3, 3]]) > > >>> a.shape = (2, 2, 2, 2) > >>> itemsize = a.dtype.itemsize > >>> a.strides = (8 * itemsize, 2 * itemsize, 4 * itemsize, 1 * itemsize) > >>> a > array([[[[0, 0], > [1, 1]], > > [[0, 0], > [1, 1]]], > > > [[[2, 2], > [3, 3]], > > [[2, 2], > [3, 3]]]]) > > >>> print a.flatten() > [0 0 1 1 0 0 1 1 2 2 3 3 2 2 3 3] > > Now, given some buffer flag (PyBUF_Sparse or something), use basically > suboffsets with indirect dimensions, where only ever the last > dimension is a row of contiguous memory (the entire thing may be > contiguous, but the point is that you don't care). This row may > > - be variable sized > - have either a single "column start" (e.g. diagonal, band/tri- > diagonal, block diagonal, etc), OR > - a list of column indices, the length of the row (like in the CSR > format) > > The length of each innermost row is then determined by looking at, in > order: > - the extent as specified in the shape list > - if -1, and some flag is set, the extent is determined like CSR, > i.e. (uintptr_t) row[i + 1] - (uintptr_t) row[i] > -> maybe instead of pointers indices are better, for > serialization, GPUs, etc > - otherwise, use either a function pointer or perhaps a list of extents > > All these details will obviously be abstracted, allowing for easy > iteration, but it can also be used by ufuncs operating on contiguous > rows (often, since the actual storage is contiguous and stored in a 1D > array, some flag could indicate contiguity as an optimization for > unary ufuncs and flat iteration). Tiled nditer-ation could also work > here without too big a hassle. > When you slice, you add to the suboffset and manipulate the single > extent or entire list of extents in that dimension, and the flag to > determine the length using the pointer subtraction should be cleared > (this should probably all happen through vtable functions). > > An exporter would also be free to use different malloced pointers, > allowing variable sized array support with append/pop functionality in > multiple dimensions (if there are no active buffer views). > > Random access in the case where a column start is provided is still > contant time, and done though a[i][j][k - colstart], unless you have > discontiguous rows, in which case you are allowed a logarithmic search > (if the size exceeds some threshold). 
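For reference, that logarithmic lookup is only a few lines on top of the standard CSR arrays (indptr/indices/data); the sketch below assumes the column indices of each row are sorted, which they are for a csr_matrix built from a dense array:

import numpy as np
from scipy import sparse

def csr_lookup(m, i, j):
    # Binary-search the sorted column indices of row i for column j.
    start, stop = m.indptr[i], m.indptr[i + 1]
    cols = m.indices[start:stop]
    k = np.searchsorted(cols, j)
    if k < len(cols) and cols[k] == j:
        return m.data[start + k]
    return 0.0

a = np.random.random((1, 4000000))
b = sparse.csr_matrix(a)
assert csr_lookup(b, 0, 1000) == a[0, 1000]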
I see scipy.sparse does a linear > search, which is pretty slow in pathological cases: > > from scipy import sparse > a = np.random.random((1, 4000000)) > b = sparse.csr_matrix(a) > %timeit a[0, 1000] > %timeit b[0, 1000] > > 10000000 loops, best of 3: 190 ns per loop > 10 loops, best of 3: 29.3 ms per loop > > Now, depending on the density and operation, the caller may have some > idea of how to allocate an output array. I'm not sure how to handle > "insertions" of new elements, but I presume through vtable put/insert > functions. I'm also not sure how to fit this in with linear algebra > functionality, other than providing conversions of the view. > > I think a disadvantage of this scheme is that you can't reorder your > axes anymore, and many operations that are easy in the dense case are > suddenly harder, e.g. this scheme does not allow you to go from a > csr-like format into csc. But I think what this gives is reasonable > generality to allow easy use in C/Fortran/Cython compiled code/numpy > ufunc invocation, as well as allowing efficient-ish storage for > various kinds of arrays. > > Does this make any sense? > I think it does, although it is not clear to me how it would generalize to more than 2 dimensions (i.e. how would you handle a 3d sparse array). Would you add more level of indirection ? I have been thinking a bit about related concepts in the context of generalized sparse arrays (which could be a first step toward numpy arrays on top of multiple malloc blocks instead of just one), and one of the solution I have seen is generalized b-tree/b*-trees. The main appeal to me is that the underlying storage is still one-dimensional, and the "magic" happens "only" in the indexing. This has several advantages: - getting the low level block algorithms rights is a solved problem - it allows for growable arrays in O(log(N)) instead of O(N) - one can hope to get near optimal performances in the common cases (1 and 2d) with degenerate indexing functions. Two of the references I have been looking at: - UBTree: http://www.scholarpedia.org/article/B-tree_and_UB-tree - Storing Matrices on Disk : Theory and Practice Revisited (by Zhang, Yi, Munagala, Kamesh and Yang, Jun) I have meant to implement some of those ideas and do some basic benchmarks (especially to compare against CSR and CSC in 2 dimensions, in which case one could imagine building the index function in such a way as range queries map to contiguous blocks of memory). cheers, David -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Mon May 14 19:33:28 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 14 May 2012 19:33:28 -0400 Subject: [Numpy-discussion] Fancy-indexing reorders output in corner cases? Message-ID: <75ACB2C0-8292-41D3-A652-421ADE3E4401@yale.edu> Hello all, The below seems to be a bug, but perhaps it's unavoidably part of the indexing mechanism? It's easiest to show via example... note that using "[0,1]" to pull two columns out of the array gives the same shape as using ":2" in the simple case, but when there's additional slicing happening, the shapes get transposed or something. 
In [2]: numpy.version.version # latest git version
Out[2]: '1.7.0.dev-3bbbbd4'
In [3]: d = numpy.empty((10, 9, 8, 7))
In [4]: d[:,:,:,[0,1]].shape
Out[4]: (10, 9, 8, 2)
In [5]: d[:,:,:,:2].shape
Out[5]: (10, 9, 8, 2)
In [6]: d[:,0,:,[0,1]].shape
Out[6]: (2, 10, 8)
In [7]: d[:,0,:,:2].shape
Out[7]: (10, 8, 2)
In [8]: d[0,:,:,[0,1]].shape
Out[8]: (2, 9, 8)
In [9]: d[0,:,:,:2].shape
Out[9]: (9, 8, 2)

Oddly, this error can appear/disappear depending on the position of the other axis sliced:

In [14]: d = numpy.empty((10, 9, 8))
In [15]: d[:,:,[0,1]].shape
Out[15]: (10, 9, 2)
In [16]: d[:,0,[0,1]].shape
Out[16]: (10, 2)
In [17]: d[0,:,[0,1]].shape
Out[17]: (2, 9)

This cannot be the expected behavior, right? Zach From stefan at sun.ac.za Mon May 14 20:07:04 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 14 May 2012 17:07:04 -0700 Subject: [Numpy-discussion] Fancy-indexing reorders output in corner cases? In-Reply-To: <75ACB2C0-8292-41D3-A652-421ADE3E4401@yale.edu> References: <75ACB2C0-8292-41D3-A652-421ADE3E4401@yale.edu> Message-ID: Hi Zach On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus wrote: > The below seems to be a bug, but perhaps it's unavoidably part of the indexing mechanism? > > It's easiest to show via example... note that using "[0,1]" to pull two columns out of the array gives the same shape as using ":2" in the simple case, but when there's additional slicing happening, the shapes get transposed or something. When fancy indexing and slicing is mixed, the resulting shape is essentially unpredictable. The "correct" way to do it is to only use fancy indexing, i.e. generate the indices of the sliced dimension as well. Stéfan From zachary.pincus at yale.edu Mon May 14 22:34:23 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 14 May 2012 22:34:23 -0400 Subject: [Numpy-discussion] Fancy-indexing reorders output in corner cases? In-Reply-To: References: <75ACB2C0-8292-41D3-A652-421ADE3E4401@yale.edu> Message-ID: <1B021924-5873-434C-89E6-90A19CBCBC39@yale.edu> > On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus wrote: >> The below seems to be a bug, but perhaps it's unavoidably part of the indexing mechanism? >> >> It's easiest to show via example... note that using "[0,1]" to pull two columns out of the array gives the same shape as using ":2" in the simple case, but when there's additional slicing happening, the shapes get transposed or something. > > When fancy indexing and slicing is mixed, the resulting shape is > essentially unpredictable. Aah, right -- this does come up on the list not infrequently, doesn't it. I'd always thought it was more exotic usages that raised these issues. Good to know. > The "correct" way to do it is to only use > fancy indexing, i.e. generate the indices of the sliced dimension as > well. > Excellent -- thanks! > Stéfan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Tue May 15 00:03:13 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 14 May 2012 23:03:13 -0500 Subject: [Numpy-discussion] Fancy-indexing reorders output in corner cases?
In-Reply-To: References: <75ACB2C0-8292-41D3-A652-421ADE3E4401@yale.edu> Message-ID: <9F7881C1-AFA8-49F1-BEC4-7727764949F6@continuum.io> On May 14, 2012, at 7:07 PM, Stéfan van der Walt wrote: > Hi Zach > > On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus wrote: >> The below seems to be a bug, but perhaps it's unavoidably part of the indexing mechanism? >> >> It's easiest to show via example... note that using "[0,1]" to pull two columns out of the array gives the same shape as using ":2" in the simple case, but when there's additional slicing happening, the shapes get transposed or something. > > When fancy indexing and slicing is mixed, the resulting shape is > essentially unpredictable. The "correct" way to do it is to only use > fancy indexing, i.e. generate the indices of the sliced dimension as > well.

This is not quite accurate. It is not unpredictable. It is very predictable, but a bit (too) complicated in the most general case. The problem occurs when you "intermingle" fancy indexing with slice notation (and for this purpose integer selection is considered "fancy-indexing"). While in simple cases you can think that [0,1] is equivalent to :2 --- it is not, because fancy-indexing uses "zip-based ideas" instead of cross-product based ideas.

The problem in general is how to make sense of something like

a[:, :, in1, in2]

If you keep fancy indexing to one side of the slice notation only, then you get what you expect. The shape of the output will be the first two dimensions of a + the broadcasted shape of in1 and in2 (where integers are interpreted as fancy-index arrays). So, let's say a is (10,9,8,7) and in1 is (3,4) and in2 is (4,). The shape of the output will be (10,9,3,4), filled with essentially

a[:,:,i,j] = a[:,:,in1[i,j], in2[j]]

What happens, though, when you have a[:, in1, :, in2]? in1 and in2 are broadcasted together to create a two-dimensional "sub-space" that must fit somewhere. Where should it go? Should it replace in1 or in2? I.e. should the output be (10,3,4,8) or (10,8,3,4)? To "resolve" this ambiguity, the code sends the (3,4) sub-space to the front of the "dimensions" and returns (3,4,10,8). In retrospect, the code should raise an error, as I doubt anyone actually relies on this behavior; then we could have "done the right thing" for situations like in1 being an integer, which actually makes some sense and should not have been confused with the "general case".

In this particular case you might also think that we could say the result should be (10,3,8,4), but there is no guarantee that the number of dimensions that should be appended by the "fancy-indexing" objects will be the same as the number of dimensions replaced. Again, this is how fancy-indexing combines with other fancy-indexing objects.

So, the behavior is actually quite predictable, it's just that in some common cases it doesn't do what you would expect --- especially if you think that [0,1] is "the same" as :2. When I wrote this code to begin with I should have raised an error and then worked in the cases that make sense. This is a good example of making the mistake of thinking that it's better to provide something very general rather than just raise an error when an obvious and clear solution is not available.

There is the possibility that we could now raise an error in NumPy when this situation is encountered because I strongly doubt anyone is actually relying on the current behavior. I would like to do this, actually, as soon as possible. Comments?
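As a quick illustration of the rule just described (the shapes shown assume NumPy's current behavior for mixed fancy/slice indexing; in1 and in2 are integer index arrays of the shapes used above):

import numpy as np

a = np.empty((10, 9, 8, 7))
in1 = np.zeros((3, 4), dtype=np.intp)
in2 = np.zeros((4,), dtype=np.intp)

# Fancy indices adjacent, to one side of the slices:
# result shape = (10, 9) + broadcast(in1, in2).shape
a[:, :, in1, in2].shape    # (10, 9, 3, 4)

# Fancy indices separated by a slice: the broadcast (3, 4) sub-space
# is sent to the front of the result
a[:, in1, :, in2].shape    # (3, 4, 10, 8)

# An integer counts as a fancy index, hence the (2, 10, 8) Zach reported:
d = np.empty((10, 9, 8, 7))
d[:, 0, :, [0, 1]].shape   # (2, 10, 8)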
-Travis > > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From paul.anton.letnes at gmail.com Tue May 15 01:51:01 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Tue, 15 May 2012 07:51:01 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1 In-Reply-To: References: Message-ID: On Mon, May 14, 2012 at 9:47 PM, Ralf Gommers wrote: > > > On Sun, May 13, 2012 at 1:14 PM, Paul Anton Letnes > wrote: >> >> On Sat, May 12, 2012 at 9:50 PM, Ralf Gommers >> wrote: >> > >> > >> > On Sun, May 6, 2012 at 12:12 AM, Charles R Harris >> > wrote: >> >> >> >> >> >> >> >> On Sat, May 5, 2012 at 2:56 PM, Paul Anton Letnes >> >> wrote: >> >>> >> >>> Hi, >> >>> >> >>> I'm getting a couple of errors when testing. System: >> >>> Arch Linux (updated today) >> >>> Python 3.2.3 >> >>> gcc 4.7.0 >> >>> (Anything else?) >> >>> >> >>> I think that this error: >> >>> AssertionError: selectedrealkind(19): expected -1 but got 16 >> >>> is due to the fact that newer versions of gfortran actually supports >> >>> precision this high (quad precision). >> >>> >> >> >> >> Yes, but it should be fixed. I can't duplicate this here with a fresh >> >> checkout of the branch. >> > >> > >> > This failure makes no sense to me. >> > >> > Error comes from this code: >> > >> > ??? 'selectedrealkind(%s): expected %r but got %r' %? (i, >> > selected_real_kind(i), selectedrealkind(i))) >> > >> > So "selected_real_kind(19)" returns -1. >> > >> > selected_real_kind is the function >> > numpy.f2py.crackfortran._selected_real_kind_func, which is defined as: >> > >> > def _selected_real_kind_func(p, r=0, radix=0): >> > ??? #XXX: This should be processor dependent >> > ??? # This is only good for 0 <= p <= 20 >> > ??? if p < 7: return 4 >> > ??? if p < 16: return 8 >> > ??? if platform.machine().lower().startswith('power'): >> > ??????? if p <= 20: >> > ??????????? return 16 >> > ??? else: >> > ??????? if p < 19: >> > ??????????? return 10 >> > ??????? elif p <= 20: >> > ??????????? return 16 >> > ??? return -1 >> > >> > For p=19 this function should always return 16. So the result from >> > compiling >> > foo.f90 is fine, but the test is broken in a very strange way. >> > >> > Paul, is the failure reproducible on your machine? If so, can you try to >> > debug it? >> > >> > Ralf >> >> Hi Ralf. >> >> The Arch numpy (1.6.1) for python 2.7, installed via pacman (the >> package manager) has this problem. >> >> After installation of numpy 1.6.2rc1 in a virtualenv, the test passes. >> Maybe the bug was fixed in the RC, and I screwed up which numpy >> version I tested? I'm sorry that I can't find out - I just built a new >> machine, and the old one is lying around the livingroom in pieces. Was >> that particular bit of code changed between 1.6.1 and 1.6.2rc1? > > > It was actually, in https://github.com/numpy/numpy/commit/e7f2210e1. > > So you tested 1.6.1 by accident before, and it's working now? Problem solved > in that case. > > Ralf > Looks like it to me! Sorry for the silly bug report. I'll have to take more care next time... 
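For anyone who wants to check their own install, the Python-side reference quoted earlier in this thread can be exercised directly; a small sketch using the private helper named above, with the values expected on a non-POWER machine:

from numpy.f2py.crackfortran import _selected_real_kind_func

assert _selected_real_kind_func(15) == 8
assert _selected_real_kind_func(19) == 16   # what the 1.6.2 test expects
assert _selected_real_kind_func(25) == -1   # beyond quad precision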
Paul From markflorisson88 at gmail.com Tue May 15 06:42:17 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 15 May 2012 11:42:17 +0100 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: <4FB16CB9.9020507@astro.uio.no> References: <4FACF458.6070200@astro.uio.no> <4FAEDC58.8050502@astro.uio.no> <4FB16CB9.9020507@astro.uio.no> Message-ID: On 14 May 2012 21:36, Dag Sverre Seljebotn wrote: > On 05/14/2012 06:31 PM, mark florisson wrote: >> >> On 12 May 2012 22:55, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 05/11/2012 03:37 PM, mark florisson wrote: >>>> >>>> >>>> On 11 May 2012 12:13, Dag Sverre Seljebotn >>>> ?wrote: >>>>> >>>>> >>>>> (NumPy devs: I know, I get too many ideas. But this time I *really* >>>>> believe >>>>> in it, I think this is going to be *huge*. And if Mark F. likes it it's >>>>> not >>>>> going to be without manpower; and as his mentor I'd pitch in too here >>>>> and >>>>> there.) >>>>> >>>>> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly >>>>> don't >>>>> want to micro-manage your GSoC, just have your take.) >>>>> >>>>> Travis, thank you very much for those good words in the "NA-mask >>>>> interactions..." thread. It put most of my concerns away. If anybody is >>>>> leaning towards for opaqueness because of its OOP purity, I want to >>>>> refer >>>>> to >>>>> C++ and its walled-garden of ideological purity -- it has, what, 3-4 >>>>> different OOP array libraries, neither of which is able to out-compete >>>>> the >>>>> other. Meanwhile the rest of the world happily cooperates using >>>>> pointers, >>>>> strides, CSR and CSC. >>>>> >>>>> Now, there are limits to what you can do with strides and pointers. >>>>> Noone's >>>>> denying the need for more. In my mind that's an API where you can do >>>>> fetch_block and put_block of cache-sized, N-dimensional blocks on an >>>>> array; >>>>> but it might be something slightly different. >>>>> >>>>> Here's what I'm asking: DO NOT simply keep extending ndarray and the >>>>> NumPy C >>>>> API to deal with this issue. >>>>> >>>>> What we need is duck-typing/polymorphism at the C level. If you keep >>>>> extending ndarray and the NumPy C API, what we'll have is a one-to-many >>>>> relationship: One provider of array technology, multiple consumers >>>>> (with >>>>> hooks, I'm sure, but all implementations of the hook concept in the >>>>> NumPy >>>>> world I've seen so far are a total disaster!). >>>>> >>>>> What I think we need instead is something like PEP 3118 for the >>>>> "abstract" >>>>> array that is only available block-wise with getters and setters. On >>>>> the >>>>> Cython list we've decided that what we want for CEP 1000 (for boxing >>>>> callbacks etc.) is to extend PyTypeObject with our own fields; we could >>>>> create CEP 1001 to solve this issue and make any Python object an >>>>> exporter >>>>> of "block-getter/setter-arrays" (better name needed). >>>>> >>>>> What would be exported is (of course) a simple vtable: >>>>> >>>>> typedef struct { >>>>> ? ?int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t >>>>> *lower_right, >>>>> ...); >>>>> ? ?... >>>>> } block_getter_setter_array_vtable; >>>>> >>>>> Let's please discuss the details *after* the fundamentals. But the >>>>> reason >>>>> I >>>>> put void* there instead of PyObject* is that I hope this could be used >>>>> beyond the Python world (say, Python<->Julia); the void* would be >>>>> handed >>>>> to >>>>> you at the time you receive the vtable (however we handle that). 
>>>> >>>> >>>> >>>> I suppose it would also be useful to have some way of predicting the >>>> output format polymorphically for the caller. E.g. dense * >>>> block_diagonal results in block diagonal, but dense + block_diagonal >>>> results in dense, etc. It might be useful for the caller to know >>>> whether it needs to allocate a sparse, dense or block-structured >>>> array. Or maybe the polymorphic function could even do the allocation. >>>> This needs to happen recursively of course, to avoid intermediate >>>> temporaries. The compiler could easily handle that, and so could numpy >>>> when it gets lazy evaluation. >>> >>> >>> >>> Ah. But that depends too on the computation to be performed too; a) >>> elementwise, b) axis-wise reductions, c) linear algebra... >>> >>> In my oomatrix code (please don't look at it, it's shameful) I do this >>> using >>> multiple dispatch. >>> >>> I'd rather ignore this for as long as we can, only implementing "a[:] = >>> ..." >>> -- I can't see how decisions here would trickle down to the API that's >>> used >>> in the kernel, it's more like a pre-phase, and better treated >>> orthogonally. >>> >>> >>>> I think if the heavy lifting of allocating output arrays and exporting >>>> these arrays work in numpy, then support in Cython could use that (I >>>> can already hear certain people object to more complicated array stuff >>>> in Cython :). Even better here would be an external project that each >>>> our projects could use (I still think the nditer sorting functionality >>>> of arrays should be numpy-agnostic and externally available). >>> >>> >>> >>> I agree with the separate project idea. It's trivial for NumPy to >>> incorporate that as one of its methods for exporting arrays, and I don't >>> think it makes sense to either build it into Cython, or outright depend >>> on >>> NumPy. >>> >>> Here's what I'd like (working title: NumBridge?). >>> >>> ?- Mission: Be the "double* + shape + strides" in a world where that is >>> no >>> longer enough, by providing tight, focused APIs/ABIs that are usable >>> across >>> C/Fortran/Python. >>> >>> I basically want something I can quickly acquire from a NumPy array, then >>> pass it into my C code without dragging along all the cruft that I don't >>> need. >>> >>> ?- Written in pure C + specs, usable without Python >>> >>> ?- PEP 3118 "done right", basically semi-standardize the internal Cython >>> memoryview ABI and get something that's passable on stack >>> >>> ?- Get block get/put API >>> >>> ?- Iterator APIs >>> >>> ?- Utility code for exporters and clients (iteration code, axis >>> reordering, >>> etc.) >>> >>> Is the scope of that insane, or is it at least worth a shot to see how >>> bad >>> it is? Beyond figuring out a small subset that can be done first, and >>> whether performance considerations must be taken or not, there's two >>> complicating factors: Pluggable dtypes, memory management. Perhaps you >>> could >>> come to Oslo for a couple of days to brainstorm... >>> >>> Dag >> >> >> The blocks are a good idea, but I think fairly complicated for >> complicated matrix layouts. It would be nice to have something that is >> reasonably efficient for at least most of the array storage >> mechanisms. >> I'm going to do a little brain dump below, let's see if anything is useful >> :) >> >> What if we basically take the CSR format and make it a little simpler, >> easier to handle, and better suited for other layouts. 
Basically, keep >> shape/strides for dense arrays, and for blocked arrays just "extend" >> your number of dimensions, i.e. a 2D blocked array becomes a 4D array, >> something like this: >> >>>>> a = np.arange(4).repeat(4).reshape(4, 4); >>>>> a >> >> array([[0, 0, 0, 0], >> ? ? ? ? ? ?[1, 1, 1, 1], >> ? ? ? ? ? ?[2, 2, 2, 2], >> ? ? ? ? ? ?[3, 3, 3, 3]]) >> >>>>> a.shape = (2, 2, 2, 2) >>>>> itemsize = a.dtype.itemsize >>>>> a.strides = (8 * itemsize, 2 * itemsize, 4 * itemsize, 1 * itemsize) >>>>> a >> >> array([[[[0, 0], >> ? ? ? ? ?[1, 1]], >> >> ? ? ? ? [[0, 0], >> ? ? ? ? ?[1, 1]]], >> >> >> ? ? ? ?[[[2, 2], >> ? ? ? ? ?[3, 3]], >> >> ? ? ? ? [[2, 2], >> ? ? ? ? ?[3, 3]]]]) >> >>>>> print a.flatten() >> >> [0 0 1 1 0 0 1 1 2 2 3 3 2 2 3 3] >> >> Now, given some buffer flag (PyBUF_Sparse or something), use basically >> suboffsets with indirect dimensions, where only ever the last >> dimension is a row of contiguous memory (the entire thing may be >> contiguous, but the point is that you don't care). This row may >> >> ? ? - be variable sized >> ? ? - have either a single "column start" (e.g. diagonal, band/tri- >> diagonal, block diagonal, etc), OR >> ? ? - a list of column indices, the length of the row (like in the CSR >> format) >> >> The length of each innermost row is then determined by looking at, in >> order: >> ? ? - the extent as specified in the shape list >> ? ? - if -1, and some flag is set, the extent is determined like CSR, >> i.e. (uintptr_t) row[i + 1] - (uintptr_t) row[i] >> ? ? ? ? -> ?maybe instead of pointers indices are better, for >> serialization, GPUs, etc >> ? ? - otherwise, use either a function pointer or perhaps a list of >> extents >> >> All these details will obviously be abstracted, allowing for easy >> iteration, but it can also be used by ufuncs operating on contiguous >> rows (often, since the actual storage is contiguous and stored in a 1D >> array, some flag could indicate contiguity as an optimization for >> unary ufuncs and flat iteration). Tiled nditer-ation could also work >> here without too big a hassle. >> When you slice, you add to the suboffset and manipulate the single >> extent or entire list of extents in that dimension, and the flag to >> determine the length using the pointer subtraction should be cleared >> (this should probably all happen through vtable functions). >> >> An exporter would also be free to use different malloced pointers, >> allowing variable sized array support with append/pop functionality in >> multiple dimensions (if there are no active buffer views). >> >> Random access in the case where a column start is provided is still >> contant time, and done though a[i][j][k - colstart], unless you have >> discontiguous rows, in which case you are allowed a logarithmic search >> (if the size exceeds some threshold). I see scipy.sparse does a linear >> search, which is pretty slow in pathological cases: >> >> from scipy import sparse >> a = np.random.random((1, 4000000)) >> b = sparse.csr_matrix(a) >> %timeit a[0, 1000] >> %timeit b[0, 1000] >> >> 10000000 loops, best of 3: 190 ns per loop >> 10 loops, best of 3: 29.3 ms per loop > > > Heh. That is *extremely* pathological though, nobody does that in real code > :-) > > Here's an idea I had yesterday: To get an ND sparse array with good spatial > locality (for local operations etc.), you could map the elements to a > volume-filling fractal and then use a hash-map with linear probing. 
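The usual volume-filling choice here would be a Z-order (Morton) curve; a rough sketch of the 2-D key such a hash map could use (purely illustrative, not an existing API):

def morton_key(i, j, bits=16):
    # Interleave the bits of (i, j): elements that are close in 2-D get
    # nearby keys, so nearby entries tend to stay close in the table.
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (2 * b + 1)
        key |= ((j >> b) & 1) << (2 * b)
    return key

entries = {}                          # toy hash-map storage for a sparse 2-D array
entries[morton_key(3, 5)] = 1.0       # set element (3, 5)
value = entries.get(morton_key(3, 5), 0.0)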
I bet it > either doesn't work or has been done already :-) > > >> >> Now, depending on the density and operation, the caller may have some >> idea of how to allocate an output array. I'm not sure how to handle >> "insertions" of new elements, but I presume through vtable put/insert >> functions. I'm also not sure how to fit this in with linear algebra >> functionality, other than providing conversions of the view. >> >> I think a disadvantage of this scheme is that you can't reorder your >> axes anymore, and many operations that are easy in the dense case are >> suddenly harder, e.g. this scheme does not allow you to go from a >> csr-like format into csc. But I think what this gives is reasonable >> generality to allow easy use in C/Fortran/Cython compiled code/numpy >> ufunc invocation, as well as allowing efficient-ish storage for >> various kinds of arrays. >> >> Does this make any sense? > > > I'll admit I didn't go through the finer details of your idea; let's deal > with the below first and then I can re-read your post later. > > What I'm thinking is that this seems interesting, but perhaps it lowers the > versatility so much that it's not really worth to consider, for the GSoC at > least. If the impact isn't high enough, my hunch is that one may as well not > bother and just do strided arrays. > > I actually believe that the *likely* outcome of this discussion is to stick > to the original plan and focus on expressions with strided arrays. But let's > see. > > I'm not sure if what brought me to the blocked case was really block-sparse > arrays or diagonal arrays. Let's see...: > > 1) Memory conservation. The array would not be NOT element-sparse, it's just > that you don't want to hold it all in memory at once. > > Examples: > > ?- "a_array *= np.random.normal(size=a_array.shape)". The right hand side > can be generated on the fly (making it return the same data for each block > every time is non-trivial but can be done). If a takes up a good chunk of > RAM, there might not even be enough memory for the right-hand-side except > for block-by-block. (That saves bandwidth too.) > > ?- You want to stream from one file to another file, neither of which will > fit in RAM. Here we really don't care about speed, just code reuse...it's > irritating to have to manually block in such a situation. > > ?- You want to play with a new fancy array format (a blocked format that's > faster for some linear algebra, say). But then you need to call another C > library that takes data+shape+stride+dtype. Then you need to make a copy, > which you'd rather avoid -- so an alternative is to make that C library be > based on the block API so that it can interfaces with your fancy new format > (and anything else). > > 2) Bandwidth conservation. Numexpr, Theano, Numba and > Cython-vectorized-expressions all already deal with this problem on one > level. > > However, I also believe there's a higher level in many programs where blocks > come into play. The organization of many scientific codes essentially reads > "do A on 8 GB of data, then do B on 8 GB of data"; and it's going to be a > *long* time before you have full-program analysis to block up that in every > case; the tools above will be able to deal with some almost-trivial cases > only. > > A block API could be used to write "pipeline programs". 
For instance, > consider > > a_array[:, None] * f(x_array) > > and f is some rather complicated routine in Fortran that is NOT a ufunc -- > it takes all of x_array as input and provides all the output "at once"; but > at some point it needs to do the writing of the output, and if writing to an > API it could do the multiplication with a_array while the block is in cache > anyway and save a memory bus round-trip. (Provided the output isn't needed > as scratch space.) > > Summing up: > > Vectorized expressions on contiguous (or strided) memory in Cython is pretty > nice by itself; it would bring us up to the current state of the art of > static compilation (in Fortran compilers). > > But my hunch is your sparse-block proposal doesn't add enough flexibility to > be worth the pain. Many of the cases above can't be covered with it. > > If it's possible to identify a little nugget API that is forward-compatible > with the above usecases (it wouldn't solve them perhaps, but allow them to > be solved with some extra supporting code), then it might be worth to a) > implement it in NumPy, b) support it for Cython vectorized expressions; and > use that to support block-transposition. > > But I'm starting to feel that we can't really know how that little nugget > API should really look until the above has been explored in a little more > depth; and then we are easily talking a too large scope without tying it to > a larger project (which can't really before the GSoC..). For instance, 2) > suggests push/pull rather than put/get. > > Dag I assumed as much, in any case I was going to start with dense arrays, simply because they are in common use and well-defined at this point. Maybe what we really want is just lazy evaluation that works for different array layouts and storage mechanisms, and a smart JIT that can evaluate our linear algebra and array expressions in an optimal fashion (memory/cache-wise, but also out-of-core wise). This probably means writing everything in a high level language, because even if your Fortran routines themselves would use your lazy evaluation library, your linear algebra library wouldn't for sure :) Anyway, I'm going to be pragmatic and investigate how much of Theano we can reuse in Cython now. From markflorisson88 at gmail.com Tue May 15 06:44:55 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 15 May 2012 11:44:55 +0100 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> <4FAEDC58.8050502@astro.uio.no> Message-ID: On 14 May 2012 21:54, David Cournapeau wrote: > > > On Mon, May 14, 2012 at 5:31 PM, mark florisson > wrote: >> >> On 12 May 2012 22:55, Dag Sverre Seljebotn >> wrote: >> > On 05/11/2012 03:37 PM, mark florisson wrote: >> >> >> >> On 11 May 2012 12:13, Dag Sverre Seljebotn >> >> ?wrote: >> >>> >> >>> (NumPy devs: I know, I get too many ideas. But this time I *really* >> >>> believe >> >>> in it, I think this is going to be *huge*. And if Mark F. likes it >> >>> it's >> >>> not >> >>> going to be without manpower; and as his mentor I'd pitch in too here >> >>> and >> >>> there.) >> >>> >> >>> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly >> >>> don't >> >>> want to micro-manage your GSoC, just have your take.) >> >>> >> >>> Travis, thank you very much for those good words in the "NA-mask >> >>> interactions..." thread. It put most of my concerns away. 
If anybody >> >>> is >> >>> leaning towards for opaqueness because of its OOP purity, I want to >> >>> refer >> >>> to >> >>> C++ and its walled-garden of ideological purity -- it has, what, 3-4 >> >>> different OOP array libraries, neither of which is able to out-compete >> >>> the >> >>> other. Meanwhile the rest of the world happily cooperates using >> >>> pointers, >> >>> strides, CSR and CSC. >> >>> >> >>> Now, there are limits to what you can do with strides and pointers. >> >>> Noone's >> >>> denying the need for more. In my mind that's an API where you can do >> >>> fetch_block and put_block of cache-sized, N-dimensional blocks on an >> >>> array; >> >>> but it might be something slightly different. >> >>> >> >>> Here's what I'm asking: DO NOT simply keep extending ndarray and the >> >>> NumPy C >> >>> API to deal with this issue. >> >>> >> >>> What we need is duck-typing/polymorphism at the C level. If you keep >> >>> extending ndarray and the NumPy C API, what we'll have is a >> >>> one-to-many >> >>> relationship: One provider of array technology, multiple consumers >> >>> (with >> >>> hooks, I'm sure, but all implementations of the hook concept in the >> >>> NumPy >> >>> world I've seen so far are a total disaster!). >> >>> >> >>> What I think we need instead is something like PEP 3118 for the >> >>> "abstract" >> >>> array that is only available block-wise with getters and setters. On >> >>> the >> >>> Cython list we've decided that what we want for CEP 1000 (for boxing >> >>> callbacks etc.) is to extend PyTypeObject with our own fields; we >> >>> could >> >>> create CEP 1001 to solve this issue and make any Python object an >> >>> exporter >> >>> of "block-getter/setter-arrays" (better name needed). >> >>> >> >>> What would be exported is (of course) a simple vtable: >> >>> >> >>> typedef struct { >> >>> ? ?int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t >> >>> *lower_right, >> >>> ...); >> >>> ? ?... >> >>> } block_getter_setter_array_vtable; >> >>> >> >>> Let's please discuss the details *after* the fundamentals. But the >> >>> reason >> >>> I >> >>> put void* there instead of PyObject* is that I hope this could be used >> >>> beyond the Python world (say, Python<->Julia); the void* would be >> >>> handed >> >>> to >> >>> you at the time you receive the vtable (however we handle that). >> >> >> >> >> >> I suppose it would also be useful to have some way of predicting the >> >> output format polymorphically for the caller. E.g. dense * >> >> block_diagonal results in block diagonal, but dense + block_diagonal >> >> results in dense, etc. It might be useful for the caller to know >> >> whether it needs to allocate a sparse, dense or block-structured >> >> array. Or maybe the polymorphic function could even do the allocation. >> >> This needs to happen recursively of course, to avoid intermediate >> >> temporaries. The compiler could easily handle that, and so could numpy >> >> when it gets lazy evaluation. >> > >> > >> > Ah. But that depends too on the computation to be performed too; a) >> > elementwise, b) axis-wise reductions, c) linear algebra... >> > >> > In my oomatrix code (please don't look at it, it's shameful) I do this >> > using >> > multiple dispatch. >> > >> > I'd rather ignore this for as long as we can, only implementing "a[:] = >> > ..." >> > -- I can't see how decisions here would trickle down to the API that's >> > used >> > in the kernel, it's more like a pre-phase, and better treated >> > orthogonally. 
>> > >> > >> >> I think if the heavy lifting of allocating output arrays and exporting >> >> these arrays work in numpy, then support in Cython could use that (I >> >> can already hear certain people object to more complicated array stuff >> >> in Cython :). Even better here would be an external project that each >> >> our projects could use (I still think the nditer sorting functionality >> >> of arrays should be numpy-agnostic and externally available). >> > >> > >> > I agree with the separate project idea. It's trivial for NumPy to >> > incorporate that as one of its methods for exporting arrays, and I don't >> > think it makes sense to either build it into Cython, or outright depend >> > on >> > NumPy. >> > >> > Here's what I'd like (working title: NumBridge?). >> > >> > ?- Mission: Be the "double* + shape + strides" in a world where that is >> > no >> > longer enough, by providing tight, focused APIs/ABIs that are usable >> > across >> > C/Fortran/Python. >> > >> > I basically want something I can quickly acquire from a NumPy array, >> > then >> > pass it into my C code without dragging along all the cruft that I don't >> > need. >> > >> > ?- Written in pure C + specs, usable without Python >> > >> > ?- PEP 3118 "done right", basically semi-standardize the internal Cython >> > memoryview ABI and get something that's passable on stack >> > >> > ?- Get block get/put API >> > >> > ?- Iterator APIs >> > >> > ?- Utility code for exporters and clients (iteration code, axis >> > reordering, >> > etc.) >> > >> > Is the scope of that insane, or is it at least worth a shot to see how >> > bad >> > it is? Beyond figuring out a small subset that can be done first, and >> > whether performance considerations must be taken or not, there's two >> > complicating factors: Pluggable dtypes, memory management. Perhaps you >> > could >> > come to Oslo for a couple of days to brainstorm... >> > >> > Dag >> >> The blocks are a good idea, but I think fairly complicated for >> complicated matrix layouts. It would be nice to have something that is >> reasonably efficient for at least most of the array storage >> mechanisms. >> I'm going to do a little brain dump below, let's see if anything is useful >> :) >> >> What if we basically take the CSR format and make it a little simpler, >> easier to handle, and better suited for other layouts. Basically, keep >> shape/strides for dense arrays, and for blocked arrays just "extend" >> your number of dimensions, i.e. a 2D blocked array becomes a 4D array, >> something like this: >> >> >>> a = np.arange(4).repeat(4).reshape(4, 4); >> >>> a >> array([[0, 0, 0, 0], >> ? ? ? ? ? [1, 1, 1, 1], >> ? ? ? ? ? [2, 2, 2, 2], >> ? ? ? ? ? [3, 3, 3, 3]]) >> >> >>> a.shape = (2, 2, 2, 2) >> >>> itemsize = a.dtype.itemsize >> >>> a.strides = (8 * itemsize, 2 * itemsize, 4 * itemsize, 1 * itemsize) >> >>> a >> array([[[[0, 0], >> ? ? ? ? [1, 1]], >> >> ? ? ? ?[[0, 0], >> ? ? ? ? [1, 1]]], >> >> >> ? ? ? [[[2, 2], >> ? ? ? ? [3, 3]], >> >> ? ? ? ?[[2, 2], >> ? ? ? ? [3, 3]]]]) >> >> >>> print a.flatten() >> [0 0 1 1 0 0 1 1 2 2 3 3 2 2 3 3] >> >> Now, given some buffer flag (PyBUF_Sparse or something), use basically >> suboffsets with indirect dimensions, where only ever the last >> dimension is a row of contiguous memory (the entire thing may be >> contiguous, but the point is that you don't care). This row may >> >> ? ?- be variable sized >> ? ?- have either a single "column start" (e.g. diagonal, band/tri- >> diagonal, block diagonal, etc), OR >> ? 
?- a list of column indices, the length of the row (like in the CSR >> format) >> >> The length of each innermost row is then determined by looking at, in >> order: >> ? ?- the extent as specified in the shape list >> ? ?- if -1, and some flag is set, the extent is determined like CSR, >> i.e. (uintptr_t) row[i + 1] - (uintptr_t) row[i] >> ? ? ? ?-> maybe instead of pointers indices are better, for >> serialization, GPUs, etc >> ? ?- otherwise, use either a function pointer or perhaps a list of extents >> >> All these details will obviously be abstracted, allowing for easy >> iteration, but it can also be used by ufuncs operating on contiguous >> rows (often, since the actual storage is contiguous and stored in a 1D >> array, some flag could indicate contiguity as an optimization for >> unary ufuncs and flat iteration). Tiled nditer-ation could also work >> here without too big a hassle. >> When you slice, you add to the suboffset and manipulate the single >> extent or entire list of extents in that dimension, and the flag to >> determine the length using the pointer subtraction should be cleared >> (this should probably all happen through vtable functions). >> >> An exporter would also be free to use different malloced pointers, >> allowing variable sized array support with append/pop functionality in >> multiple dimensions (if there are no active buffer views). >> >> Random access in the case where a column start is provided is still >> contant time, and done though a[i][j][k - colstart], unless you have >> discontiguous rows, in which case you are allowed a logarithmic search >> (if the size exceeds some threshold). I see scipy.sparse does a linear >> search, which is pretty slow in pathological cases: >> >> from scipy import sparse >> a = np.random.random((1, 4000000)) >> b = sparse.csr_matrix(a) >> %timeit a[0, 1000] >> %timeit b[0, 1000] >> >> 10000000 loops, best of 3: 190 ns per loop >> 10 loops, best of 3: 29.3 ms per loop >> >> Now, depending on the density and operation,?the caller may have some >> idea of how to allocate an output array. I'm not sure how to handle >> "insertions" of new elements, but I presume through vtable put/insert >> functions. I'm also not sure how to fit this in with linear algebra >> functionality, other than providing conversions of the view. >> >> I think a disadvantage of this scheme is that you can't reorder your >> axes anymore, and many operations that are easy in the dense case are >> suddenly harder, e.g. this scheme does not allow you to go from a >> csr-like format into csc. But I think what this gives is reasonable >> generality to allow easy use in C/Fortran/Cython compiled code/numpy >> ufunc invocation, as well as allowing efficient-ish storage for >> various kinds of arrays. >> >> Does this make any sense? > > > I think it does, although it is not clear to me how it would generalize to > more than 2 dimensions (i.e. how would you handle a 3d sparse array). Would > you add more level of indirection ? Yes exactly, it extends to an arbitrary number of dimensions. The column start or column indices in my example were only in the last dimension, but it pertains to each respective dimension. > I have been thinking a bit about related concepts in the context of > generalized sparse arrays (which could be a first step toward numpy arrays > on top of multiple malloc blocks instead of just one), and one of the > solution I have seen is generalized b-tree/b*-trees. 
The main appeal to me > is that the underlying storage is still one-dimensional, and the "magic" > happens "only" in the indexing. This has several advantages: > > ? - getting the low level block algorithms rights is a solved problem > ? - it allows for growable arrays in O(log(N)) instead of O(N) > ? - one can hope to get near optimal performances in the common cases (1 and > 2d) with degenerate indexing functions. > > Two of the references I have been looking at: > ? - UBTree:?http://www.scholarpedia.org/article/B-tree_and_UB-tree > ? -?Storing Matrices on Disk : Theory and Practice Revisited (by?Zhang, > Yi,?Munagala, Kamesh and?Yang, Jun) > > I have meant to implement some of those ideas and do some basic benchmarks > (especially to compare against CSR and CSC in 2 dimensions, in which case > one could imagine building the index function in such a way as range queries > map to contiguous blocks of memory). Interesting, thanks for the links. > cheers, > > David > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From d.s.seljebotn at astro.uio.no Tue May 15 07:20:02 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 15 May 2012 13:20:02 +0200 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> <4FAEDC58.8050502@astro.uio.no> <4FB16CB9.9020507@astro.uio.no> Message-ID: <4FB23BE2.1000103@astro.uio.no> On 05/15/2012 12:42 PM, mark florisson wrote: > On 14 May 2012 21:36, Dag Sverre Seljebotn wrote: >> On 05/14/2012 06:31 PM, mark florisson wrote: >>> >>> On 12 May 2012 22:55, Dag Sverre Seljebotn >>> wrote: >>>> >>>> On 05/11/2012 03:37 PM, mark florisson wrote: >>>>> >>>>> >>>>> On 11 May 2012 12:13, Dag Sverre Seljebotn >>>>> wrote: >>>>>> >>>>>> >>>>>> (NumPy devs: I know, I get too many ideas. But this time I *really* >>>>>> believe >>>>>> in it, I think this is going to be *huge*. And if Mark F. likes it it's >>>>>> not >>>>>> going to be without manpower; and as his mentor I'd pitch in too here >>>>>> and >>>>>> there.) >>>>>> >>>>>> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly >>>>>> don't >>>>>> want to micro-manage your GSoC, just have your take.) >>>>>> >>>>>> Travis, thank you very much for those good words in the "NA-mask >>>>>> interactions..." thread. It put most of my concerns away. If anybody is >>>>>> leaning towards for opaqueness because of its OOP purity, I want to >>>>>> refer >>>>>> to >>>>>> C++ and its walled-garden of ideological purity -- it has, what, 3-4 >>>>>> different OOP array libraries, neither of which is able to out-compete >>>>>> the >>>>>> other. Meanwhile the rest of the world happily cooperates using >>>>>> pointers, >>>>>> strides, CSR and CSC. >>>>>> >>>>>> Now, there are limits to what you can do with strides and pointers. >>>>>> Noone's >>>>>> denying the need for more. In my mind that's an API where you can do >>>>>> fetch_block and put_block of cache-sized, N-dimensional blocks on an >>>>>> array; >>>>>> but it might be something slightly different. >>>>>> >>>>>> Here's what I'm asking: DO NOT simply keep extending ndarray and the >>>>>> NumPy C >>>>>> API to deal with this issue. >>>>>> >>>>>> What we need is duck-typing/polymorphism at the C level. 
If you keep >>>>>> extending ndarray and the NumPy C API, what we'll have is a one-to-many >>>>>> relationship: One provider of array technology, multiple consumers >>>>>> (with >>>>>> hooks, I'm sure, but all implementations of the hook concept in the >>>>>> NumPy >>>>>> world I've seen so far are a total disaster!). >>>>>> >>>>>> What I think we need instead is something like PEP 3118 for the >>>>>> "abstract" >>>>>> array that is only available block-wise with getters and setters. On >>>>>> the >>>>>> Cython list we've decided that what we want for CEP 1000 (for boxing >>>>>> callbacks etc.) is to extend PyTypeObject with our own fields; we could >>>>>> create CEP 1001 to solve this issue and make any Python object an >>>>>> exporter >>>>>> of "block-getter/setter-arrays" (better name needed). >>>>>> >>>>>> What would be exported is (of course) a simple vtable: >>>>>> >>>>>> typedef struct { >>>>>> int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t >>>>>> *lower_right, >>>>>> ...); >>>>>> ... >>>>>> } block_getter_setter_array_vtable; >>>>>> >>>>>> Let's please discuss the details *after* the fundamentals. But the >>>>>> reason >>>>>> I >>>>>> put void* there instead of PyObject* is that I hope this could be used >>>>>> beyond the Python world (say, Python<->Julia); the void* would be >>>>>> handed >>>>>> to >>>>>> you at the time you receive the vtable (however we handle that). >>>>> >>>>> >>>>> >>>>> I suppose it would also be useful to have some way of predicting the >>>>> output format polymorphically for the caller. E.g. dense * >>>>> block_diagonal results in block diagonal, but dense + block_diagonal >>>>> results in dense, etc. It might be useful for the caller to know >>>>> whether it needs to allocate a sparse, dense or block-structured >>>>> array. Or maybe the polymorphic function could even do the allocation. >>>>> This needs to happen recursively of course, to avoid intermediate >>>>> temporaries. The compiler could easily handle that, and so could numpy >>>>> when it gets lazy evaluation. >>>> >>>> >>>> >>>> Ah. But that depends too on the computation to be performed too; a) >>>> elementwise, b) axis-wise reductions, c) linear algebra... >>>> >>>> In my oomatrix code (please don't look at it, it's shameful) I do this >>>> using >>>> multiple dispatch. >>>> >>>> I'd rather ignore this for as long as we can, only implementing "a[:] = >>>> ..." >>>> -- I can't see how decisions here would trickle down to the API that's >>>> used >>>> in the kernel, it's more like a pre-phase, and better treated >>>> orthogonally. >>>> >>>> >>>>> I think if the heavy lifting of allocating output arrays and exporting >>>>> these arrays work in numpy, then support in Cython could use that (I >>>>> can already hear certain people object to more complicated array stuff >>>>> in Cython :). Even better here would be an external project that each >>>>> our projects could use (I still think the nditer sorting functionality >>>>> of arrays should be numpy-agnostic and externally available). >>>> >>>> >>>> >>>> I agree with the separate project idea. It's trivial for NumPy to >>>> incorporate that as one of its methods for exporting arrays, and I don't >>>> think it makes sense to either build it into Cython, or outright depend >>>> on >>>> NumPy. >>>> >>>> Here's what I'd like (working title: NumBridge?). 
>>>> >>>> - Mission: Be the "double* + shape + strides" in a world where that is >>>> no >>>> longer enough, by providing tight, focused APIs/ABIs that are usable >>>> across >>>> C/Fortran/Python. >>>> >>>> I basically want something I can quickly acquire from a NumPy array, then >>>> pass it into my C code without dragging along all the cruft that I don't >>>> need. >>>> >>>> - Written in pure C + specs, usable without Python >>>> >>>> - PEP 3118 "done right", basically semi-standardize the internal Cython >>>> memoryview ABI and get something that's passable on stack >>>> >>>> - Get block get/put API >>>> >>>> - Iterator APIs >>>> >>>> - Utility code for exporters and clients (iteration code, axis >>>> reordering, >>>> etc.) >>>> >>>> Is the scope of that insane, or is it at least worth a shot to see how >>>> bad >>>> it is? Beyond figuring out a small subset that can be done first, and >>>> whether performance considerations must be taken or not, there's two >>>> complicating factors: Pluggable dtypes, memory management. Perhaps you >>>> could >>>> come to Oslo for a couple of days to brainstorm... >>>> >>>> Dag >>> >>> >>> The blocks are a good idea, but I think fairly complicated for >>> complicated matrix layouts. It would be nice to have something that is >>> reasonably efficient for at least most of the array storage >>> mechanisms. >>> I'm going to do a little brain dump below, let's see if anything is useful >>> :) >>> >>> What if we basically take the CSR format and make it a little simpler, >>> easier to handle, and better suited for other layouts. Basically, keep >>> shape/strides for dense arrays, and for blocked arrays just "extend" >>> your number of dimensions, i.e. a 2D blocked array becomes a 4D array, >>> something like this: >>> >>>>>> a = np.arange(4).repeat(4).reshape(4, 4); >>>>>> a >>> >>> array([[0, 0, 0, 0], >>> [1, 1, 1, 1], >>> [2, 2, 2, 2], >>> [3, 3, 3, 3]]) >>> >>>>>> a.shape = (2, 2, 2, 2) >>>>>> itemsize = a.dtype.itemsize >>>>>> a.strides = (8 * itemsize, 2 * itemsize, 4 * itemsize, 1 * itemsize) >>>>>> a >>> >>> array([[[[0, 0], >>> [1, 1]], >>> >>> [[0, 0], >>> [1, 1]]], >>> >>> >>> [[[2, 2], >>> [3, 3]], >>> >>> [[2, 2], >>> [3, 3]]]]) >>> >>>>>> print a.flatten() >>> >>> [0 0 1 1 0 0 1 1 2 2 3 3 2 2 3 3] >>> >>> Now, given some buffer flag (PyBUF_Sparse or something), use basically >>> suboffsets with indirect dimensions, where only ever the last >>> dimension is a row of contiguous memory (the entire thing may be >>> contiguous, but the point is that you don't care). This row may >>> >>> - be variable sized >>> - have either a single "column start" (e.g. diagonal, band/tri- >>> diagonal, block diagonal, etc), OR >>> - a list of column indices, the length of the row (like in the CSR >>> format) >>> >>> The length of each innermost row is then determined by looking at, in >>> order: >>> - the extent as specified in the shape list >>> - if -1, and some flag is set, the extent is determined like CSR, >>> i.e. (uintptr_t) row[i + 1] - (uintptr_t) row[i] >>> -> maybe instead of pointers indices are better, for >>> serialization, GPUs, etc >>> - otherwise, use either a function pointer or perhaps a list of >>> extents >>> >>> All these details will obviously be abstracted, allowing for easy >>> iteration, but it can also be used by ufuncs operating on contiguous >>> rows (often, since the actual storage is contiguous and stored in a 1D >>> array, some flag could indicate contiguity as an optimization for >>> unary ufuncs and flat iteration). 
Tiled nditer-ation could also work >>> here without too big a hassle. >>> When you slice, you add to the suboffset and manipulate the single >>> extent or entire list of extents in that dimension, and the flag to >>> determine the length using the pointer subtraction should be cleared >>> (this should probably all happen through vtable functions). >>> >>> An exporter would also be free to use different malloced pointers, >>> allowing variable sized array support with append/pop functionality in >>> multiple dimensions (if there are no active buffer views). >>> >>> Random access in the case where a column start is provided is still >>> contant time, and done though a[i][j][k - colstart], unless you have >>> discontiguous rows, in which case you are allowed a logarithmic search >>> (if the size exceeds some threshold). I see scipy.sparse does a linear >>> search, which is pretty slow in pathological cases: >>> >>> from scipy import sparse >>> a = np.random.random((1, 4000000)) >>> b = sparse.csr_matrix(a) >>> %timeit a[0, 1000] >>> %timeit b[0, 1000] >>> >>> 10000000 loops, best of 3: 190 ns per loop >>> 10 loops, best of 3: 29.3 ms per loop >> >> >> Heh. That is *extremely* pathological though, nobody does that in real code >> :-) >> >> Here's an idea I had yesterday: To get an ND sparse array with good spatial >> locality (for local operations etc.), you could map the elements to a >> volume-filling fractal and then use a hash-map with linear probing. I bet it >> either doesn't work or has been done already :-) >> >> >>> >>> Now, depending on the density and operation, the caller may have some >>> idea of how to allocate an output array. I'm not sure how to handle >>> "insertions" of new elements, but I presume through vtable put/insert >>> functions. I'm also not sure how to fit this in with linear algebra >>> functionality, other than providing conversions of the view. >>> >>> I think a disadvantage of this scheme is that you can't reorder your >>> axes anymore, and many operations that are easy in the dense case are >>> suddenly harder, e.g. this scheme does not allow you to go from a >>> csr-like format into csc. But I think what this gives is reasonable >>> generality to allow easy use in C/Fortran/Cython compiled code/numpy >>> ufunc invocation, as well as allowing efficient-ish storage for >>> various kinds of arrays. >>> >>> Does this make any sense? >> >> >> I'll admit I didn't go through the finer details of your idea; let's deal >> with the below first and then I can re-read your post later. >> >> What I'm thinking is that this seems interesting, but perhaps it lowers the >> versatility so much that it's not really worth to consider, for the GSoC at >> least. If the impact isn't high enough, my hunch is that one may as well not >> bother and just do strided arrays. >> >> I actually believe that the *likely* outcome of this discussion is to stick >> to the original plan and focus on expressions with strided arrays. But let's >> see. >> >> I'm not sure if what brought me to the blocked case was really block-sparse >> arrays or diagonal arrays. Let's see...: >> >> 1) Memory conservation. The array would not be NOT element-sparse, it's just >> that you don't want to hold it all in memory at once. >> >> Examples: >> >> - "a_array *= np.random.normal(size=a_array.shape)". The right hand side >> can be generated on the fly (making it return the same data for each block >> every time is non-trivial but can be done). 
If a takes up a good chunk of >> RAM, there might not even be enough memory for the right-hand-side except >> for block-by-block. (That saves bandwidth too.) >> >> - You want to stream from one file to another file, neither of which will >> fit in RAM. Here we really don't care about speed, just code reuse...it's >> irritating to have to manually block in such a situation. >> >> - You want to play with a new fancy array format (a blocked format that's >> faster for some linear algebra, say). But then you need to call another C >> library that takes data+shape+stride+dtype. Then you need to make a copy, >> which you'd rather avoid -- so an alternative is to make that C library be >> based on the block API so that it can interfaces with your fancy new format >> (and anything else). >> >> 2) Bandwidth conservation. Numexpr, Theano, Numba and >> Cython-vectorized-expressions all already deal with this problem on one >> level. >> >> However, I also believe there's a higher level in many programs where blocks >> come into play. The organization of many scientific codes essentially reads >> "do A on 8 GB of data, then do B on 8 GB of data"; and it's going to be a >> *long* time before you have full-program analysis to block up that in every >> case; the tools above will be able to deal with some almost-trivial cases >> only. >> >> A block API could be used to write "pipeline programs". For instance, >> consider >> >> a_array[:, None] * f(x_array) >> >> and f is some rather complicated routine in Fortran that is NOT a ufunc -- >> it takes all of x_array as input and provides all the output "at once"; but >> at some point it needs to do the writing of the output, and if writing to an >> API it could do the multiplication with a_array while the block is in cache >> anyway and save a memory bus round-trip. (Provided the output isn't needed >> as scratch space.) >> >> Summing up: >> >> Vectorized expressions on contiguous (or strided) memory in Cython is pretty >> nice by itself; it would bring us up to the current state of the art of >> static compilation (in Fortran compilers). >> >> But my hunch is your sparse-block proposal doesn't add enough flexibility to >> be worth the pain. Many of the cases above can't be covered with it. >> >> If it's possible to identify a little nugget API that is forward-compatible >> with the above usecases (it wouldn't solve them perhaps, but allow them to >> be solved with some extra supporting code), then it might be worth to a) >> implement it in NumPy, b) support it for Cython vectorized expressions; and >> use that to support block-transposition. >> >> But I'm starting to feel that we can't really know how that little nugget >> API should really look until the above has been explored in a little more >> depth; and then we are easily talking a too large scope without tying it to >> a larger project (which can't really before the GSoC..). For instance, 2) >> suggests push/pull rather than put/get. >> >> Dag > > I assumed as much, in any case I was going to start with dense arrays, > simply because they are in common use and well-defined at this point. > Maybe what we really want is just lazy evaluation that works for > different array layouts and storage mechanisms, and a smart JIT that > can evaluate our linear algebra and array expressions in an optimal > fashion (memory/cache-wise, but also out-of-core wise). 
This probably > means writing everything in a high level language, because even if > your Fortran routines themselves would use your lazy evaluation > library, your linear algebra library wouldn't for sure :) Anyway, I'm My feeling is there should definitely be some very sweet spot somewhere that's not as good as everything-lazily-evaluated/full program analysis, but still much better than where we are today. Of course, that doesn't mean that you should bother about it :-) > going to be pragmatic and investigate how much of Theano we can reuse > in Cython now. Sounds good. At least now the larger world of blocking is in the back of our minds as things proceed. Dag From shish at keba.be Tue May 15 09:03:26 2012 From: shish at keba.be (Olivier Delalleau) Date: Tue, 15 May 2012 09:03:26 -0400 Subject: [Numpy-discussion] Fancy-indexing reorders output in corner cases? In-Reply-To: <9F7881C1-AFA8-49F1-BEC4-7727764949F6@continuum.io> References: <75ACB2C0-8292-41D3-A652-421ADE3E4401@yale.edu> <9F7881C1-AFA8-49F1-BEC4-7727764949F6@continuum.io> Message-ID: 2012/5/15 Travis Oliphant > > On May 14, 2012, at 7:07 PM, St?fan van der Walt wrote: > > > Hi Zach > > > > On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus > wrote: > >> The below seems to be a bug, but perhaps it's unavoidably part of the > indexing mechanism? > >> > >> It's easiest to show via example... note that using "[0,1]" to pull two > columns out of the array gives the same shape as using ":2" in the simple > case, but when there's additional slicing happening, the shapes get > transposed or something. > > > > When fancy indexing and slicing is mixed, the resulting shape is > > essentially unpredictable. The "correct" way to do it is to only use > > fancy indexing, i.e. generate the indices of the sliced dimension as > > well. > > This is not quite accurate. It is not unpredictable. It is very > predictable, but a bit (too) complicated in the most general case. The > problem occurs when you "intermingle" fancy indexing with slice notation > (and for this purpose integer selection is considered "fancy-indexing"). > While in simple cases you can think that [0,1] is equivalent to :2 --- it > is not because fancy-indexing uses "zip-based ideas" instead of > cross-product based ideas. > > The problem in general is how to make sense of something like > > a[:, :, in1, in2] > > If you keep fancy indexing to one side of the slice notation only, then > you get what you expect. The shape of the output will be the first two > dimensions of a + the broadcasted shape of in1 and in2 (where integers are > interpreted as fancy-index arrays). > > So, let's say a is (10,9,8,7) and in1 is (3,4) and in2 is (4,) > > The shape of the output will be (10,9,3,4) filled with essentially > a[:,:,i,j] = a[:,:,in1[i,j], in2[j]] > > What happens, though when you have > > a[:, in1 :, in2]? > > in1 and in2 are broadcasted together to create a two-dimensional > "sub-space" that must fit somewhere. Where should it go? Should it > replace in1 or in2? I.e. should the output be > > (10,3,4,8) or (10,8,3,4). > > To "resolve" this ambiguity, the code sends the (3,4) sub-space to the > front of the "dimensions" and returns (3,4,10,8). 
In retro-spect, the
> code should raise an error as I doubt anyone actually relies on this
> behavior, and then we could have "done the right" thing for situations like
> in1 being an integer which actually makes some sense and should not have
> been confused with the "general case"
>
> In this particular case you might also think that we could say the result
> should be (10,3,8,4) but there is no guarantee that the number of
> dimensions that should be appended by the "fancy-indexing" objects will be
> the same as the number of dimensions replaced.  Again, this is how
> fancy-indexing combines with other fancy-indexing objects.
>
> So, the behavior is actually quite predictable, it's just that in some
> common cases it doesn't do what you would expect --- especially if you
> think that [0,1] is "the same" as :2.  When I wrote this code to begin
> with I should have raised an error and then worked in the cases that make
> sense.  This is a good example of making the mistake of thinking that
> it's better to provide something very general rather than just raise an
> error when an obvious and clear solution is not available.
>
> There is the possibility that we could now raise an error in NumPy when
> this situation is encountered because I strongly doubt anyone is actually
> relying on the current behavior.  I would like to do this, actually, as
> soon as possible.  Comments?
>

+1 to raise an error instead of an unintuitive behavior.

-=- Olivier
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nouiz at nouiz.org  Tue May 15 09:49:05 2012
From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=)
Date: Tue, 15 May 2012 09:49:05 -0400
Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a
	view? (1.7 compatibility issue)
In-Reply-To: 
References: 
Message-ID: 

Hi,

In fact, I would argue that we never change the current behavior, but add
the flag for people who want to use it.

Why?

1) There are probably >10k scripts that use it and that would need to be
checked for correctness. There won't be an easy-to-see crash or error
that lets users notice the change.

2) This change is not a globally significant speed up. Given 1), I think
it is not worth it. Why is this not a significant speed up? First, the
user already creates and uses the original tensor. Suppose a matrix of
size n x n. If it doesn't fit in the cache, creating it will cost n * n.
But copying it will cost cst * n. The cst is the price of loading a full
cache line. But if you return a view, you will pay this cst price later
when you do the computation. In all cases, this is cheap compared to the
cost of creating the matrix. Also, you will do work on the matrix, and
this work will be much more costly than the price of the copy.

In the case where the matrix fits in the cache, the price of the copy is
even lower.

So in conclusion, optimizing the diagonal won't give a speed up in the
global user script, but will break many of them. I'm sure there are
corner cases where the speed up of diag would be significant, but I don't
think they happen in real code. And if they do, asking those users to add
a keyword is better than breaking many scripts, in my opinion.

Fred

On Sun, May 13, 2012 at 4:11 AM, Nathaniel Smith wrote:
> On Sun, May 13, 2012 at 3:28 AM, Travis Oliphant wrote:
>> Another approach would be to introduce a method:
>>
>> a.diag(copy=False)
>>
>> and leave a.diagonal() alone.  Then, a.diagonal() could be deprecated over
>> 2-3 releases.
> > This would be a good idea if we didn't already have both > np.diagonal(a) (which is an alias for a.diagonal()) *and* np.diag(a), > which do different things. And the new a.diag() would be different > from the existing np.diag(a)... > > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From nouiz at nouiz.org Tue May 15 11:42:30 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 15 May 2012 11:42:30 -0400 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> <4FAEDC58.8050502@astro.uio.no> <4FB16CB9.9020507@astro.uio.no> Message-ID: Hi, On Tue, May 15, 2012 at 6:42 AM, mark florisson wrote: > I assumed as much, in any case I was going to start with dense arrays, > simply because they are in common use and well-defined at this point. > Maybe what we really want is just lazy evaluation that works for > different array layouts and storage mechanisms, and a smart JIT that > can evaluate our linear algebra and array expressions in an optimal > fashion (memory/cache-wise, but also out-of-core wise). This probably > means writing everything in a high level language, because even if > your Fortran routines themselves would use your lazy evaluation > library, your linear algebra library wouldn't for sure :) Anyway, I'm > going to be pragmatic and investigate how much of Theano we can reuse > in Cython now. Don't hesitate to contact me or the mailing lists for questions/comments/anything. We can also arrange skype meeting if this help. Fred From d.s.seljebotn at astro.uio.no Tue May 15 13:52:34 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 15 May 2012 19:52:34 +0200 Subject: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer In-Reply-To: References: <4FACF458.6070200@astro.uio.no> <4FAEDC58.8050502@astro.uio.no> Message-ID: <4FB297E2.8060903@astro.uio.no> On 05/13/2012 12:27 AM, Charles R Harris wrote: > > > On Sat, May 12, 2012 at 3:55 PM, Dag Sverre Seljebotn > > wrote: > > On 05/11/2012 03:37 PM, mark florisson wrote: > > On 11 May 2012 12:13, Dag Sverre > Seljebotn > wrote: > >> (NumPy devs: I know, I get too many ideas. But this time I > *really* believe > >> in it, I think this is going to be *huge*. And if Mark F. likes > it it's not > >> going to be without manpower; and as his mentor I'd pitch in too > here and > >> there.) > >> > >> (Mark F.: I believe this is *very* relevant to your GSoC. I > certainly don't > >> want to micro-manage your GSoC, just have your take.) > >> > >> Travis, thank you very much for those good words in the "NA-mask > >> interactions..." thread. It put most of my concerns away. If > anybody is > >> leaning towards for opaqueness because of its OOP purity, I want > to refer to > >> C++ and its walled-garden of ideological purity -- it has, what, 3-4 > >> different OOP array libraries, neither of which is able to > out-compete the > >> other. Meanwhile the rest of the world happily cooperates using > pointers, > >> strides, CSR and CSC. > >> > >> Now, there are limits to what you can do with strides and > pointers. Noone's > >> denying the need for more. In my mind that's an API where you can do > >> fetch_block and put_block of cache-sized, N-dimensional blocks > on an array; > >> but it might be something slightly different. 
> >> > >> Here's what I'm asking: DO NOT simply keep extending ndarray and > the NumPy C > >> API to deal with this issue. > >> > >> What we need is duck-typing/polymorphism at the C level. If you keep > >> extending ndarray and the NumPy C API, what we'll have is a > one-to-many > >> relationship: One provider of array technology, multiple > consumers (with > >> hooks, I'm sure, but all implementations of the hook concept in > the NumPy > >> world I've seen so far are a total disaster!). > >> > >> What I think we need instead is something like PEP 3118 for the > "abstract" > >> array that is only available block-wise with getters and > setters. On the > >> Cython list we've decided that what we want for CEP 1000 (for boxing > >> callbacks etc.) is to extend PyTypeObject with our own fields; > we could > >> create CEP 1001 to solve this issue and make any Python object > an exporter > >> of "block-getter/setter-arrays" (better name needed). > >> > >> What would be exported is (of course) a simple vtable: > >> > >> typedef struct { > >> int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t > *lower_right, > >> ...); > >> ... > >> } block_getter_setter_array_vtable; > >> > >> Let's please discuss the details *after* the fundamentals. But > the reason I > >> put void* there instead of PyObject* is that I hope this could > be used > >> beyond the Python world (say, Python<->Julia); the void* would > be handed to > >> you at the time you receive the vtable (however we handle that). > > > > I suppose it would also be useful to have some way of predicting the > > output format polymorphically for the caller. E.g. dense * > > block_diagonal results in block diagonal, but dense + block_diagonal > > results in dense, etc. It might be useful for the caller to know > > whether it needs to allocate a sparse, dense or block-structured > > array. Or maybe the polymorphic function could even do the > allocation. > > This needs to happen recursively of course, to avoid intermediate > > temporaries. The compiler could easily handle that, and so could > numpy > > when it gets lazy evaluation. > > Ah. But that depends too on the computation to be performed too; a) > elementwise, b) axis-wise reductions, c) linear algebra... > > In my oomatrix code (please don't look at it, it's shameful) I do this > using multiple dispatch. > > I'd rather ignore this for as long as we can, only implementing "a[:] = > ..." -- I can't see how decisions here would trickle down to the API > that's used in the kernel, it's more like a pre-phase, and better > treated orthogonally. > > > I think if the heavy lifting of allocating output arrays and > exporting > > these arrays work in numpy, then support in Cython could use that (I > > can already hear certain people object to more complicated array > stuff > > in Cython :). Even better here would be an external project that each > > our projects could use (I still think the nditer sorting > functionality > > of arrays should be numpy-agnostic and externally available). > > I agree with the separate project idea. It's trivial for NumPy to > incorporate that as one of its methods for exporting arrays, and I don't > think it makes sense to either build it into Cython, or outright depend > on NumPy. > > Here's what I'd like (working title: NumBridge?). > > - Mission: Be the "double* + shape + strides" in a world where that is > no longer enough, by providing tight, focused APIs/ABIs that are usable > across C/Fortran/Python. 
> > I basically want something I can quickly acquire from a NumPy array, > then pass it into my C code without dragging along all the cruft that I > don't need. > > - Written in pure C + specs, usable without Python > > - PEP 3118 "done right", basically semi-standardize the internal > Cython memoryview ABI and get something that's passable on stack > > - Get block get/put API > > - Iterator APIs > > - Utility code for exporters and clients (iteration code, axis > reordering, etc.) > > Is the scope of that insane, or is it at least worth a shot to see how > bad it is? Beyond figuring out a small subset that can be done first, > and whether performance considerations must be taken or not, there's two > complicating factors: Pluggable dtypes, memory management. Perhaps you > could come to Oslo for a couple of days to brainstorm... > > > There have been musings on this list along those lines with the idea > that numpy/ufuncs would be built on top of that base, so it isn't crazy > ;) Perhaps it is time to take a more serious look at it. Especially if > there is help to get it implemented and made available through tools > such as Cython. As this matured, I agree with what seems to be Mark Florisson's feeling that this is a bit too daunting for a GSoC tack-on. For now, my part will be to try to push forward with CEP 1001 on python-dev etc. over the summer, so that new features in NumPy that are not covered by PEP 3118 can be eventually exported by new slots in PyTypeObject rather than requiring consumers to use PyObject_TypeCheck and the NumPy C API. That should make it much easier to develop and play with new features by having more classes, rather than having to stick everything in ndarray. Of course, there can still be a NumPy C API; it would just tend to change "PyArrayObject*" to "PyObject*" in new NumPy C API functions, and simply have a macro that forwards to a polymorphic dispatch (it's already a function pointer in the calling module; so this would just make it a function pointer on the type object instead). Does the NumPy devs agree with this vision? (I'll plunge ahead anyway because we'll need it internally in Cython.) Dag From stefan at sun.ac.za Tue May 15 14:08:05 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 15 May 2012 11:08:05 -0700 Subject: [Numpy-discussion] Fancy-indexing reorders output in corner cases? In-Reply-To: <9F7881C1-AFA8-49F1-BEC4-7727764949F6@continuum.io> References: <75ACB2C0-8292-41D3-A652-421ADE3E4401@yale.edu> <9F7881C1-AFA8-49F1-BEC4-7727764949F6@continuum.io> Message-ID: On Mon, May 14, 2012 at 9:03 PM, Travis Oliphant wrote: > What happens, though when you have > > a[:, in1 :, in2]? > [...] > > To "resolve" this ambiguity, the code sends the (3,4) sub-space to the front of the "dimensions" and returns (3,4,10,8). ? In retro-spect, the code should raise an error as I doubt anyone actually relies on this behavior, and then we could have "done the right" thing for situations like in1 being an integer which actually makes some sense and should not have been confused with the "general case" You're right, the behavior is deterministic but counter-intuitive, and at least warning in the case when dimensions get flipped around would be helpful. Raising an error may break some code, though, so for that we'll have to bump the API. 
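
For the record, a small shape-only illustration of the two cases described
above (the index arrays are all zeros here, so this is safe to run):

>>> import numpy as np
>>> a = np.zeros((10, 9, 8, 7))
>>> in1 = np.zeros((3, 4), dtype=int)   # in1 and in2 broadcast to a (3, 4) subspace
>>> in2 = np.zeros((4,), dtype=int)
>>> a[:, :, in1, in2].shape   # fancy indices on one side of the slices: as expected
(10, 9, 3, 4)
>>> a[:, in1, :, in2].shape   # fancy indices separated by a slice: subspace sent to the front
(3, 4, 10, 8)
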
St?fan From efiring at hawaii.edu Tue May 15 14:15:38 2012 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 15 May 2012 08:15:38 -1000 Subject: [Numpy-discussion] Fancy-indexing reorders output in corner cases? In-Reply-To: <9F7881C1-AFA8-49F1-BEC4-7727764949F6@continuum.io> References: <75ACB2C0-8292-41D3-A652-421ADE3E4401@yale.edu> <9F7881C1-AFA8-49F1-BEC4-7727764949F6@continuum.io> Message-ID: <4FB29D4A.4050605@hawaii.edu> On 05/14/2012 06:03 PM, Travis Oliphant wrote: > What happens, though when you have > > a[:, in1 :, in2]? > > in1 and in2 are broadcasted together to create a two-dimensional > "sub-space" that must fit somewhere. Where should it go? Should > it replace in1 or in2? I.e. should the output be > > (10,3,4,8) or (10,8,3,4). > > To "resolve" this ambiguity, the code sends the (3,4) sub-space to > the front of the "dimensions" and returns (3,4,10,8). In > retro-spect, the code should raise an error as I doubt anyone > actually relies on this behavior, and then we could have "done the > right" thing for situations like in1 being an integer which actually > makes some sense and should not have been confused with the "general > case" > > In this particular case you might also think that we could say the > result should be (10,3,8,4) but there is no guarantee that the number > of dimensions that should be appended by the "fancy-indexing" objects > will be the same as the number of dimensions replaced. Again, this > is how fancy-indexing combines with other fancy-indexing objects. > > So, the behavior is actually quite predictable, it's just that in > some common cases it doesn't do what you would expect --- especially > if you think that [0,1] is "the same" as :2. When I wrote this code > to begin with I should have raised an error and then worked in the > cases that make sense. This is a good example of making the > mistake of thinking that it's better to provide something very > general rather than just raise an error when an obvious and clear > solution is not available. > > There is the possibility that we could now raise an error in NumPy > when this situation is encountered because I strongly doubt anyone is > actually relying on the current behavior. I would like to do this, > actually, as soon as possible. Comments? Travis, Good idea, especially if you can then make the integer case work as one might reasonably expect. Keeping the present too-fancy capabilities can only cause continuing confusion. Eric > > -Travis From ralf.gommers at googlemail.com Tue May 15 14:52:07 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 15 May 2012 20:52:07 +0200 Subject: [Numpy-discussion] Debian/Ubuntu patch help (was: ANN: NumPy 1.6.2 release candidate 1) Message-ID: On Sat, May 12, 2012 at 9:17 PM, Ralf Gommers wrote: > > > On Sat, May 12, 2012 at 6:22 PM, Sandro Tosi wrote: > >> Hello, >> >> On Sat, May 5, 2012 at 8:15 PM, Ralf Gommers >> wrote: >> > Hi, >> > >> > I'm pleased to announce the availability of the first release candidate >> of >> > NumPy 1.6.2. This is a maintenance release. Due to the delay of the >> NumPy >> > 1.7.0, this release contains far more fixes than a regular NumPy bugfix >> > release. It also includes a number of documentation and build >> improvements. >> > >> > Sources and binary installers can be found at >> > https://sourceforge.net/projects/numpy/files/NumPy/1.6.2rc1/ >> > >> > Please test this release and report any issues on the numpy-discussion >> > mailing list. >> ... 
>> > BLD: add support for the new X11 directory structure on Ubuntu & co. >> >> We've just discovered that this fix is not enough. Actually the new >> directories are due to the "multi-arch" feature of Debian systems, >> that allows to install libraries from other (foreign) architectures >> than the one the machine is (the classic example, i386 libraries on a >> amd64 host). >> >> the fix included to look up in additional directories is currently >> only for X11, while for example Debian has fftw3 that's >> multi-arch-ified and thus will fail to be detected. >> >> Could this fix be extended to include all other things that are >> checked? for reference the bug in Debian is [1]; there was also a >> patch[2] in previous versions, that was using gcc to get the >> multi-arch paths - you might use as a reference, or to implement >> something debian-systems-specific. >> >> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=640940 >> [2] >> http://anonscm.debian.org/viewvc/python-modules/packages/numpy/trunk/debian/patches/50_search-multiarch-paths.patch?view=markup&pathrev=21168 >> >> It would be awesome is such support would end up in 1.6.2 . >> > > Hardcoding some more paths to check in distutils/system_info.py should be > OK, also for 1.6.2 (will require a new RC). > > The --print-multiarch thing looks very questionable. As far as I can tell, > it's a Debian specific gcc patch, only available in gcc 4.6 and up. Ubuntu > before 11.10 release also doesn't have it. Therefore I don't think use of > --print-multiarch is appropriate for numpy for now, and certainly not a > change I'd like to make to distutils right before a release. > > If anyone with access to a Debian/Ubuntu system could come up with a patch > which adds the right paths to system_info.py, that would be great. > Hi, if there's anyone wants to have a look at the above issue this week, that would be great. If there's a patch by this weekend I can create a second RC, so we can still have the final release before the end of this month (needed for Debian freeze). Otherwise a second RC won't be needed. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Tue May 15 16:35:18 2012 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 15 May 2012 22:35:18 +0200 Subject: [Numpy-discussion] Debian/Ubuntu patch help (was: ANN: NumPy 1.6.2 release candidate 1) Message-ID: <4FB2BE06.3030204@googlemail.com> > Hi, if there's anyone wants to have a look at the above issue this > week, >that would be great. > If there's a patch by this weekend I can create a second RC, so we can > still have the final release before the end of this month (needed for > Debian freeze). Otherwise a second RC won't be needed. bugfixes are still allowed during the debian freeze, so that should not be an issue for the release timing. I don't see the issue with the gcc --print-multiarch patch besides maybe some cleanup. --print-multiarch is a debian specific gcc patch, but multiarch is debian specific for now. It doesn't work in 11.04, but who cares, that will be end of life in 5 month anyway. Besides x11 almost nothing is multiarched in 11.04 anyway and that can still be covered by the currently existing method. gcc should be available for pretty much anything requiring numpy.distutils anyway so that should be not be an issue. On systems without --print-multiarch or gcc you just ignore the failing, there will be no repercussions as there will also not be any multiarched libraries. 
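
To make that concrete, a minimal sketch of the kind of lookup meant here
(illustrative only, not the actual Debian patch; it assumes gcc is on the
PATH and silently gives up otherwise):

    import os
    import subprocess

    def multiarch_dirs(base_dirs=('/usr/lib', '/usr/local/lib')):
        # ask gcc for the multiarch triplet, e.g. 'x86_64-linux-gnu'
        try:
            triplet = subprocess.check_output(['gcc', '--print-multiarch'])
        except (OSError, subprocess.CalledProcessError):
            return []          # no gcc or no --print-multiarch: add nothing
        triplet = triplet.strip().decode()
        if not triplet:
            return []
        # e.g. ['/usr/lib/x86_64-linux-gnu', '/usr/local/lib/x86_64-linux-gnu']
        return [os.path.join(d, triplet) for d in base_dirs]
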
the only potential issue I see is that upstream gcc adds a
--print-multiarch that does something completely different and harmful
for distutils, but I don't consider that very likely.

Hardcoding a bunch of paths defeats the whole purpose of multiarch.
It will just break in the future (e.g. when the x32 abi comes to
debian/ubuntu) and will make cross compiling harder (though I guess
numpy distutils may not be built for that anyway)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: 

From pgmdevlist at gmail.com  Wed May 16 06:22:51 2012
From: pgmdevlist at gmail.com (Pierre GM)
Date: Wed, 16 May 2012 12:22:51 +0200
Subject: [Numpy-discussion] masked and missing values, a historical
	perspective and some comments
In-Reply-To: <5EBECDE6AE404B4CB4586B0ECDFAD78D@localhost>
References: <5EBECDE6AE404B4CB4586B0ECDFAD78D@localhost>
Message-ID: 

All,
I've been fairly quiet on the various missing/masked values proposals,
sorry about that. I figured I would take the opportunity of Travis O.'s
deadline of Wed. 16th to put some aspects of numpy.ma in a
semi-historical, semi-personal perspective. Apologies in advance if I
misrepresent some aspects, I count on our archivists to set the record
straight when needed. Let it be also known that I haven't had any time to
play with the new NA implementation...

Once upon a time...
-------------------
numpy.ma was developed on top of Paul Dubois' implementation for Numeric
and its adaptation to Numarray. Originally a masked array was its own
object, the combination of a regular ndarray (the `.data`) and its
`.mask` counterpart, a boolean ndarray with the same shape as the
`.data`. Two special values were also defined, `nomask`, a flag in
practice equal to False indicating that no item was masked in the `.data`
array, and `masked`, representing an item being missing or ignored in the
`.data`. The `.mask` had either `True` items (if the corresponding item
in the `.data` was missing or ignored, ie was `masked`) or `False` items
otherwise. Nothing surprising here.
I started to use numpy to analyze hydroclimatic datasets that are often
incomplete: for some reason or another, data wasn't recorded for some
period of time, and I wanted to take this aspect into account.
Maskedarrays-as-objects met my needs until I tried to create a subclass...
After unsuccessful attempts, doubtlessly due to my inexperience, I came to the realization that most of the issues I was experiencing could perhaps be solved by making the MaskedArray class a subclass of ndarray. The new implementation followed the original one as much as possible, for compatibility of course but also because I felt that there must have been some pretty good reasons for why things were coded the way they were, if I couldn?t exactly see why at the time. For example, the `nomask` flag, the `fill value` concept, the shared mask approach were kept, as well as the MaskedUnary/MaskedBinary/DomainBinaryOperations layers. On that regard, actually, there have been some modifications. Initially, the `.mask` of the output was computed first (from the `.mask` of the input(s) and a validity domain, if any), and the non-masked part filled accordingly. This approach worked well in most cases but turned out to be problematic in some domained operations: finding the domain of np.pow in particular required too many tests. Therefore, the most efficient (well, least inefficient) approach was to compute first the output `.data` and then its mask from the input masks and the invalid results. An ugly hack, yes, but which was deemed necessary with a pure Python implementation. Paul Dubois had confided during an email exchange that if things had been different, he would have coded MaskedArrays in C. I would have done the same thing if I could speak C (I'm still slowly learning it). I even considered Cython for a while, but never managed to subclass a ndarray (things may have changed now, it's been a while since I last tried). I think that most of the current issues with numpy.ma come from the fact that it's pure Python. The API itself hasn't really changed since the very beginning of Numeric, even if it has been improved (e.g. to support structured type) and the original masked-array-as-an-object concept has been replaced by a subclass of ndarray. IMHO, most of the overhead would be compensated by a lower-level implementation. In any case, numpy.ma is currently in pure Python and therefore suboptimal. For this reason, I always considered it mostly as a convenience module for simple problems; for more elaborate ones, I advised to manipulate the `.data` and `.mask` (actually, the `._data` and `._mask`...) independently, at least till we come with a more efficient solution. I'm not saying that the current concepts behind numpy.ma are perfect, by far. There's no distinction between np.NA and np.IGNORED, for example, and numpy.ma automatically mask invalid data (like INF or NAN), which may be numpy.ma unmaintained ? ----------------------- Chuck claimed [ http://article.gmane.org/gmane.comp.python.numeric.general/49815] that numpy.ma is unmaintained. While I must recognized I haven't been able to work on the bugs for the past 18 months, I'm hoping that my soon-to-be future employer will leave me a bit of time to work on open-source projects (anyway, if this summer's weather is like this spring's, I won't be spending a lot of time in a park). Nevertheless, the tickets I've seen so far don't look that scary and could be fixed rather easily by anybody. The ticket about the printing of masked arrays with a given precision is trickier, though (as the current implementation relies on a trick to avoid having to redo everything). MaskedArray by default? Yesss! 
------------------------------ So far, I'm 100% with Chuck when he recently [ http://article.gmane.org/gmane.comp.python.numeric.general/49489] suggested to transform all ndarrays in masked arrays. That was something I had half-jokingly suggested at the time, because it would have made things so much easier for me. Of course, most users never have to deal with missing or ignored values, and forcing a ndarray to carry an extra mask is counter-productive. Nevertheless, it shouldn't be as a problem, as Chuck pointed out. numpy.ma has already the `nomask` flag that is used to check whether there are no missing/ignored data in the array (a nifty trick Paul Dubois introduced to save some time) >>> if marray.mask is nomask: >>> (no missing values: do as if you have a basic ndarray) >>> else: >>> (some missing values: proceed as needed) We could use the same scheme in C, with a flag indicating whether a mask is defined. If the mask is not set, we could use the standard ufuncs. If it is, we would use ufuncs modified to take a mask into account, or raise some exception if they are not yet defined or could not be defined (e.g., FFT). Introducing the flag would no doubt break some things downstream, but it still looks like the easiest. np.NA vs np.IGNORED / bitpattern vs mask ---------------------------------------- I think everybody recognizes that np.NA and np.IGNORED are different yet complementary concepts. A recent post[ http://article.gmane.org/gmane.comp.python.numeric.general/49347] illustrated a use of missing data for concentrations below a given limit. Now imagine a series of concentrations recorded over a period of time where (i) the captor failed to work (no point) (ii) the captor worked properly but only traces were recorded (a point below limit). The first case (i) would be np.NA, the second np.IGNORED. Should you perform some time series statistics, you may want to be able to distinguish between the two. I hope we all agree that np.NA is destructive, np.IGNORED is not. I would expect to be able to assign values as such: a[...] = np.NA <=> a[...].data=np.NA a[...] = np.IGNORED <=> a[...].mask=True (numpy.ma convention) That means that I could still 'ignore' a 'missing' value, by setting the corresponding item in the `.mask` to the appropriate value (True w/ the numpy.ma convention, False in Mark's...). Finding where the np.NA are in an array could be done through a np.isna function (analogous to np.isnan). What about the ignored values? Should we just take the .mask? Do we need a np.isignored ? Computationally, both np.NA and np.IGNORED could be lumped in a single mask. We would then have to decide how to use this combined mask, which brings the subject of propagation. I disagree w/ Lluis' proposal [ http://article.gmane.org/gmane.comp.python.numeric.general/46728] to consider destructiveness and propagation as orthogonal properties. I agree with the following, though: np.NA : propagating np.IGNORED : non-propagating. If a user wants to have a non-destructive but propagating flag, it'd be up to her (give me an example where it could be useful). On a value basis, I agree with operation(np.NA, x) -> np.NA [eg, np.NA + x = np.NA] operation(np.IGNORED, x) -> np.IGNORED [eg, np.IGNORED + x = np.IGNORED] And now ? --------- I think a first step would be to agree on what the behavior should be, which would define a set of unit tests. Could we consider a wiki page somewhere that would include polls, so that each of us could vote and/or discuss specifics ? 
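
For instance, tests along these lines (a sketch only: np.NA and np.IGNORED
do not exist yet, so numpy.ma conventions are used as a stand-in for the
non-destructive IGNORED side, and the test names are just illustrative):

    import numpy.ma as ma

    def test_ignored_is_non_destructive():
        # "ignoring" a value must not destroy the underlying datum
        a = ma.array([1., 2., 3.])
        a[1] = ma.masked
        assert a.data[1] == 2.

    def test_ignored_elementwise():
        # operation(np.IGNORED, x) -> np.IGNORED
        a = ma.array([1., 2., 3.], mask=[False, True, False])
        b = a + 1.
        assert b.mask[1] and not b.mask[0]

    def test_ignored_skipped_in_reductions():
        # ignored values do not contaminate the result, they are skipped
        a = ma.array([1., 2., 3.], mask=[False, True, False])
        assert a.sum() == 4.
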
Then, of course, there's the matter of implementation. I won't have a lot of time to propose actual code for the next few weeks (couple of months), so I'm not expecting my voice to weigh more than others? I'd be glad to participate in civil discussions or to answer emails off-list on numpy.ma related aspects and hopefully I'll be more active by the end of the summer, beginning of the fall. Cordially, P. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed May 16 09:55:00 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 May 2012 14:55:00 +0100 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Tue, May 15, 2012 at 2:49 PM, Fr?d?ric Bastien wrote: > Hi, > > In fact, I would arg to never change the current behavior, but add the > flag for people that want to use it. > > Why? > > 1) There is probably >10k script that use it that will need to be > checked for correctness. There won't be easy to see crash or error > that allow user to see it. My suggestion is that we follow the scheme, which I think gives ample opportunity for people to notice problems: 1.7: works like 1.6, except that a DeprecationWarning is produced if (and only if) someone writes to an array returned by np.diagonal (or friends). This gives a pleasant heads-up for those who pay attention to DeprecationWarnings. 1.8: return a view, but mark this view read-only. This causes crashes for anyone who ignored the DeprecationWarnings, guaranteeing that they'll notice the issue. 1.9: return a writeable view, transition complete. I've written a pull request implementing the first part of this; I hope everyone interested will take a look: https://github.com/numpy/numpy/pull/280 > 2) This is a globally not significant speed up by this change. Due to > 1), i think it is not work it. Why this is not a significant speed up? > First, the user already create and use the original tensor. Suppose a > matrix of size n x n. If it don't fit in the cache, creating it will > cost n * n. But coping it will cost cst * n. The cst is the price of > loading a full cache line. But if you return a view, you will pay this > cst price later when you do the computation. But it all case, this is > cheap compared to the cost of creating the matrix. Also, you will do > work on the matrix and this work will be much more costly then the > price of the copy. > > In the case the matrix fix in the cache, the price of the copy is even lower. > > So in conclusion, optimizing the diagonal won't give speed up in the > global user script, but will break many of them. I agree that the speed difference is small. I'm more worried about the cost to users of having to remember odd inconsistencies like this, and to think about whether there actually is a speed difference or not, etc. (If we do add a copy=False option, then I guarantee many people will use it religiously "just in case" the speed difference is enough to matter! And that would suck for them.) Returning a view makes the API slightly nicer, cleaner, more consistent, more useful. (I believe the reason this was implemented in the first place was that providing a convenient way to *write* to the diagonal of an arbitrary array made it easier to implement numpy.eye for masked arrays.) And the whole point of numpy is to trade off a little speed in favor of having a simple, easy-to-work with high-level API :-). 
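
(For downstream code that genuinely needs to modify the result, the idiom
that behaves the same before and after the transition is an explicit copy,
or np.fill_diagonal for in-place writes; a quick sketch:

    import numpy as np

    a = np.arange(9).reshape(3, 3)

    d = a.diagonal().copy()   # identical semantics whether diagonal() returns a copy or a view
    d[0] = 99                 # never aliases `a`

    np.fill_diagonal(a, 0)    # explicit, in-place write to the diagonal of `a`

so nobody should need to rely on writing to whatever diagonal() happens to
return.)
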
-- Nathaniel From njs at pobox.com Wed May 16 10:03:43 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 May 2012 15:03:43 +0100 Subject: [Numpy-discussion] Fancy-indexing reorders output in corner cases? In-Reply-To: <9F7881C1-AFA8-49F1-BEC4-7727764949F6@continuum.io> References: <75ACB2C0-8292-41D3-A652-421ADE3E4401@yale.edu> <9F7881C1-AFA8-49F1-BEC4-7727764949F6@continuum.io> Message-ID: On Tue, May 15, 2012 at 5:03 AM, Travis Oliphant wrote: > So, the behavior is actually quite predictable, it's just that in some common cases it doesn't do what you would expect --- especially if you think that [0,1] is "the same" as :2. ? When I wrote this code to begin with I should have raised an error and then worked in the cases that make sense. ? ?This is a good example of making the mistake of thinking that it's better to provide something very general rather than just raise an error when an obvious and clear solution is not available. > > There is the possibility that we could now raise an error in NumPy when this situation is encountered because I strongly doubt anyone is actually relying on the current behavior. ? ?I would like to do this, actually, as soon as possible. ?Comments? +1 from me. It'd probably be a good idea to check for and deprecate any similarly bizarre cases of mixed boolean indexing/slicing, mixed boolean/integer indexing, etc. - N From ben.root at ou.edu Wed May 16 10:04:50 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 16 May 2012 10:04:50 -0400 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 9:55 AM, Nathaniel Smith wrote: > On Tue, May 15, 2012 at 2:49 PM, Fr?d?ric Bastien wrote: > > Hi, > > > > In fact, I would arg to never change the current behavior, but add the > > flag for people that want to use it. > > > > Why? > > > > 1) There is probably >10k script that use it that will need to be > > checked for correctness. There won't be easy to see crash or error > > that allow user to see it. > > My suggestion is that we follow the scheme, which I think gives ample > opportunity for people to notice problems: > > 1.7: works like 1.6, except that a DeprecationWarning is produced if > (and only if) someone writes to an array returned by np.diagonal (or > friends). This gives a pleasant heads-up for those who pay attention > to DeprecationWarnings. > > 1.8: return a view, but mark this view read-only. This causes crashes > for anyone who ignored the DeprecationWarnings, guaranteeing that > they'll notice the issue. > > 1.9: return a writeable view, transition complete. > > I've written a pull request implementing the first part of this; I > hope everyone interested will take a look: > https://github.com/numpy/numpy/pull/280 > > > 2) This is a globally not significant speed up by this change. Due to > > 1), i think it is not work it. Why this is not a significant speed up? > > First, the user already create and use the original tensor. Suppose a > > matrix of size n x n. If it don't fit in the cache, creating it will > > cost n * n. But coping it will cost cst * n. The cst is the price of > > loading a full cache line. But if you return a view, you will pay this > > cst price later when you do the computation. But it all case, this is > > cheap compared to the cost of creating the matrix. Also, you will do > > work on the matrix and this work will be much more costly then the > > price of the copy. 
> > > > In the case the matrix fix in the cache, the price of the copy is even > lower. > > > > So in conclusion, optimizing the diagonal won't give speed up in the > > global user script, but will break many of them. > > I agree that the speed difference is small. I'm more worried about the > cost to users of having to remember odd inconsistencies like this, and > to think about whether there actually is a speed difference or not, > etc. (If we do add a copy=False option, then I guarantee many people > will use it religiously "just in case" the speed difference is enough > to matter! And that would suck for them.) > > Returning a view makes the API slightly nicer, cleaner, more > consistent, more useful. (I believe the reason this was implemented in > the first place was that providing a convenient way to *write* to the > diagonal of an arbitrary array made it easier to implement numpy.eye > for masked arrays.) And the whole point of numpy is to trade off a > little speed in favor of having a simple, easy-to-work with high-level > API :-). > > -- Nathaniel > Just as a sanity check, do the scipy tests run without producing any such messages? Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed May 16 10:10:03 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 May 2012 15:10:03 +0100 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 3:04 PM, Benjamin Root wrote: > > > On Wed, May 16, 2012 at 9:55 AM, Nathaniel Smith wrote: >> >> On Tue, May 15, 2012 at 2:49 PM, Fr?d?ric Bastien wrote: >> > Hi, >> > >> > In fact, I would arg to never change the current behavior, but add the >> > flag for people that want to use it. >> > >> > Why? >> > >> > 1) There is probably >10k script that use it that will need to be >> > checked for correctness. There won't be easy to see crash or error >> > that allow user to see it. >> >> My suggestion is that we follow the scheme, which I think gives ample >> opportunity for people to notice problems: >> >> 1.7: works like 1.6, except that a DeprecationWarning is produced if >> (and only if) someone writes to an array returned by np.diagonal (or >> friends). This gives a pleasant heads-up for those who pay attention >> to DeprecationWarnings. >> >> 1.8: return a view, but mark this view read-only. This causes crashes >> for anyone who ignored the DeprecationWarnings, guaranteeing that >> they'll notice the issue. >> >> 1.9: return a writeable view, transition complete. >> >> I've written a pull request implementing the first part of this; I >> hope everyone interested will take a look: >> ?https://github.com/numpy/numpy/pull/280 >> >> > 2) This is a globally not significant speed up by this change. Due to >> > 1), i think it is not work it. Why this is not a significant speed up? >> > First, the user already create and use the original tensor. Suppose a >> > matrix of size n x n. If it don't fit in the cache, creating it will >> > cost n * n. But coping it will cost cst * n. The cst is the price of >> > loading a full cache line. But if you return a view, you will pay this >> > cst price later when you do the computation. But it all case, this is >> > cheap compared to the cost of creating the matrix. Also, you will do >> > work on the matrix and this work will be much more costly then the >> > price of the copy. 
>> > >> > In the case the matrix fix in the cache, the price of the copy is even >> > lower. >> > >> > So in conclusion, optimizing the diagonal won't give speed up in the >> > global user script, but will break many of them. >> >> I agree that the speed difference is small. I'm more worried about the >> cost to users of having to remember odd inconsistencies like this, and >> to think about whether there actually is a speed difference or not, >> etc. (If we do add a copy=False option, then I guarantee many people >> will use it religiously "just in case" the speed difference is enough >> to matter! And that would suck for them.) >> >> Returning a view makes the API slightly nicer, cleaner, more >> consistent, more useful. (I believe the reason this was implemented in >> the first place was that providing a convenient way to *write* to the >> diagonal of an arbitrary array made it easier to implement numpy.eye >> for masked arrays.) And the whole point of numpy is to trade off a >> little speed in favor of having a simple, easy-to-work with high-level >> API :-). >> >> -- Nathaniel > > > Just as a sanity check, do the scipy tests run without producing any such > messages? I tried checking this before, actually, but can't figure out how to build scipy against a copy of numpy that is installed in either a virtualenv or just on PYTHONPATH. (Basically, I just don't want to install some random development numpy into my system python.) Any suggestions? -- Nathaniel From robert.kern at gmail.com Wed May 16 10:15:06 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 16 May 2012 15:15:06 +0100 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 3:10 PM, Nathaniel Smith wrote: > On Wed, May 16, 2012 at 3:04 PM, Benjamin Root wrote: >> Just as a sanity check, do the scipy tests run without producing any such >> messages? > > I tried checking this before, actually, but can't figure out how to > build scipy against a copy of numpy that is installed in either a > virtualenv or just on PYTHONPATH. (Basically, I just don't want to > install some random development numpy into my system python.) Any > suggestions? scipy will build against whatever numpy the python executable that runs the setup.py manages to import. So if you are using virtualenv, just make sure that the virtualenv is activated and "python" refers to the virtualenv's python executable. -- Robert Kern From silva at lma.cnrs-mrs.fr Wed May 16 11:05:48 2012 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Wed, 16 May 2012 17:05:48 +0200 Subject: [Numpy-discussion] [numpy.testing] re-import when using coverage Message-ID: <1337180748.2747.7.camel@amilo.coursju> Hi, I am getting into troubles when using numpy.testing with coverage. A minimal example package is atached to this email. Unpack and run: $ python -c "import mypackage; mypackage.test(verbose=10,coverage=False)" $ python -c "import mypackage; mypackage.test(verbose=10,coverage=True)" Some explanations: This package contains two module files, one of them (b.py) defining a class (mycls), the other one (a.py) importing the first module. I then add a simple test file, that does instantiate the class through the other module (a.py) and checking the type of the resulting object against b.mycls Without coverage, everything is ok. With coverage, I got a (unexpected ?) reload of the modules, leading to a mismatch of types... 
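
In case the attachment gets scrubbed by the list, the package is
essentially the following (reconstructed from the description above; the
exact function names and test layout are only illustrative):

    # mypackage/__init__.py
    from numpy.testing import Tester
    test = Tester().test

    # mypackage/b.py
    class mycls(object):
        pass

    # mypackage/a.py (instantiation goes through this module)
    from mypackage import b
    def build():
        return b.mycls()

    # mypackage/tests/test_a.py
    from mypackage import a, b
    def test_instance():
        # passes with coverage=False, fails with coverage=True
        # because b is re-imported and b.mycls becomes a different class object
        assert isinstance(a.build(), b.mycls)
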
The real case is somewhat more complicated, and I would prefer to keep the instantiation through a.py. Is there a way to solve the problem ? Best regards -- Fabrice Silva LMA UPR CNRS 7051 -------------- next part -------------- A non-text attachment was scrubbed... Name: mypackage.tar Type: application/x-tar Size: 10240 bytes Desc: not available URL: From njs at pobox.com Wed May 16 11:21:21 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 May 2012 16:21:21 +0100 Subject: [Numpy-discussion] Scipy build can't find BLAS when using numpy master? (Was: Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)) Message-ID: On Wed, May 16, 2012 at 3:15 PM, Robert Kern wrote: > On Wed, May 16, 2012 at 3:10 PM, Nathaniel Smith wrote: >> On Wed, May 16, 2012 at 3:04 PM, Benjamin Root wrote: > >>> Just as a sanity check, do the scipy tests run without producing any such >>> messages? >> >> I tried checking this before, actually, but can't figure out how to >> build scipy against a copy of numpy that is installed in either a >> virtualenv or just on PYTHONPATH. (Basically, I just don't want to >> install some random development numpy into my system python.) Any >> suggestions? > > scipy will build against whatever numpy the python executable that > runs the setup.py manages to import. So if you are using virtualenv, > just make sure that the virtualenv is activated and "python" refers to > the virtualenv's python executable. Yes, tried that :-/. I assumed the error was because of some vritualenv wonkiness or something, though (IIRC virtualenv didn't use to be supported?). On further investigation, it appears that scipy 0.10.1 (from the git tag) just flat out doesn't build against current numpy master, at least on my box :-(. I built some pristine python 2.7 installs from scratch (no virtualenv, no distro tweaks, etc.). Then I installed some version of numpy in each, then tried building scipy. With numpy 1.6.1 (built from git), everything seems fine - it finds atlas in /usr/lib64/atlas. With current numpy master (3bbbbd416a0a), numpy itself builds fine, but scipy's setup.py can't find atlas (even though it's looking in the right places). Log attached. This is on 64-bit Ubuntu 11.04 "natty". Thoughts? -- Nathaniel -------------- next part -------------- A non-text attachment was scrubbed... Name: build-error.log Type: application/octet-stream Size: 4300 bytes Desc: not available URL: From robert.kern at gmail.com Wed May 16 11:24:51 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 16 May 2012 16:24:51 +0100 Subject: [Numpy-discussion] Scipy build can't find BLAS when using numpy master? (Was: Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)) In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 4:21 PM, Nathaniel Smith wrote: > I built some pristine python 2.7 installs from scratch (no virtualenv, > no distro tweaks, etc.). Then I installed some version of numpy in > each, then tried building scipy. With numpy 1.6.1 (built from git), > everything seems fine - it finds atlas in /usr/lib64/atlas. > > With current numpy master (3bbbbd416a0a), numpy itself builds fine, > but scipy's setup.py can't find atlas (even though it's looking in the > right places). Log attached. The log says it's looking in /usr/lib64/atlas-base/, not /usr/lib64/atlas/. 
-- Robert Kern From njs at pobox.com Wed May 16 11:35:07 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 May 2012 16:35:07 +0100 Subject: [Numpy-discussion] Scipy build can't find BLAS when using numpy master? (Was: Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)) In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 4:24 PM, Robert Kern wrote: > On Wed, May 16, 2012 at 4:21 PM, Nathaniel Smith wrote: > >> I built some pristine python 2.7 installs from scratch (no virtualenv, >> no distro tweaks, etc.). Then I installed some version of numpy in >> each, then tried building scipy. With numpy 1.6.1 (built from git), >> everything seems fine - it finds atlas in /usr/lib64/atlas. >> >> With current numpy master (3bbbbd416a0a), numpy itself builds fine, >> but scipy's setup.py can't find atlas (even though it's looking in the >> right places). Log attached. > > The log says it's looking in /usr/lib64/atlas-base/, not /usr/lib64/atlas/. Sorry, that was an error in my email. The libraries are actually in /usr/lib64/atlas-base, and when I build against numpy 1.6.1, the build log says: Setting PTATLAS=ATLAS FOUND: libraries = ['ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/usr/lib64/atlas-base'] language = c define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] include_dirs = ['/usr/include/atlas'] FOUND: libraries = ['ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/usr/lib64/atlas-base'] language = c define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] include_dirs = ['/usr/include/atlas'] -- Nathaniel From robert.kern at gmail.com Wed May 16 11:50:08 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 16 May 2012 16:50:08 +0100 Subject: [Numpy-discussion] Scipy build can't find BLAS when using numpy master? (Was: Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)) In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 4:35 PM, Nathaniel Smith wrote: > On Wed, May 16, 2012 at 4:24 PM, Robert Kern wrote: >> On Wed, May 16, 2012 at 4:21 PM, Nathaniel Smith wrote: >> >>> I built some pristine python 2.7 installs from scratch (no virtualenv, >>> no distro tweaks, etc.). Then I installed some version of numpy in >>> each, then tried building scipy. With numpy 1.6.1 (built from git), >>> everything seems fine - it finds atlas in /usr/lib64/atlas. >>> >>> With current numpy master (3bbbbd416a0a), numpy itself builds fine, >>> but scipy's setup.py can't find atlas (even though it's looking in the >>> right places). Log attached. >> >> The log says it's looking in /usr/lib64/atlas-base/, not /usr/lib64/atlas/. > > Sorry, that was an error in my email. The libraries are actually in > /usr/lib64/atlas-base, and when I build against numpy 1.6.1, the build > log says: > > Setting PTATLAS=ATLAS > FOUND: > libraries = ['ptf77blas', 'ptcblas', 'atlas'] > library_dirs = ['/usr/lib64/atlas-base'] > language = c > define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] > include_dirs = ['/usr/include/atlas'] > > FOUND: > libraries = ['ptf77blas', 'ptcblas', 'atlas'] > library_dirs = ['/usr/lib64/atlas-base'] > language = c > define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] > include_dirs = ['/usr/include/atlas'] Hmm. I would throw in some print statements into the "_check_libs()" and "_lib_list()" methods in numpy/distutils/system_info.py and see what it's checking for. 
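A sketch of one way to get that visibility without editing the installed file: wrap the two methods named above and print their arguments and results. This only assumes the methods live on the system_info base class, as stated; it does not rely on their exact signatures (Python 2 syntax):

    from numpy.distutils import system_info as si

    def traced(name, orig):
        # return a wrapper that logs every call to the original method
        def wrapper(self, *args, **kwargs):
            print name, 'called with', args, kwargs
            result = orig(self, *args, **kwargs)
            print name, 'returned', result
            return result
        return wrapper

    si.system_info._check_libs = traced('_check_libs', si.system_info._check_libs)
    si.system_info._lib_list = traced('_lib_list', si.system_info._lib_list)

    print si.get_info('atlas')   # triggers the (now traced) library search
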
-- Robert Kern From jsseabold at gmail.com Wed May 16 12:08:52 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 16 May 2012 12:08:52 -0400 Subject: [Numpy-discussion] Scipy build can't find BLAS when using numpy master? (Was: Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)) In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 11:50 AM, Robert Kern wrote: > On Wed, May 16, 2012 at 4:35 PM, Nathaniel Smith wrote: >> On Wed, May 16, 2012 at 4:24 PM, Robert Kern wrote: >>> On Wed, May 16, 2012 at 4:21 PM, Nathaniel Smith wrote: >>> >>>> I built some pristine python 2.7 installs from scratch (no virtualenv, >>>> no distro tweaks, etc.). Then I installed some version of numpy in >>>> each, then tried building scipy. With numpy 1.6.1 (built from git), >>>> everything seems fine - it finds atlas in /usr/lib64/atlas. >>>> >>>> With current numpy master (3bbbbd416a0a), numpy itself builds fine, >>>> but scipy's setup.py can't find atlas (even though it's looking in the >>>> right places). Log attached. >>> >>> The log says it's looking in /usr/lib64/atlas-base/, not /usr/lib64/atlas/. >> >> Sorry, that was an error in my email. The libraries are actually in >> /usr/lib64/atlas-base, and when I build against numpy 1.6.1, the build >> log says: >> >> Setting PTATLAS=ATLAS >> ?FOUND: >> ? ?libraries = ['ptf77blas', 'ptcblas', 'atlas'] >> ? ?library_dirs = ['/usr/lib64/atlas-base'] >> ? ?language = c >> ? ?define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] >> ? ?include_dirs = ['/usr/include/atlas'] >> >> ?FOUND: >> ? ?libraries = ['ptf77blas', 'ptcblas', 'atlas'] >> ? ?library_dirs = ['/usr/lib64/atlas-base'] >> ? ?language = c >> ? ?define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] >> ? ?include_dirs = ['/usr/include/atlas'] > > Hmm. I would throw in some print statements into the "_check_libs()" > and "_lib_list()" methods in numpy/distutils/system_info.py and see > what it's checking for. Possibly related? http://thread.gmane.org/gmane.comp.python.numeric.general/46145 Skipper From josef.pktd at gmail.com Wed May 16 12:44:41 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 16 May 2012 12:44:41 -0400 Subject: [Numpy-discussion] [numpy.testing] re-import when using coverage In-Reply-To: <1337180748.2747.7.camel@amilo.coursju> References: <1337180748.2747.7.camel@amilo.coursju> Message-ID: On Wed, May 16, 2012 at 11:05 AM, Fabrice Silva wrote: > Hi, > I am getting into troubles when using numpy.testing with coverage. A > minimal example package is atached to this email. Unpack and run: > > $ python -c "import mypackage; mypackage.test(verbose=10,coverage=False)" > $ python -c "import mypackage; mypackage.test(verbose=10,coverage=True)" > > Some explanations: > This package contains two module files, one of them (b.py) defining a > class (mycls), the other one (a.py) importing the first module. I then > add a simple test file, that does instantiate the class through the > other module (a.py) and checking the type of the resulting object > against b.mycls > Without coverage, everything is ok. With coverage, I got a > (unexpected ?) reload of the modules, leading to a mismatch of types... > > The real case is somewhat more complicated, and I would prefer to keep > the instantiation through a.py. Is there a way to solve the problem ? 
maybe it's this http://thread.gmane.org/gmane.comp.python.numeric.general/49785/focus=49787 It helped in my case, and I don't have a problem running the tests with your mypackage (using my patched numpy) Josef > > Best regards > > -- > Fabrice Silva > LMA UPR CNRS 7051 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From silva at lma.cnrs-mrs.fr Wed May 16 13:01:47 2012 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Wed, 16 May 2012 19:01:47 +0200 Subject: [Numpy-discussion] [numpy.testing] re-import when using coverage In-Reply-To: References: <1337180748.2747.7.camel@amilo.coursju> Message-ID: <1337187707.8870.3.camel@amilo.coursju> > maybe it's this > http://thread.gmane.org/gmane.comp.python.numeric.general/49785/focus=49787 Thanks for your reply, Not sure it is the same trouble, I'll have a deeper look at that thread... > It helped in my case, and I don't have a problem running the tests > with your mypackage (using my patched numpy) What I get: $ python -c "import mypackage; mypackage.test(coverage=False, verbose=10)" Reading a.py file... Id(mycls in b) = 168022116 Running unit tests for mypackage NumPy version 1.6.2rc1 NumPy is installed in /usr/lib/pymodules/python2.7/numpy Python version 2.7.3rc2 (default, Apr 22 2012, 22:35:38) [GCC 4.6.3] nose version 1.1.2 [...] test_a.test_instantition ... Id(type(self)) = 168022116 Id(type(obj)) = 168022116 ok ---------------------------------------------------------------------- Ran 1 test in 0.002s OK $ python -c "import mypackage; mypackage.test(coverage=True, verbose=10)" Reading a.py file... Id(mycls in b) = 143573092 Running unit tests for mypackage NumPy version 1.6.2rc1 NumPy is installed in /usr/lib/pymodules/python2.7/numpy Python version 2.7.3rc2 (default, Apr 22 2012, 22:35:38) [GCC 4.6.3] nose version 1.1.2 [...] Reading a.py file... Id(mycls in b) = 150640036 test_a.test_instantition ... Id(type(self)) = 150640036 Id(type(obj)) = 150640036 FAIL ====================================================================== FAIL: test_a.test_instantition ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/tmp/cover/mypackage/tests/test_a.py", line 13, in test_instantition assert a.b.is_mycls(obj) AssertionError [coverage results] Ran 1 test in 0.006s FAILED (failures=1) -- Fabrice Silva LMA UPR CNRS 7051 From njs at pobox.com Wed May 16 13:52:33 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 May 2012 18:52:33 +0100 Subject: [Numpy-discussion] Scipy build can't find BLAS when using numpy master? (Was: Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)) In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 4:50 PM, Robert Kern wrote: > On Wed, May 16, 2012 at 4:35 PM, Nathaniel Smith wrote: >> On Wed, May 16, 2012 at 4:24 PM, Robert Kern wrote: >>> On Wed, May 16, 2012 at 4:21 PM, Nathaniel Smith wrote: >>> >>>> I built some pristine python 2.7 installs from scratch (no virtualenv, >>>> no distro tweaks, etc.). Then I installed some version of numpy in >>>> each, then tried building scipy. With numpy 1.6.1 (built from git), >>>> everything seems fine - it finds atlas in /usr/lib64/atlas. 
>>>> >>>> With current numpy master (3bbbbd416a0a), numpy itself builds fine, >>>> but scipy's setup.py can't find atlas (even though it's looking in the >>>> right places). Log attached. >>> >>> The log says it's looking in /usr/lib64/atlas-base/, not /usr/lib64/atlas/. >> >> Sorry, that was an error in my email. The libraries are actually in >> /usr/lib64/atlas-base, and when I build against numpy 1.6.1, the build >> log says: >> >> Setting PTATLAS=ATLAS >> ?FOUND: >> ? ?libraries = ['ptf77blas', 'ptcblas', 'atlas'] >> ? ?library_dirs = ['/usr/lib64/atlas-base'] >> ? ?language = c >> ? ?define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] >> ? ?include_dirs = ['/usr/include/atlas'] >> >> ?FOUND: >> ? ?libraries = ['ptf77blas', 'ptcblas', 'atlas'] >> ? ?library_dirs = ['/usr/lib64/atlas-base'] >> ? ?language = c >> ? ?define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] >> ? ?include_dirs = ['/usr/include/atlas'] > > Hmm. I would throw in some print statements into the "_check_libs()" > and "_lib_list()" methods in numpy/distutils/system_info.py and see > what it's checking for. It turns out that len(found) == len(expected) is not a good test in situations where you might find *more* than you expected... https://github.com/numpy/numpy/pull/281 - N From njs at pobox.com Wed May 16 14:26:54 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 May 2012 19:26:54 +0100 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 3:04 PM, Benjamin Root wrote: > > > On Wed, May 16, 2012 at 9:55 AM, Nathaniel Smith wrote: >> >> On Tue, May 15, 2012 at 2:49 PM, Fr?d?ric Bastien wrote: >> > Hi, >> > >> > In fact, I would arg to never change the current behavior, but add the >> > flag for people that want to use it. >> > >> > Why? >> > >> > 1) There is probably >10k script that use it that will need to be >> > checked for correctness. There won't be easy to see crash or error >> > that allow user to see it. >> >> My suggestion is that we follow the scheme, which I think gives ample >> opportunity for people to notice problems: >> >> 1.7: works like 1.6, except that a DeprecationWarning is produced if >> (and only if) someone writes to an array returned by np.diagonal (or >> friends). This gives a pleasant heads-up for those who pay attention >> to DeprecationWarnings. >> >> 1.8: return a view, but mark this view read-only. This causes crashes >> for anyone who ignored the DeprecationWarnings, guaranteeing that >> they'll notice the issue. >> >> 1.9: return a writeable view, transition complete. >> >> I've written a pull request implementing the first part of this; I >> hope everyone interested will take a look: >> ?https://github.com/numpy/numpy/pull/280 >> >> > 2) This is a globally not significant speed up by this change. Due to >> > 1), i think it is not work it. Why this is not a significant speed up? >> > First, the user already create and use the original tensor. Suppose a >> > matrix of size n x n. If it don't fit in the cache, creating it will >> > cost n * n. But coping it will cost cst * n. The cst is the price of >> > loading a full cache line. But if you return a view, you will pay this >> > cst price later when you do the computation. But it all case, this is >> > cheap compared to the cost of creating the matrix. Also, you will do >> > work on the matrix and this work will be much more costly then the >> > price of the copy. 
>> > >> > In the case the matrix fix in the cache, the price of the copy is even >> > lower. >> > >> > So in conclusion, optimizing the diagonal won't give speed up in the >> > global user script, but will break many of them. >> >> I agree that the speed difference is small. I'm more worried about the >> cost to users of having to remember odd inconsistencies like this, and >> to think about whether there actually is a speed difference or not, >> etc. (If we do add a copy=False option, then I guarantee many people >> will use it religiously "just in case" the speed difference is enough >> to matter! And that would suck for them.) >> >> Returning a view makes the API slightly nicer, cleaner, more >> consistent, more useful. (I believe the reason this was implemented in >> the first place was that providing a convenient way to *write* to the >> diagonal of an arbitrary array made it easier to implement numpy.eye >> for masked arrays.) And the whole point of numpy is to trade off a >> little speed in favor of having a simple, easy-to-work with high-level >> API :-). >> >> -- Nathaniel > > > Just as a sanity check, do the scipy tests run without producing any such > messages? Yeah, neither the 0.10.1 nor current scipy master tests trigger the DeprecationWarning when run against my branch. -- Nathaniel From ralf.gommers at googlemail.com Wed May 16 15:01:24 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 16 May 2012 21:01:24 +0200 Subject: [Numpy-discussion] Debian/Ubuntu patch help (was: ANN: NumPy 1.6.2 release candidate 1) In-Reply-To: <4FB2BE06.3030204@googlemail.com> References: <4FB2BE06.3030204@googlemail.com> Message-ID: On Tue, May 15, 2012 at 10:35 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > > Hi, if there's anyone wants to have a look at the above issue this > > week, > >that would be great. > > > If there's a patch by this weekend I can create a second RC, so we can > > still have the final release before the end of this month (needed for > > Debian freeze). Otherwise a second RC won't be needed. > > bugfixes are still allowed during the debian freeze, so that should not > be an issue for the release timing. > > OK, that's good to know. So what's the hard deadline then? > > I don't see the issue with the gcc --print-multiarch patch besides maybe > some cleanup. > --print-multiarch is a debian specific gcc patch, but multiarch is > debian specific for now. > > It doesn't work in 11.04, but who cares, that will be end of life in 5 > month anyway. Eh, we (the numpy maintainers) should care. If we would not care about an OS released only 13 months ago, we're not doing our job right. > Besides x11 almost nothing is multiarched in 11.04 anyway > and that can still be covered by the currently existing method. > > gcc should be available for pretty much anything requiring > numpy.distutils anyway so that should be not be an issue. > On systems without --print-multiarch or gcc you just ignore the failing, > there will be no repercussions as there will also not be any multiarched > libraries. > > If it's really that simple, such a patch may go into numpy master. But history has shown that patches to a central part of numpy.distutils are rarely issue-free (more due to the limitations/complexity of distutils than anything else). Therefore making such a change right before a release is simply a bad idea. 
the only potential issue I see is that upstream gcc adds a > --print-multiarch that does something completely different and harmful > for distutils, but I don't consider that very likely. > > Hardcoding a bunch of paths defeats the whole purpose of multiarch. > It will just to break in future (e.g. when the x32 abi comes to > debian/ubuntu) and will make cross compiling harder (though I guess > numpy distutils may not be built for that anyway) > If that hardcoding will break in the future, then I think for 1.6.2 the maintainers of the Debian package should apply the gcc patch to their packaged numpy if they think that is necessary. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From thouis.jones at curie.fr Wed May 16 15:34:02 2012 From: thouis.jones at curie.fr (Thouis Jones) Date: Wed, 16 May 2012 21:34:02 +0200 Subject: [Numpy-discussion] tracing numpy data allocation with python callbacks Message-ID: I recently had need of tracing numpy data allocation/deallocation. I was unable to find a simple way to do so, and so ended up putting the code below into ndarraytypes.h to allow me to trace allocations. A key part is that this jumps back into python, so I can inspect the stack and find out where in the upper level code large allocations are occurring. I wondered, however, if there were a better way to accomplish the same goal, preferably in pure python. Ray Jones static void *numpy_memtrace_malloc(size_t size) { void *result = malloc(size); PyObject *npy = PyImport_ImportModule("numpy"); if (npy != NULL) { if (PyObject_HasAttrString(npy, "tracemalloc")) { PyObject *ret; ret = PyObject_CallMethod(npy, "tracemalloc", "Nl", PyLong_FromVoidPtr(result), (long) size); Py_XDECREF(ret); Py_DECREF(npy); PyErr_Clear(); } } return result; } static void numpy_memtrace_free(void *ptr) { free(ptr); PyObject *npy = PyImport_ImportModule("numpy"); if (npy != NULL) { if (PyObject_HasAttrString(npy, "tracefree")) { PyObject *ret; ret = PyObject_CallMethod(npy, "tracefree", "N", PyLong_FromVoidPtr(ptr)); Py_XDECREF(ret); Py_DECREF(npy); PyErr_Clear(); } } } static void *numpy_memtrace_realloc(void *ptr, size_t size) { void *result = realloc(ptr, size); PyObject *npy = PyImport_ImportModule("numpy"); if (npy != NULL) { if (PyObject_HasAttrString(npy, "npyealloc")) { PyObject *ret; ret = PyObject_CallMethod(npy, "npyealloc", "NNl", PyLong_FromVoidPtr(ptr), PyLong_FromVoidPtr(result), size); Py_XDECREF(ret); Py_DECREF(npy); PyErr_Clear(); } } return result; } /* Data buffer */ #define PyDataMem_NEW(size) ((char *)numpy_memtrace_malloc(size)) #define PyDataMem_FREE(ptr) numpy_memtrace_free(ptr) #define PyDataMem_RENEW(ptr,size) ((char *)numpy_memtrace_realloc(ptr,size)) From jtaylor.debian at googlemail.com Wed May 16 15:57:36 2012 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 16 May 2012 21:57:36 +0200 Subject: [Numpy-discussion] Debian/Ubuntu patch help In-Reply-To: References: <4FB2BE06.3030204@googlemail.com> Message-ID: <4FB406B0.7050808@googlemail.com> On 05/16/2012 09:01 PM, Ralf Gommers wrote: > > > On Tue, May 15, 2012 at 10:35 PM, Julian Taylor > > > wrote: > > > Hi, if there's anyone wants to have a look at the above issue this > > week, > >that would be great. > > > If there's a patch by this weekend I can create a second RC, so we can > > still have the final release before the end of this month (needed for > > Debian freeze). Otherwise a second RC won't be needed. 
> > bugfixes are still allowed during the debian freeze, so that should not > be an issue for the release timing. > > OK, that's good to know. So what's the hard deadline then? the release team aims for a freeze in the second half of june, but the number of release critical bugs is still huge so it could still change [0]. The freeze is will probably be 3-6 month long. > > > > I don't see the issue with the gcc --print-multiarch patch besides maybe > some cleanup. > --print-multiarch is a debian specific gcc patch, but multiarch is > debian specific for now. > > It doesn't work in 11.04, but who cares, that will be end of life in 5 > month anyway. > > > Eh, we (the numpy maintainers) should care. If we would not care about > an OS released only 13 months ago, we're not doing our job right. I scanned the list of classes in system_info, the only libraries multiarched in 11.04 on the list are: x11_info xft_info freetype2_info the first one is handled by the existing glob method, the latter two are handled correctly via pkg-config So I don't think there is anything to do for 11.04. 11.10 and 12.04 should also be fine. Wheezy will have multiarched fftw but probably not much more. Though one must also account for backports and future releases will likely have more multiarch ready numerical stuff to allow partial architectures like i386+sse2, x86_64+avx or completely new ones like x32. > > > Besides x11 almost nothing is multiarched in 11.04 anyway > and that can still be covered by the currently existing method. > > gcc should be available for pretty much anything requiring > numpy.distutils anyway so that should be not be an issue. > On systems without --print-multiarch or gcc you just ignore the failing, > there will be no repercussions as there will also not be any multiarched > libraries. > > If it's really that simple, such a patch may go into numpy master. But > history has shown that patches to a central part of numpy.distutils are > rarely issue-free (more due to the limitations/complexity of distutils > than anything else). Therefore making such a change right before a > release is simply a bad idea. I agree its probably a bit late to add it to 1.6.2. There is also no real need to have multiarch handled in this version. The Debian can add the patch to its 1.6.2 package It would be good to have the patch or something equivalent in the next version so upgrading from the package to 1.6.3 or 1.7 will not cause a regression in this respect. > > the only potential issue I see is that upstream gcc adds a > --print-multiarch that does something completely different and harmful > for distutils, but I don't consider that very likely. > > Hardcoding a bunch of paths defeats the whole purpose of multiarch. > It will just to break in future (e.g. when the x32 abi comes to > debian/ubuntu) and will make cross compiling harder (though I guess > numpy distutils may not be built for that anyway) > > > If that hardcoding will break in the future, then I think for 1.6.2 the > maintainers of the Debian package should apply the gcc patch to their > packaged numpy if they think that is necessary. the patch is already applied in Ubuntu 12.04 and will likely be applied again in the next Debian upload. Julian [0] http://lists.debian.org/debian-devel-announce/2012/05/msg00004.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From ralf.gommers at googlemail.com Wed May 16 16:10:58 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 16 May 2012 22:10:58 +0200 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 3:55 PM, Nathaniel Smith wrote: > On Tue, May 15, 2012 at 2:49 PM, Fr?d?ric Bastien wrote: > > Hi, > > > > In fact, I would arg to never change the current behavior, but add the > > flag for people that want to use it. > > > > Why? > > > > 1) There is probably >10k script that use it that will need to be > > checked for correctness. There won't be easy to see crash or error > > that allow user to see it. > > My suggestion is that we follow the scheme, which I think gives ample > opportunity for people to notice problems: > > 1.7: works like 1.6, except that a DeprecationWarning is produced if > (and only if) someone writes to an array returned by np.diagonal (or > friends). This gives a pleasant heads-up for those who pay attention > to DeprecationWarnings. > > 1.8: return a view, but mark this view read-only. This causes crashes > for anyone who ignored the DeprecationWarnings, guaranteeing that > they'll notice the issue. > > 1.9: return a writeable view, transition complete. > > I've written a pull request implementing the first part of this; I > hope everyone interested will take a look: > https://github.com/numpy/numpy/pull/280 > Thanks for doing that. Seems like a good way forward. When the PR gets merged, can you please also open a ticket for this with Milestone 1.8? Then we won't forget to make the required changes for that release. Ralf > > > 2) This is a globally not significant speed up by this change. Due to > > 1), i think it is not work it. Why this is not a significant speed up? > > First, the user already create and use the original tensor. Suppose a > > matrix of size n x n. If it don't fit in the cache, creating it will > > cost n * n. But coping it will cost cst * n. The cst is the price of > > loading a full cache line. But if you return a view, you will pay this > > cst price later when you do the computation. But it all case, this is > > cheap compared to the cost of creating the matrix. Also, you will do > > work on the matrix and this work will be much more costly then the > > price of the copy. > > > > In the case the matrix fix in the cache, the price of the copy is even > lower. > > > > So in conclusion, optimizing the diagonal won't give speed up in the > > global user script, but will break many of them. > > I agree that the speed difference is small. I'm more worried about the > cost to users of having to remember odd inconsistencies like this, and > to think about whether there actually is a speed difference or not, > etc. (If we do add a copy=False option, then I guarantee many people > will use it religiously "just in case" the speed difference is enough > to matter! And that would suck for them.) > > Returning a view makes the API slightly nicer, cleaner, more > consistent, more useful. (I believe the reason this was implemented in > the first place was that providing a convenient way to *write* to the > diagonal of an arbitrary array made it easier to implement numpy.eye > for masked arrays.) And the whole point of numpy is to trade off a > little speed in favor of having a simple, easy-to-work with high-level > API :-). 
> > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed May 16 16:15:10 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 16 May 2012 15:15:10 -0500 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: <72A20838-8A7B-4227-ADF7-4C94873E31DA@continuum.io> On May 13, 2012, at 3:11 AM, Nathaniel Smith wrote: > On Sun, May 13, 2012 at 3:28 AM, Travis Oliphant wrote: >> Another approach would be to introduce a method: >> >> a.diag(copy=False) >> >> and leave a.diagonal() alone. Then, a.diagonal() could be deprecated over >> 2-3 releases. > > This would be a good idea if we didn't already have both > np.diagonal(a) (which is an alias for a.diagonal()) *and* np.diag(a), > which do different things. And the new a.diag() would be different > from the existing np.diag(a)... I don't see how the new a.diag() would be different than np.diag(a) except for view semantics for 2-d arrays. Is this really a problem? -Travis From matrixhasu at gmail.com Wed May 16 16:41:49 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Wed, 16 May 2012 22:41:49 +0200 Subject: [Numpy-discussion] Debian/Ubuntu patch help (was: ANN: NumPy 1.6.2 release candidate 1) In-Reply-To: References: <4FB2BE06.3030204@googlemail.com> Message-ID: Hello, On Wed, May 16, 2012 at 9:01 PM, Ralf Gommers wrote: > > > On Tue, May 15, 2012 at 10:35 PM, Julian Taylor > wrote: >> >> > Hi, if there's anyone wants to have a look at the above issue this >> > week, >> >that would be great. >> >> > If there's a patch by this weekend I can create a second RC, so we can >> > still have the final release before the end of this month (needed for >> > Debian freeze). Otherwise a second RC won't be needed. >> >> bugfixes are still allowed during the debian freeze, so that should not >> be an issue for the release timing. >> > OK, that's good to know. So what's the hard deadline then? second half of June, but it's still not set in stone (but I'd also not expect too much variation) >> I don't see the issue with the gcc --print-multiarch patch besides maybe >> some cleanup. >> --print-multiarch is a debian specific gcc patch, but multiarch is >> debian specific for now. >> >> It doesn't work in 11.04, but who cares, that will be end of life in 5 >> month anyway. > > > Eh, we (the numpy maintainers) should care. If we would not care about an OS > released only 13 months ago, we're not doing our job right. > >> >> Besides x11 almost nothing is multiarched in 11.04 anyway >> and that can still be covered by the currently existing method. >> >> gcc should be available for pretty much anything requiring >> numpy.distutils anyway so that should be not be an issue. >> On systems without --print-multiarch or gcc you just ignore the failing, >> there will be no repercussions as there will also not be any multiarched >> libraries. >> > If it's really that simple, such a patch may go into numpy master. But > history has shown that patches to a central part of numpy.distutils are > rarely issue-free (more due to the limitations/complexity of distutils than > anything else). Therefore making such a change right before a release is > simply a bad idea. 
> >> the only potential issue I see is that upstream gcc adds a >> --print-multiarch that does something completely different and harmful >> for distutils, but I don't consider that very likely. >> >> Hardcoding a bunch of paths defeats the whole purpose of multiarch. >> It will just to break in future (e.g. when the x32 abi comes to >> debian/ubuntu) and will make cross compiling harder (though I guess >> numpy distutils may not be built for that anyway) > > > If that hardcoding will break in the future, then I think for 1.6.2 the > maintainers of the Debian package should apply the gcc patch to their > packaged numpy if they think that is necessary. Agreed, i'll add back the Debian specific patch. Thanks for your time and analysis - so time to release 1.6.2? :) Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From ralf.gommers at googlemail.com Wed May 16 16:47:14 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 16 May 2012 22:47:14 +0200 Subject: [Numpy-discussion] Debian/Ubuntu patch help (was: ANN: NumPy 1.6.2 release candidate 1) In-Reply-To: References: <4FB2BE06.3030204@googlemail.com> Message-ID: On Wed, May 16, 2012 at 10:41 PM, Sandro Tosi wrote: > Hello, > > On Wed, May 16, 2012 at 9:01 PM, Ralf Gommers > wrote: > > > > > > On Tue, May 15, 2012 at 10:35 PM, Julian Taylor > > wrote: > >> > >> > Hi, if there's anyone wants to have a look at the above issue this > >> > week, > >> >that would be great. > >> > >> > If there's a patch by this weekend I can create a second RC, so we can > >> > still have the final release before the end of this month (needed for > >> > Debian freeze). Otherwise a second RC won't be needed. > >> > >> bugfixes are still allowed during the debian freeze, so that should not > >> be an issue for the release timing. > >> > > OK, that's good to know. So what's the hard deadline then? > > second half of June, but it's still not set in stone (but I'd also not > expect too much variation) > > >> I don't see the issue with the gcc --print-multiarch patch besides maybe > >> some cleanup. > >> --print-multiarch is a debian specific gcc patch, but multiarch is > >> debian specific for now. > >> > >> It doesn't work in 11.04, but who cares, that will be end of life in 5 > >> month anyway. > > > > > > Eh, we (the numpy maintainers) should care. If we would not care about > an OS > > released only 13 months ago, we're not doing our job right. > > > >> > >> Besides x11 almost nothing is multiarched in 11.04 anyway > >> and that can still be covered by the currently existing method. > >> > >> gcc should be available for pretty much anything requiring > >> numpy.distutils anyway so that should be not be an issue. > >> On systems without --print-multiarch or gcc you just ignore the failing, > >> there will be no repercussions as there will also not be any multiarched > >> libraries. > >> > > If it's really that simple, such a patch may go into numpy master. But > > history has shown that patches to a central part of numpy.distutils are > > rarely issue-free (more due to the limitations/complexity of distutils > than > > anything else). Therefore making such a change right before a release is > > simply a bad idea. > > > >> the only potential issue I see is that upstream gcc adds a > >> --print-multiarch that does something completely different and harmful > >> for distutils, but I don't consider that very likely. 
> >> > >> Hardcoding a bunch of paths defeats the whole purpose of multiarch. > >> It will just to break in future (e.g. when the x32 abi comes to > >> debian/ubuntu) and will make cross compiling harder (though I guess > >> numpy distutils may not be built for that anyway) > > > > > > If that hardcoding will break in the future, then I think for 1.6.2 the > > maintainers of the Debian package should apply the gcc patch to their > > packaged numpy if they think that is necessary. > > Agreed, i'll add back the Debian specific patch. Thanks for your time > and analysis - so time to release 1.6.2? :) > Yes, this weekend probably. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed May 16 16:50:08 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 16 May 2012 22:50:08 +0200 Subject: [Numpy-discussion] Debian/Ubuntu patch help In-Reply-To: <4FB406B0.7050808@googlemail.com> References: <4FB2BE06.3030204@googlemail.com> <4FB406B0.7050808@googlemail.com> Message-ID: On Wed, May 16, 2012 at 9:57 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 05/16/2012 09:01 PM, Ralf Gommers wrote: > > > > > > On Tue, May 15, 2012 at 10:35 PM, Julian Taylor > > > > > wrote: > > > > > Hi, if there's anyone wants to have a look at the above issue this > > > week, > > >that would be great. > > > > > If there's a patch by this weekend I can create a second RC, so we > can > > > still have the final release before the end of this month (needed > for > > > Debian freeze). Otherwise a second RC won't be needed. > > > > bugfixes are still allowed during the debian freeze, so that should > not > > be an issue for the release timing. > > > > OK, that's good to know. So what's the hard deadline then? > > the release team aims for a freeze in the second half of june, but the > number of release critical bugs is still huge so it could still change [0]. > The freeze is will probably be 3-6 month long. > > > > > > > > > I don't see the issue with the gcc --print-multiarch patch besides > maybe > > some cleanup. > > --print-multiarch is a debian specific gcc patch, but multiarch is > > debian specific for now. > > > > It doesn't work in 11.04, but who cares, that will be end of life in > 5 > > month anyway. > > > > > > Eh, we (the numpy maintainers) should care. If we would not care about > > an OS released only 13 months ago, we're not doing our job right. > > I scanned the list of classes in system_info, the only libraries > multiarched in 11.04 on the list are: > x11_info > xft_info > freetype2_info > > the first one is handled by the existing glob method, the latter two are > handled correctly via pkg-config > So I don't think there is anything to do for 11.04. 11.10 and 12.04 > should also be fine. > Wheezy will have multiarched fftw but probably not much more. > > Though one must also account for backports and future releases will > likely have more multiarch ready numerical stuff to allow partial > architectures like i386+sse2, x86_64+avx or completely new ones like x32. > > > > > > > Besides x11 almost nothing is multiarched in 11.04 anyway > > and that can still be covered by the currently existing method. > > > > gcc should be available for pretty much anything requiring > > numpy.distutils anyway so that should be not be an issue. > > On systems without --print-multiarch or gcc you just ignore the > failing, > > there will be no repercussions as there will also not be any > multiarched > > libraries. 
> > > > If it's really that simple, such a patch may go into numpy master. But > > history has shown that patches to a central part of numpy.distutils are > > rarely issue-free (more due to the limitations/complexity of distutils > > than anything else). Therefore making such a change right before a > > release is simply a bad idea. > > I agree its probably a bit late to add it to 1.6.2. > There is also no real need to have multiarch handled in this version. > The Debian can add the patch to its 1.6.2 package > > It would be good to have the patch or something equivalent in the next > version so upgrading from the package to 1.6.3 or 1.7 will not cause a > regression in this respect. > Yes, and better sooner than later. If you or someone else can provide this as a pull request on Github, that would be helpful. As would a check that the patch doesn't fail on Windows or OS X. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed May 16 16:51:27 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 16 May 2012 15:51:27 -0500 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> This Pull Request looks like a good idea to me as well. -Travis On May 16, 2012, at 3:10 PM, Ralf Gommers wrote: > > > On Wed, May 16, 2012 at 3:55 PM, Nathaniel Smith wrote: > On Tue, May 15, 2012 at 2:49 PM, Fr?d?ric Bastien wrote: > > Hi, > > > > In fact, I would arg to never change the current behavior, but add the > > flag for people that want to use it. > > > > Why? > > > > 1) There is probably >10k script that use it that will need to be > > checked for correctness. There won't be easy to see crash or error > > that allow user to see it. > > My suggestion is that we follow the scheme, which I think gives ample > opportunity for people to notice problems: > > 1.7: works like 1.6, except that a DeprecationWarning is produced if > (and only if) someone writes to an array returned by np.diagonal (or > friends). This gives a pleasant heads-up for those who pay attention > to DeprecationWarnings. > > 1.8: return a view, but mark this view read-only. This causes crashes > for anyone who ignored the DeprecationWarnings, guaranteeing that > they'll notice the issue. > > 1.9: return a writeable view, transition complete. > > I've written a pull request implementing the first part of this; I > hope everyone interested will take a look: > https://github.com/numpy/numpy/pull/280 > > Thanks for doing that. Seems like a good way forward. > > When the PR gets merged, can you please also open a ticket for this with Milestone 1.8? Then we won't forget to make the required changes for that release. > > Ralf > > > > 2) This is a globally not significant speed up by this change. Due to > > 1), i think it is not work it. Why this is not a significant speed up? > > First, the user already create and use the original tensor. Suppose a > > matrix of size n x n. If it don't fit in the cache, creating it will > > cost n * n. But coping it will cost cst * n. The cst is the price of > > loading a full cache line. But if you return a view, you will pay this > > cst price later when you do the computation. But it all case, this is > > cheap compared to the cost of creating the matrix. Also, you will do > > work on the matrix and this work will be much more costly then the > > price of the copy. 
> > > > In the case the matrix fix in the cache, the price of the copy is even lower. > > > > So in conclusion, optimizing the diagonal won't give speed up in the > > global user script, but will break many of them. > > I agree that the speed difference is small. I'm more worried about the > cost to users of having to remember odd inconsistencies like this, and > to think about whether there actually is a speed difference or not, > etc. (If we do add a copy=False option, then I guarantee many people > will use it religiously "just in case" the speed difference is enough > to matter! And that would suck for them.) > > Returning a view makes the API slightly nicer, cleaner, more > consistent, more useful. (I believe the reason this was implemented in > the first place was that providing a convenient way to *write* to the > diagonal of an arbitrary array made it easier to implement numpy.eye > for masked arrays.) And the whole point of numpy is to trade off a > little speed in favor of having a simple, easy-to-work with high-level > API :-). > > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed May 16 16:58:20 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 16 May 2012 22:58:20 +0200 Subject: [Numpy-discussion] [numpy.testing] re-import when using coverage In-Reply-To: <1337187707.8870.3.camel@amilo.coursju> References: <1337180748.2747.7.camel@amilo.coursju> <1337187707.8870.3.camel@amilo.coursju> Message-ID: On Wed, May 16, 2012 at 7:01 PM, Fabrice Silva wrote: > > maybe it's this > > > http://thread.gmane.org/gmane.comp.python.numeric.general/49785/focus=49787 > Thanks for your reply, Not sure it is the same trouble, I'll have a > deeper look at that thread... > > Both coverage=True and coverage=False work with your attached package. But it seems you attached an old version, because test_a.py doesn't include the actual test. "obj = a.b.mycls()" in test_a.py executes fine, so it may be a problem with the way you wrote the test case. Ralf > > It helped in my case, and I don't have a problem running the tests > > with your mypackage (using my patched numpy) > > What I get: > $ python -c "import mypackage; mypackage.test(coverage=False, > verbose=10)" > Reading a.py file... > Id(mycls in b) = 168022116 > Running unit tests for mypackage > NumPy version 1.6.2rc1 > NumPy is installed in /usr/lib/pymodules/python2.7/numpy > Python version 2.7.3rc2 (default, Apr 22 2012, 22:35:38) [GCC 4.6.3] > nose version 1.1.2 > [...] > test_a.test_instantition ... Id(type(self)) = 168022116 > Id(type(obj)) = 168022116 > ok > > > ---------------------------------------------------------------------- > Ran 1 test in 0.002s > > OK > $ python -c "import mypackage; mypackage.test(coverage=True, > verbose=10)" > Reading a.py file... > Id(mycls in b) = 143573092 > Running unit tests for mypackage > NumPy version 1.6.2rc1 > NumPy is installed in /usr/lib/pymodules/python2.7/numpy > Python version 2.7.3rc2 (default, Apr 22 2012, 22:35:38) [GCC 4.6.3] > nose version 1.1.2 > [...] > Reading a.py file... > Id(mycls in b) = 150640036 > test_a.test_instantition ... 
> Id(type(self)) = 150640036 > Id(type(obj)) = 150640036 > FAIL > > > ====================================================================== > FAIL: test_a.test_instantition > > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, > in runTest > self.test(*self.arg) > File "/tmp/cover/mypackage/tests/test_a.py", line 13, in > test_instantition > assert a.b.is_mycls(obj) > AssertionError > > [coverage results] > Ran 1 test in 0.006s > > FAILED (failures=1) > > > > -- > Fabrice Silva > LMA UPR CNRS 7051 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Thu May 17 01:41:07 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 16 May 2012 22:41:07 -0700 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 7:10 AM, Nathaniel Smith wrote: > I tried checking this before, actually, but can't figure out how to > build scipy against a copy of numpy that is installed in either a > virtualenv or just on PYTHONPATH. (Basically, I just don't want to > install some random development numpy into my system python.) Any > suggestions? manage PYTHONPATH as a stack: push install, play/test, pop stack: Here's my bashrc machinery to do it, crib at will: https://gist.github.com/2716714 Usage is trivial: cd numpy && ./setup.py install --prefix=~/tmp/junk cd scipy && ./setup.py install --prefix=~/tmp/junk # play/test rm -rf ~/tmp/junk Cheers, f From fperez.net at gmail.com Thu May 17 01:43:58 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 16 May 2012 22:43:58 -0700 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: Message-ID: On Fri, May 11, 2012 at 4:54 PM, Nathaniel Smith wrote: > I > have lying around my homedir that it would generally be a free speed > win Don't forget the case where the copy semantics may actually provide an *improvement* in performance by allowing a potentially large array to get deallocated if it was local. People often forget that a *single element* that is a view can 'pin' a huge array to memory by INCREF-ing it. If that large array trashes your cache (or worse, makes you go into swapping), the time savings of using a view will be quickly obliterated. There are cases where a copy can be the fast solution... Cheers, f From silva at lma.cnrs-mrs.fr Thu May 17 03:48:02 2012 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Thu, 17 May 2012 09:48:02 +0200 Subject: [Numpy-discussion] [numpy.testing] re-import when using coverage In-Reply-To: References: <1337180748.2747.7.camel@amilo.coursju> <1337187707.8870.3.camel@amilo.coursju> Message-ID: <1337240882.20701.2.camel@amilo.coursju> Le mercredi 16 mai 2012 ? 22:58 +0200, Ralf Gommers a ?crit : > Both coverage=True and coverage=False work with your attached package. > But it seems you attached an old version, because test_a.py doesn't > include the actual test. "obj = a.b.mycls()" in test_a.py executes > fine, so it may be a problem with the way you wrote the test case. Sorry, I forgot to update the archive... 
Here new one that lead (on my machine) to failure with coverage -- Fabrice Silva -------------- next part -------------- A non-text attachment was scrubbed... Name: mypackage.tar Type: application/x-tar Size: 10240 bytes Desc: not available URL: From ralf.gommers at googlemail.com Thu May 17 04:34:14 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 17 May 2012 10:34:14 +0200 Subject: [Numpy-discussion] [numpy.testing] re-import when using coverage In-Reply-To: <1337240882.20701.2.camel@amilo.coursju> References: <1337180748.2747.7.camel@amilo.coursju> <1337187707.8870.3.camel@amilo.coursju> <1337240882.20701.2.camel@amilo.coursju> Message-ID: On Thu, May 17, 2012 at 9:48 AM, Fabrice Silva wrote: > Le mercredi 16 mai 2012 ? 22:58 +0200, Ralf Gommers a ?crit : > > > Both coverage=True and coverage=False work with your attached package. > > But it seems you attached an old version, because test_a.py doesn't > > include the actual test. "obj = a.b.mycls()" in test_a.py executes > > fine, so it may be a problem with the way you wrote the test case. > > Sorry, I forgot to update the archive... > Here new one that lead (on my machine) to failure with coverage Still not the right code. Here's the code in b.py: class mycls(object): def __init__(self): print "Id(type(self)) = ",id(type(self)) print "Id(mycls in b) = ", id(mycls) Test fails on "assert a.b.is_mycls(obj)". There's no such thing in b. This has nothing to do with coverage. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From silva at lma.cnrs-mrs.fr Thu May 17 04:48:48 2012 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Thu, 17 May 2012 10:48:48 +0200 Subject: [Numpy-discussion] [numpy.testing] re-import when using coverage In-Reply-To: References: <1337180748.2747.7.camel@amilo.coursju> <1337187707.8870.3.camel@amilo.coursju> <1337240882.20701.2.camel@amilo.coursju> Message-ID: <1337244528.20701.4.camel@amilo.coursju> Nautilus and file-roller are *** on me... I hope this one is good. Thanks for being patient :) Best regards -- Fabrice Silva -------------- next part -------------- A non-text attachment was scrubbed... Name: mypackage.tar Type: application/x-tar Size: 10240 bytes Desc: not available URL: From ralf.gommers at googlemail.com Thu May 17 05:16:00 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 17 May 2012 11:16:00 +0200 Subject: [Numpy-discussion] [numpy.testing] re-import when using coverage In-Reply-To: <1337244528.20701.4.camel@amilo.coursju> References: <1337180748.2747.7.camel@amilo.coursju> <1337187707.8870.3.camel@amilo.coursju> <1337240882.20701.2.camel@amilo.coursju> <1337244528.20701.4.camel@amilo.coursju> Message-ID: On Thu, May 17, 2012 at 10:48 AM, Fabrice Silva wrote: > Nautilus and file-roller are *** on me... > I hope this one is good. > Thanks for being patient :) > That was the right tar file. The issue was the one Josef was pointing too (cover-inclusive), which has already been fixed in https://github.com/numpy/numpy/commit/bfaaefe52. That option indeed reimports everything, so that "a.b is b" is False. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From silva at lma.cnrs-mrs.fr Thu May 17 05:32:42 2012 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Thu, 17 May 2012 11:32:42 +0200 Subject: [Numpy-discussion] [numpy.testing] re-import when using coverage In-Reply-To: References: <1337180748.2747.7.camel@amilo.coursju> <1337187707.8870.3.camel@amilo.coursju> <1337240882.20701.2.camel@amilo.coursju> <1337244528.20701.4.camel@amilo.coursju> Message-ID: <1337247162.20701.6.camel@amilo.coursju> Le jeudi 17 mai 2012 ? 11:16 +0200, Ralf Gommers a ?crit : > On Thu, May 17, 2012 at 10:48 AM, Fabrice Silva wrote: > > > Nautilus and file-roller are *** on me... > > I hope this one is good. > > Thanks for being patient :) > > > > That was the right tar file. The issue was the one Josef was pointing too > (cover-inclusive), which has already been fixed in > https://github.com/numpy/numpy/commit/bfaaefe52. > > That option indeed reimports everything, so that "a.b is b" is False. That was indeed the issue. Thanks you again (and Josef too) -- Fabrice Silva LMA UPR CNRS 7051 From chris.barker at noaa.gov Thu May 17 11:13:55 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 17 May 2012 08:13:55 -0700 Subject: [Numpy-discussion] fromstring() is slow, no really! In-Reply-To: References: Message-ID: Anthony, Thanks for looking into this. A few other notes about fromstring() ( and fromfile() ). Frankly they haven't gotten much love -- they are, as you have seen, less than optimized, and kind of buggy (actually, not really buggy, but not robust in the face of malformed input -- and they give results that are wrong in some cases (rather throwing an error, for instance). So they realy do need some attention. On the other hand -- folks are working on various ways to optimize reading data from text files (and maybe strings) so that may be a better way to go. If you google "fromstring barker numpy" you'll find a thread or too with what I learned, and pointers to a couple tickets. What I do remember: The use of atof and friends is complicated because there are python version that extend the C lib versions, and numpy versions that extend those (for better NaN handling, for instance). the source of the lack of robustness stems from the fact that the error checking is not done right when calling atof and friends -- i.e. you need to check if the pointer was incrememnted to see if it successfully read a value. With the layered calls to numpy and python versions, I found it very hard to fix this. Profile carefully to check your theory about limited over-allocation of memory being the source of the performance issues -- when i've tested similar code, it made little difference -- allocating and copying memory is actually pretty fast. If you re-allocate an copy every single append, it's slow, yes, but I found virtually no difference between over-allocating say 10% or 50% (not sure what the bottom reasonable value was there) Good luck, -Chris On Sun, May 13, 2012 at 4:28 PM, Anthony Scopatz wrote: > Hello All, > > This week, while doing some optimization, I found that np.fromstring() > is significantly slower than many alternatives out there. ?This function > basically does two things: (1) it splits the string and (2) it converts the > data to the desired type. > > There isn't much we can do about the conversion/casting so what I > mean is that the string splitting?implementation?is slow. > > To simplify?the discussion, I will just talk about string to 1d float64 > arrays. > I have also issued pull request #279 [1] to numpy with some sample code. 
> Timings can be seen in the ipython notebook here. > > It turns out that using str.split() and np.array() are 20 - 35% faster, > which > was non-intuitive to me. ?That is to say: > > rawdata = s.split() > data = np.array(rawdata, dtype=float) > > > is faster than > > data = np.fromstring(s, sep=" ", dtype=float) > > > The next thing to try, naturally, was Cython. ?This did not change the > timings?much for these two ?strategies. ?However, being in Cython > allows us to call?atof() directly. ?My?implementation?is based on a > previous > thread on this?topic [2]. ? However, in the example in [2], the string was > hard coded, contained only one data value, and did not need to be split. > Thus they saw a dramatic 10x speed boost. ? To deal with the more > realistic case, I first just continued to use str.split(). ?This took 35 - > 50% > less time than np.fromstring(). > > Finally, using the strtok() function in the C standard library to call > atof() > while we tokenize the string further reduces the speed 50 - 60% of the > baseline np.fromstring() time. > > Timings > ------------ > In [1]: import fromstr > > In [2]: s = "100.0 " * 100000 > > In [3]: timeit fromstr.fromstring(s) > 10 loops, best of 3: 20.7 ms per loop > > In [4]: timeit fromstr.split_and_array(s) > 100 loops, best of 3: 16.1 ms per loop > > In [6]: timeit fromstr.split_atof(s) > 100 loops, best of 3: 13.5 ms per loop > > In [7]: timeit fromstr.token_atof(s) > 100 loops, best of 3: 8.35 ms per loop > > Possible?Explanation > ---------------------------------- > Numpy's fromstring() function may be found here [3]. ?However, this code > is a bit?hard to follow but it uses the array_from_text() function [4]. ?On > the > other hand str.split() [5] uses a macro function SPLIT_ADD(). ? The > difference > between these is that I believe that str.split() over-allocates the size of > the > list in a more aggressive way than?array_from_text(). ?This leads to fewer > resizes and thus fewer memory copies. > > This would also explain why the tokenize implementation is the fastest > since > this pre-allocates the maximum possible array size and then slices it down. > No resizes are present in this function, though it requires more memory up > front. > > Summary (tl;dr) > ------------------------ > The np.fromstring() is slow in the mechanism it chooses to split strings by. > > This is likely due to how many resize?operations?it must perform. ?While it > need not be the *fastest* thing out there, it should probably be at least as > fast at Python string splitting. > > No pull-request 'fixing' this issue was provided because I wanted to see > what people thought and if / which option is worth pursuing. > > Be Well > Anthony > > [1]?https://github.com/numpy/numpy/pull/279 > [2]?http://comments.gmane.org/gmane.comp.python.numeric.general/41504 > [3]?https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L3699 > [4]?https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L3418 > [5]?http://svn.python.org/view/python/tags/r271/Objects/stringlib/split.h?view=markup > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? 
main reception Chris.Barker at noaa.gov From stefan at sun.ac.za Thu May 17 14:50:55 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 17 May 2012 11:50:55 -0700 Subject: [Numpy-discussion] tracing numpy data allocation with python callbacks In-Reply-To: References: Message-ID: On Wed, May 16, 2012 at 12:34 PM, Thouis Jones wrote: > I wondered, however, if there were a better way to accomplish the same > goal, preferably in pure python. Fabien recently posted this; not sure if it addresses your use case: http://fseoane.net/blog/2012/line-by-line-report-of-memory-usage/ St?fan From njs at pobox.com Thu May 17 15:52:32 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 May 2012 20:52:32 +0100 Subject: [Numpy-discussion] tracing numpy data allocation with python callbacks In-Reply-To: References: Message-ID: On Thu, May 17, 2012 at 7:50 PM, St?fan van der Walt wrote: > On Wed, May 16, 2012 at 12:34 PM, Thouis Jones wrote: >> I wondered, however, if there were a better way to accomplish the same >> goal, preferably in pure python. > > Fabien recently posted this; not sure if it addresses your use case: > > http://fseoane.net/blog/2012/line-by-line-report-of-memory-usage/ I'd be wary of that blog's technique... getting accurate/meaningful memory usage from the kernel is *very* complicated. It'll more-or-less work in some cases, but definitely not all. You'll get spurious size changes if you use mmap (or just load new shared libraries), the portion of the heap that holds small objects will have a reported size that only increases, never decreases (even if your actual usage decreases), etc. I'm not saying it's not useful (and it could be somewhat more accurate if it used /proc/self/smaps where available instead of statm), but real heap tracing has a lot of advantages. I can't see any way to trace numpy's allocation/deallocation out of the box. If PyDataMem_* were real dynamically linked functions, you could use various tricks to intercept calls to them, but in fact they're just aliases for malloc/free/realloc, so there's no way to separate out numpy's uses from other heap allocations without recompiling. If there were a compelling reason then I can't see why anyone would object to adding a memory tracing API to numpy... it's not like we call malloc in tight loops, and when disabled it'd just be a single branch per malloc/free. Not sure how generally useful that would be, though. I'd be tempted to just see if I could get by with massif or another "real" heap profiler -- unfortunately the ones I know are C oriented, but might still be useful... - N From thouis at gmail.com Thu May 17 17:18:22 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Thu, 17 May 2012 23:18:22 +0200 Subject: [Numpy-discussion] tracing numpy data allocation with python callbacks In-Reply-To: References: Message-ID: On Thu, May 17, 2012 at 9:52 PM, Nathaniel Smith wrote: > I'd be tempted to just see if I could get by with massif or another > "real" heap profiler -- unfortunately the ones I know are C oriented, > but might still be useful... I got some very useful information from Fabien's technique, which led me to wanting to know more, particularly about spikes in ephemeral allocations, such as for the subexpressions of "a = b * c + d * e". 
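Nathaniel's point that a hook costs only a single branch per allocation when disabled also matches what I'd want; roughly (just a sketch with made-up names -- nothing like this exists in numpy today), the hook could be:

    /* Sketch only: a hypothetical tracing hook around numpy's allocator.
       None of these names exist in numpy; they just illustrate the
       "single flag check when not activated" idea. */
    #include <stdlib.h>

    typedef void (*alloc_hook)(void *ptr, size_t size, void *user_data);

    static int hook_enabled = 0;            /* the single flag */
    static alloc_hook malloc_hook = NULL;
    static void *hook_user_data = NULL;

    static void *
    traced_malloc(size_t size)
    {
        void *p = malloc(size);
        if (hook_enabled && malloc_hook != NULL) {
            /* e.g. call back into a Python-level callback */
            malloc_hook(p, size, hook_user_data);
        }
        return p;
    }
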
The main benefit to what I posted over heap profilers in general was that it gave a way for tracing at the python level, so I can easily do things like record the allocations (long-lived and ephemeral) by what line in the python code call stack is responsible. Perhaps I'll rewrite it to be cleaner, something like: numpy.trace_data_allocations(malloc_callback=None, free_callback=None, realloc_callback=None) Where "None" turns off tracing of that particular call. It should be efficient when not activated (check a single flag). Ray Jones From d.s.seljebotn at astro.uio.no Thu May 17 18:53:57 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 18 May 2012 00:53:57 +0200 Subject: [Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive Message-ID: <4FB58185.7070500@astro.uio.no> I'm repeating myself a bit, but my previous thread of this ended up being about something else, and also since then I've been on an expedition to the hostile waters of python-dev. I'm crazy enough to believe that I'm proposing a technical solution to alleviate the problems we've faced as a community the past year. No, this will NOT be about NA, and certainly not governance, but do please allow me one paragraph of musings before the meaty stuff. I believe the Achilles heel of NumPy is the C API and the PyArrayObject. The reliance we all have on the NumPy C API means there can in practice only be one "array" type per Python process. This makes people *very* afraid of creative forking or new competing array libraries (since they just can't live in parallel -- like Cython and Pyrex can!), and every new feature has to go into ndarray to fully realise itself. This in turn means that experimentation with new features has to happen within one or a few release cycles, it cannot happen in the wild and by competition and by seeing what works over the course of years before finally making it into upstream. Finally, if any new great idea can really only be implemented decently if it also impacts thousands of users...that's bad both for morale and developer recruitment. The meat: There's already of course been work on making the NumPy C API work through an indirection layer to make a more stable ABI. This is about changing the ideas of how that indirection should happen, so that you could in theory implement the C API independently of NumPy. You could for instance make a "mini-NumPy" that only contains the bare essentials, and load that in the same process as the real NumPy, and use the C API against objects from both libraries. I'll assume that we can get a PEP through by waving a magic wand, since that makes it easier to focus on essentials. There's many ugly or less ugly hacks to make it work on any existing CPython [1], and they wouldn't be so ugly if there's PEP blessing for the general idea. Imagine if PyTypeObject grew an extra pointer "tp_customslots", which pointed to an array of these: typedef struct { unsigned long tpe_id; void *tpe_data; } PyTypeObjectCustomSlot; The ID space is partitioned to anyone who asks, and NumPy is given a large chunk. To insert a "custom slot", you stick it in this list. And you search it linearly for, say, PYTYPE_CUSTOM_NUMPY_SLOT (each type will typically have 0-3 entries so the search is very fast). I've benchmarked something very similar recently, and the overhead in a "hot" situation is on the order of 4-6 cycles. (As for cache, you can at least stick the slot array right next to the type object in memory.) 
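Spelled out, the lookup itself is nothing more than this (a sketch only; terminating the slot array with tpe_id == 0 is just one possible convention, not part of the proposal):

    /* Struct repeated from above for completeness. */
    typedef struct {
        unsigned long tpe_id;
        void *tpe_data;
    } PyTypeObjectCustomSlot;

    /* Sketch of the linear search; 'slots' would come from the proposed
       tp_customslots member on the type object. */
    static void *
    lookup_custom_slot(const PyTypeObjectCustomSlot *slots, unsigned long slot_id)
    {
        if (slots == NULL) {
            return NULL;
        }
        for (; slots->tpe_id != 0; slots++) {   /* typically 0-3 entries */
            if (slots->tpe_id == slot_id) {
                return slots->tpe_data;
            }
        }
        return NULL;  /* this type does not provide the requested interface */
    }

A PyArray_Check-style test then reduces to asking whether this lookup returns non-NULL for the NumPy slot id.
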
Now, a NumPy array would populate this list with 1-2 entries pointing to tables of function pointers for the NumPy C API. This lookup through the PyTypeObject would in part replace the current import_array() mechanism. I'd actually propose two such custom slots for ndarray for starters: a) One PEP 3118-like binary description that exposes raw data pointers (without the PEP 3118 red tape) b) A function pointer table for a suitable subset of the NumPy C API (obviously not array construction and so on) The all-important PyArray_DATA/DIMS/... would be macros that try for a) first, but fall back to b). Things like PyArray_Check would actually check for support of these slots, "duck typing", rather than the Python type (of course, this could only be done at a major revision like NumPy 2.0 or 3.0). The overhead should be on the order of 5 cycles per C API call. That should be fine for anything but the use of PyArray_DATA inside a tight loop (which is a bad idea anyway). For now I just want to establish if there's support for this general idea, and see if I can get some weight behind a PEP (and ideally a co-author), which would make this a general approach and something more than an ugly NumPy specific hack. We'd also have good use for such a PEP in Cython (and, I believe, Numba/SciPy in CEP 1000). Dag [1] There's many ways of doing similar things in current Python, such as standardising across many participating projects on using a common metaclass. Here's another alternative that doesn't add such inter-project dependencies but is more renegade: http://wiki.cython.org/enhancements/cep1001 From d.s.seljebotn at astro.uio.no Fri May 18 00:55:53 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 18 May 2012 06:55:53 +0200 Subject: [Numpy-discussion] =?utf-8?q?pre-PEP_for_making_creative_forking_?= =?utf-8?q?of_NumPy=09less_destructive?= In-Reply-To: <4FB58185.7070500@astro.uio.no> References: <4FB58185.7070500@astro.uio.no> Message-ID: <9ea73fdb-01c0-4f4c-ac81-3205014b737e@email.android.com> Dag Sverre Seljebotn wrote: >I'm repeating myself a bit, but my previous thread of this ended up >being about something else, and also since then I've been on an >expedition to the hostile waters of python-dev. > >I'm crazy enough to believe that I'm proposing a technical solution to >alleviate the problems we've faced as a community the past year. No, >this will NOT be about NA, and certainly not governance, but do please >allow me one paragraph of musings before the meaty stuff. > >I believe the Achilles heel of NumPy is the C API and the >PyArrayObject. >The reliance we all have on the NumPy C API means there can in practice > >only be one "array" type per Python process. This makes people *very* >afraid of creative forking or new competing array libraries (since they > >just can't live in parallel -- like Cython and Pyrex can!), and every >new feature has to go into ndarray to fully realise itself. This in >turn >means that experimentation with new features has to happen within one >or >a few release cycles, it cannot happen in the wild and by competition >and by seeing what works over the course of years before finally making > >it into upstream. Finally, if any new great idea can really only be >implemented decently if it also impacts thousands of users...that's bad > >both for morale and developer recruitment. > >The meat: > >There's already of course been work on making the NumPy C API work >through an indirection layer to make a more stable ABI. 
This is about >changing the ideas of how that indirection should happen, so that you >could in theory implement the C API independently of NumPy. > >You could for instance make a "mini-NumPy" that only contains the bare >essentials, and load that in the same process as the real NumPy, and >use >the C API against objects from both libraries. > >I'll assume that we can get a PEP through by waving a magic wand, since > >that makes it easier to focus on essentials. There's many ugly or less >ugly hacks to make it work on any existing CPython [1], and they >wouldn't be so ugly if there's PEP blessing for the general idea. > >Imagine if PyTypeObject grew an extra pointer "tp_customslots", which >pointed to an array of these: > >typedef struct { > unsigned long tpe_id; > void *tpe_data; >} PyTypeObjectCustomSlot; > >The ID space is partitioned to anyone who asks, and NumPy is given a >large chunk. To insert a "custom slot", you stick it in this list. And >you search it linearly for, say, PYTYPE_CUSTOM_NUMPY_SLOT (each type >will typically have 0-3 entries so the search is very fast). > >I've benchmarked something very similar recently, and the overhead in a > >"hot" situation is on the order of 4-6 cycles. (As for cache, you can >at >least stick the slot array right next to the type object in memory.) > >Now, a NumPy array would populate this list with 1-2 entries pointing >to >tables of function pointers for the NumPy C API. This lookup through >the >PyTypeObject would in part replace the current import_array() >mechanism. > >I'd actually propose two such custom slots for ndarray for starters: > >a) One PEP 3118-like binary description that exposes raw data pointers To be more clear: the custom-slot in the pytypeobject would contain an offset that you could add to your PyObject* to get to this information. Dag >(without the PEP 3118 red tape) > > b) A function pointer table for a suitable subset of the NumPy C API >(obviously not array construction and so on) > >The all-important PyArray_DATA/DIMS/... would be macros that try for a) > >first, but fall back to b). Things like PyArray_Check would actually >check for support of these slots, "duck typing", rather than the Python > >type (of course, this could only be done at a major revision like NumPy > >2.0 or 3.0). > >The overhead should be on the order of 5 cycles per C API call. That >should be fine for anything but the use of PyArray_DATA inside a tight >loop (which is a bad idea anyway). > >For now I just want to establish if there's support for this general >idea, and see if I can get some weight behind a PEP (and ideally a >co-author), which would make this a general approach and something more > >than an ugly NumPy specific hack. We'd also have good use for such a >PEP >in Cython (and, I believe, Numba/SciPy in CEP 1000). > >Dag > >[1] There's many ways of doing similar things in current Python, such >as >standardising across many participating projects on using a common >metaclass. Here's another alternative that doesn't add such >inter-project dependencies but is more renegade: >http://wiki.cython.org/enhancements/cep1001 >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
From d.s.seljebotn at astro.uio.no Fri May 18 02:11:53 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 18 May 2012 08:11:53 +0200 Subject: [Numpy-discussion] =?utf-8?q?pre-PEP_for_making_creative_forking_?= =?utf-8?q?of_NumPy=09less_destructive?= In-Reply-To: <4FB58185.7070500@astro.uio.no> References: <4FB58185.7070500@astro.uio.no> Message-ID: <042dc092-9f03-43e4-a70c-c1107b7c8964@email.android.com> Dag Sverre Seljebotn wrote: >I'm repeating myself a bit, but my previous thread of this ended up >being about something else, and also since then I've been on an >expedition to the hostile waters of python-dev. > >I'm crazy enough to believe that I'm proposing a technical solution to >alleviate the problems we've faced as a community the past year. No, >this will NOT be about NA, and certainly not governance, but do please >allow me one paragraph of musings before the meaty stuff. > >I believe the Achilles heel of NumPy is the C API and the >PyArrayObject. >The reliance we all have on the NumPy C API means there can in practice > >only be one "array" type per Python process. This makes people *very* >afraid of creative forking or new competing array libraries (since they > >just can't live in parallel -- like Cython and Pyrex can!), and every >new feature has to go into ndarray to fully realise itself. This in >turn >means that experimentation with new features has to happen within one >or >a few release cycles, it cannot happen in the wild and by competition >and by seeing what works over the course of years before finally making > >it into upstream. Finally, if any new great idea can really only be >implemented decently if it also impacts thousands of users...that's bad > >both for morale and developer recruitment. Sorry for using the F-word so much, it is not accurate. What I mean is complementary array features like resizeable arrays, arrays tuned for small n, arrays with metadata, lazy arrays, missing data concepts.. Today you can't just make a new array class and stick it on PyPI, because it wouldn't work seamlessly with C extensions. And that means, since it must be part of ndarray, you can't distribute it easily to others. So you can't depend on it for your own work and give it real testing without modifying ndarray, and that means everything new must be perfect on the first try, which is damaging. Dag > >The meat: > >There's already of course been work on making the NumPy C API work >through an indirection layer to make a more stable ABI. This is about >changing the ideas of how that indirection should happen, so that you >could in theory implement the C API independently of NumPy. > >You could for instance make a "mini-NumPy" that only contains the bare >essentials, and load that in the same process as the real NumPy, and >use >the C API against objects from both libraries. > >I'll assume that we can get a PEP through by waving a magic wand, since > >that makes it easier to focus on essentials. There's many ugly or less >ugly hacks to make it work on any existing CPython [1], and they >wouldn't be so ugly if there's PEP blessing for the general idea. > >Imagine if PyTypeObject grew an extra pointer "tp_customslots", which >pointed to an array of these: > >typedef struct { > unsigned long tpe_id; > void *tpe_data; >} PyTypeObjectCustomSlot; > >The ID space is partitioned to anyone who asks, and NumPy is given a >large chunk. To insert a "custom slot", you stick it in this list. 
And >you search it linearly for, say, PYTYPE_CUSTOM_NUMPY_SLOT (each type >will typically have 0-3 entries so the search is very fast). > >I've benchmarked something very similar recently, and the overhead in a > >"hot" situation is on the order of 4-6 cycles. (As for cache, you can >at >least stick the slot array right next to the type object in memory.) > >Now, a NumPy array would populate this list with 1-2 entries pointing >to >tables of function pointers for the NumPy C API. This lookup through >the >PyTypeObject would in part replace the current import_array() >mechanism. > >I'd actually propose two such custom slots for ndarray for starters: > >a) One PEP 3118-like binary description that exposes raw data pointers >(without the PEP 3118 red tape) > > b) A function pointer table for a suitable subset of the NumPy C API >(obviously not array construction and so on) > >The all-important PyArray_DATA/DIMS/... would be macros that try for a) > >first, but fall back to b). Things like PyArray_Check would actually >check for support of these slots, "duck typing", rather than the Python > >type (of course, this could only be done at a major revision like NumPy > >2.0 or 3.0). > >The overhead should be on the order of 5 cycles per C API call. That >should be fine for anything but the use of PyArray_DATA inside a tight >loop (which is a bad idea anyway). > >For now I just want to establish if there's support for this general >idea, and see if I can get some weight behind a PEP (and ideally a >co-author), which would make this a general approach and something more > >than an ugly NumPy specific hack. We'd also have good use for such a >PEP >in Cython (and, I believe, Numba/SciPy in CEP 1000). > >Dag > >[1] There's many ways of doing similar things in current Python, such >as >standardising across many participating projects on using a common >metaclass. Here's another alternative that doesn't add such >inter-project dependencies but is more renegade: >http://wiki.cython.org/enhancements/cep1001 >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From markflorisson88 at gmail.com Fri May 18 07:48:18 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 18 May 2012 12:48:18 +0100 Subject: [Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive In-Reply-To: <4FB58185.7070500@astro.uio.no> References: <4FB58185.7070500@astro.uio.no> Message-ID: On 17 May 2012 23:53, Dag Sverre Seljebotn wrote: > I'm repeating myself a bit, but my previous thread of this ended up > being about something else, and also since then I've been on an > expedition to the hostile waters of python-dev. > > I'm crazy enough to believe that I'm proposing a technical solution to > alleviate the problems we've faced as a community the past year. No, > this will NOT be about NA, and certainly not governance, but do please > allow me one paragraph of musings before the meaty stuff. > > I believe the Achilles heel of NumPy is the C API and the PyArrayObject. > The reliance we all have on the NumPy C API means there can in practice > only be one "array" type per Python process. This makes people *very* > afraid of creative forking or new competing array libraries (since they > just can't live in parallel -- like Cython and Pyrex can!), and every > new feature has to go into ndarray to fully realise itself. 
This in turn > means that experimentation with new features has to happen within one or > a few release cycles, it cannot happen in the wild and by competition > and by seeing what works over the course of years before finally making > it into upstream. Finally, if any new great idea can really only be > implemented decently if it also impacts thousands of users...that's bad > both for morale and developer recruitment. > > The meat: > > There's already of course been work on making the NumPy C API work > through an indirection layer to make a more stable ABI. This is about > changing the ideas of how that indirection should happen, so that you > could in theory implement the C API independently of NumPy. > > You could for instance make a "mini-NumPy" that only contains the bare > essentials, and load that in the same process as the real NumPy, and use > the C API against objects from both libraries. > > I'll assume that we can get a PEP through by waving a magic wand, since > that makes it easier to focus on essentials. There's many ugly or less > ugly hacks to make it work on any existing CPython [1], and they > wouldn't be so ugly if there's PEP blessing for the general idea. > > Imagine if PyTypeObject grew an extra pointer "tp_customslots", which > pointed to an array of these: > > typedef struct { > ? ? unsigned long tpe_id; > ? ? void *tpe_data; > } PyTypeObjectCustomSlot; > > The ID space is partitioned to anyone who asks, and NumPy is given a > large chunk. To insert a "custom slot", you stick it in this list. And > you search it linearly for, say, PYTYPE_CUSTOM_NUMPY_SLOT (each type > will typically have 0-3 entries so the search is very fast). > > I've benchmarked something very similar recently, and the overhead in a > "hot" situation is on the order of 4-6 cycles. (As for cache, you can at > least stick the slot array right next to the type object in memory.) > > Now, a NumPy array would populate this list with 1-2 entries pointing to > tables of function pointers for the NumPy C API. This lookup through the > PyTypeObject would in part replace the current import_array() mechanism. > > I'd actually propose two such custom slots for ndarray for starters: > > ?a) One PEP 3118-like binary description that exposes raw data pointers > (without the PEP 3118 red tape) > > ?b) A function pointer table for a suitable subset of the NumPy C API > (obviously not array construction and so on) > > The all-important PyArray_DATA/DIMS/... would be macros that try for a) > first, but fall back to b). Things like PyArray_Check would actually > check for support of these slots, "duck typing", rather than the Python > type (of course, this could only be done at a major revision like NumPy > 2.0 or 3.0). > > The overhead should be on the order of 5 cycles per C API call. That > should be fine for anything but the use of PyArray_DATA inside a tight > loop (which is a bad idea anyway). > > For now I just want to establish if there's support for this general > idea, and see if I can get some weight behind a PEP (and ideally a > co-author), which would make this a general approach and something more > than an ugly NumPy specific hack. We'd also have good use for such a PEP > in Cython (and, I believe, Numba/SciPy in CEP 1000). Well, you have my vote, but you already knew that. I'd also be willing to co-author any PEP etc, but I'm sensing it may be more useful to have support from people from different projects. 
Personally, I think if this is to succeed, we first need to fix the design to work for subclasses (I think one may just want to memcpy the interface information over to the subclass, e.g. through a convenience function that allows one to add more as well). If we have a solid idea of the technical implementation, we should actually implement it and present the benchmarks, comparing the results to capsules as attributes (and to the _PyType_Lookup approach). If we have come that far, then even at that point people may argue that one could do the same thing through a metaclass, and that this is too project specific. If people don't care for the standardization, then we could just keep this as a "scientific community proposal". That would be somewhat disappointing as it hurts standardization, adoption and elegance. BTW, if we propose it, it would probably be a good idea to have concrete examples of how non-standardization, e.g. PyCUDA/PyOpenCL, Theano, Numpy, GNumpy, etc have different arrays. See http://deeplearning.net/software/theano/tutorial/gpu_data_convert.html , it may help drive the point through. (Of course GPU arrays can't be handled through a numpy-standardized duck-typing interface, but the ones that live in main memory can. Furthermore, this same proposal can help standardize conversion and operations between GPU arrays as well). If we can find even more examples, preferably outside of the scientific community, where related projects face a similar situation, it may help people understand that this is not "a Numpy problem". > Dag > > [1] There's many ways of doing similar things in current Python, such as > standardising across many participating projects on using a common > metaclass. Here's another alternative that doesn't add such > inter-project dependencies but is more renegade: > http://wiki.cython.org/enhancements/cep1001 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From heng at cantab.net Fri May 18 07:56:04 2012 From: heng at cantab.net (Henry Gomersall) Date: Fri, 18 May 2012 12:56:04 +0100 Subject: [Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive In-Reply-To: References: <4FB58185.7070500@astro.uio.no> Message-ID: <1337342164.9661.24.camel@farnsworth> On Fri, 2012-05-18 at 12:48 +0100, mark florisson wrote: > If we can find even more examples, preferably outside of the > scientific community, where related projects face a similar situation, > it may help people understand that this is not "a Numpy problem". Buffer Objects in OpenGL? From d.s.seljebotn at astro.uio.no Fri May 18 08:45:51 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 18 May 2012 14:45:51 +0200 Subject: [Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive In-Reply-To: <1337342164.9661.24.camel@farnsworth> References: <4FB58185.7070500@astro.uio.no> <1337342164.9661.24.camel@farnsworth> Message-ID: <5d3a71d1-0f95-422f-a18c-58ff574052c4@email.android.com> Henry Gomersall wrote: >On Fri, 2012-05-18 at 12:48 +0100, mark florisson wrote: >> If we can find even more examples, preferably outside of the >> scientific community, where related projects face a similar >situation, >> it may help people understand that this is not "a Numpy problem". > >Buffer Objects in OpenGL? There is already PEP 3118 though. I would focus on the 'polymorphic C API' spin. 
PyObject_GetItem is polymorphic, but there is no standard way for 3rd party libraries to make such functions. So let's find a C API that's NOT about arrays at all and show how some polymorphism may help. Of course there's a couple of non-array Cython usecases too. Dag > >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From heng at cantab.net Fri May 18 11:00:01 2012 From: heng at cantab.net (Henry Gomersall) Date: Fri, 18 May 2012 16:00:01 +0100 Subject: [Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive In-Reply-To: <5d3a71d1-0f95-422f-a18c-58ff574052c4@email.android.com> References: <4FB58185.7070500@astro.uio.no> <1337342164.9661.24.camel@farnsworth> <5d3a71d1-0f95-422f-a18c-58ff574052c4@email.android.com> Message-ID: <1337353201.11994.8.camel@farnsworth> On Fri, 2012-05-18 at 14:45 +0200, Dag Sverre Seljebotn wrote: > I would focus on the 'polymorphic C API' spin. PyObject_GetItem is > polymorphic, but there is no standard way for 3rd party libraries to > make such functions. > > So let's find a C API that's NOT about arrays at all and show how some > polymorphism may help. > so memory mapped IO or something? From d.s.seljebotn at astro.uio.no Fri May 18 13:59:51 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 18 May 2012 19:59:51 +0200 Subject: [Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive In-Reply-To: <1337353201.11994.8.camel@farnsworth> References: <4FB58185.7070500@astro.uio.no> <1337342164.9661.24.camel@farnsworth> <5d3a71d1-0f95-422f-a18c-58ff574052c4@email.android.com> <1337353201.11994.8.camel@farnsworth> Message-ID: <4FB68E17.2010904@astro.uio.no> On 05/18/2012 05:00 PM, Henry Gomersall wrote: > On Fri, 2012-05-18 at 14:45 +0200, Dag Sverre Seljebotn wrote: >> I would focus on the 'polymorphic C API' spin. PyObject_GetItem is >> polymorphic, but there is no standard way for 3rd party libraries to >> make such functions. >> >> So let's find a C API that's NOT about arrays at all and show how some >> polymorphism may help. >> > so memory mapped IO or something? A C API towards a Python extension that's in wide use; an analog to the NumPy C API in a different domain. Something like a web server or database Python extension module which also exports a C API for writing other Python extension modules against it. I'm not even sure if such a thing exists, in which case NumPy would indeed be a special case. Dag From robert.kern at gmail.com Fri May 18 14:15:26 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 18 May 2012 19:15:26 +0100 Subject: [Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive In-Reply-To: <4FB68E17.2010904@astro.uio.no> References: <4FB58185.7070500@astro.uio.no> <1337342164.9661.24.camel@farnsworth> <5d3a71d1-0f95-422f-a18c-58ff574052c4@email.android.com> <1337353201.11994.8.camel@farnsworth> <4FB68E17.2010904@astro.uio.no> Message-ID: On Fri, May 18, 2012 at 6:59 PM, Dag Sverre Seljebotn wrote: > On 05/18/2012 05:00 PM, Henry Gomersall wrote: >> On Fri, 2012-05-18 at 14:45 +0200, Dag Sverre Seljebotn wrote: >>> I would focus on the 'polymorphic C API' spin. PyObject_GetItem is >>> polymorphic, but there is no standard way for 3rd party libraries to >>> make such functions. 
>>> >>> So let's find a C API that's NOT about arrays at all and show how some >>> polymorphism may help. >>> >> so memory mapped IO or something? > > A C API towards a Python extension that's in wide use; an analog to the > NumPy C API in a different domain. Something like a web server or > database Python extension module which also exports a C API for writing > other Python extension modules against it. > > I'm not even sure if such a thing exists, in which case NumPy would > indeed be a special case. numpy *is* pretty unique in this regard. The need for this style of polymorphism at this level is even rarer. -- Robert Kern From travis at continuum.io Fri May 18 17:47:12 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 18 May 2012 16:47:12 -0500 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 Message-ID: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Hey all, After reading all the discussion around masked arrays and getting input from as many people as possible, it is clear that there is still disagreement about what to do, but there have been some fruitful discussions that ensued. This isn't really new as there was significant disagreement about what to do when the masked array code was initially checked in to master. So, in order to move forward, Mark and I are going to work together with whomever else is willing to help with an effort that is in the spirit of my third proposal but has a few adjustments. The idea will be fleshed out in more detail as it progresses, but the basic concept is to create an (experimental) ndmasked object in NumPy 1.7 and leave the actual ndarray object unchanged. While the details need to be worked out here, a goal is to have the C-API work with both ndmasked arrays and arrayobjects (possibly by defining a base-class C-level structure that both ndarrays inherit from). This might also be a good way for Dag to experiment with his ideas as well but that is not an explicit goal. One way this could work, for example is to have PyArrayObject * be the base-class array (essentially the same C-structure we have now with a HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * as well but add more members to the C-structure. I think this is the easiest thing to do and requires the least amount of code-change. It is also possible to define an abstract base-class PyArrayObject * that both ndarray and ndmasked inherit from. That way ndarray and ndmasked are siblings even though the ndarray would essentially *be* the PyArrayObject * --- just with a different type-hierarchy on the python side. This work will take some time and, therefore, I don't expect 1.7 to be released prior to SciPy Austin with an end of June target date. The timing will largely depend on what time is available from people interested in resolving the situation. Mark and I will have some availability for this work in June but not a great deal (about 2 man-weeks total between us). If there are others who can step in and help, it will help accelerate the process. Best regards, -Travis From chaoyuejoy at gmail.com Fri May 18 17:49:56 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 18 May 2012 23:49:56 +0200 Subject: [Numpy-discussion] python import question Message-ID: Dear all, This is only a small python import question. I think I'm right but just want some confirmation. Previously I have installed numpy 1.5.1. 
and then I used pip install --upgrade numpy to install numpy 1.6.1. But when I try to import numpy as np within an ipython shell, I still get version 1.5.1. I then checked my sys.path: In [21]: sys.path Out[21]: ['', '/usr/local/bin', '/usr/local/lib/python2.7/dist-packages/pupynere-1.0.15-py2.7.egg', '/usr/lib/pymodules/python2.7', '/usr/local/lib/python2.7/dist-packages/scikits.statsmodels-0.3.1-py2.7.egg', '/usr/local/lib/python2.7/dist-packages/Shapely-1.2.13-py2.7-linux-i686.egg', '/usr/local/lib/python2.7/dist-packages/pandas-0.7.3-py2.7-linux-i686.egg', '/home/chaoyue/python/python_lib', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PIL', '/usr/lib/pymodules/python2.7/gtk-2.0', '/usr/lib/python2.7/dist-packages/gst-0.10', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.7/ubuntuone-client', '/usr/lib/pymodules/python2.7/ubuntuone-control-panel', '/usr/lib/pymodules/python2.7/ubuntuone-storage-protocol', '/usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode', '/usr/local/lib/python2.7/dist-packages/IPython/extensions'] Actually I found I have numpy 1.5.1 in /usr/lib/pymodules/python2.7 and numpy 1.6.1 in /usr/local/lib/python2.7/dist-packages/numpy/ but because the first path comes before the second one in sys.path, ipython imports only the first one and ignores the second. Then I deleted the directory /usr/lib/pymodules/python2.7/numpy and redid the import, and I got version 1.6.1. This means that import will find the first occurrence of the module and ignore any later ones with the same name? cheers, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From doutriaux1 at llnl.gov Fri May 18 17:54:20 2012 From: doutriaux1 at llnl.gov (Doutriaux, Charles) Date: Fri, 18 May 2012 14:54:20 -0700 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: Travis, We have a significant user base for masked arrays, with a lot of "real-life" experience, use-cases and data. We would really like to get involved on this, please keep us in the loop. C. On 5/18/12 2:47 PM, "Travis Oliphant" wrote: >Hey all, > >After reading all the discussion around masked arrays and getting input >from as many people as possible, it is clear that there is still >disagreement about what to do, but there have been some fruitful >discussions that ensued. > >This isn't really new as there was significant disagreement about what to >do when the masked array code was initially checked in to master. So, >in order to move forward, Mark and I are going to work together with >whomever else is willing to help with an effort that is in the spirit of >my third proposal but has a few adjustments. > >The idea will be fleshed out in more detail as it progresses, but the >basic concept is to create an (experimental) ndmasked object in NumPy 1.7 >and leave the actual ndarray object unchanged. 
While the details need >to be worked out here, a goal is to have the C-API work with both >ndmasked arrays and arrayobjects (possibly by defining a base-class >C-level structure that both ndarrays inherit from). This might also >be a good way for Dag to experiment with his ideas as well but that is >not an explicit goal. > >One way this could work, for example is to have PyArrayObject * be the >base-class array (essentially the same C-structure we have now with a >HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject >* as well but add more members to the C-structure. I think this is >the easiest thing to do and requires the least amount of code-change. > It is also possible to define an abstract base-class PyArrayObject * >that both ndarray and ndmasked inherit from. That way ndarray and >ndmasked are siblings even though the ndarray would essentially *be* the >PyArrayObject * --- just with a different type-hierarchy on the python >side. > >This work will take some time and, therefore, I don't expect 1.7 to be >released prior to SciPy Austin with an end of June target date. The >timing will largely depend on what time is available from people >interested in resolving the situation. Mark and I will have some >availability for this work in June but not a great deal (about 2 >man-weeks total between us). If there are others who can step in and >help, it will help accelerate the process. > >Best regards, > >-Travis > > > >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Fri May 18 18:25:23 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 18 May 2012 17:25:23 -0500 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: Message-ID: <136BDDDC-30ED-4127-A3D1-08DA2D9A5B4E@continuum.io> The best way to keep in the loop is to comment on this list and pay attention to threads that discuss it. Thank you for speaking up, as I was aware of your significant use of the current masked array in NumPy, but it is good when you can articulate your use-cases and APIs that are helpful or annoying to you. -Travis On May 18, 2012, at 4:54 PM, Doutriaux, Charles wrote: > Travis, > > We have a significant user base for masked arrays, with a lot of > "real-life" experience, use-cases and data. > > We would really like to get involved on this, please keep us in the loop. > > C. > > > On 5/18/12 2:47 PM, "Travis Oliphant" wrote: > >> Hey all, >> >> After reading all the discussion around masked arrays and getting input >> from as many people as possible, it is clear that there is still >> disagreement about what to do, but there have been some fruitful >> discussions that ensued. >> >> This isn't really new as there was significant disagreement about what to >> do when the masked array code was initially checked in to master. So, >> in order to move forward, Mark and I are going to work together with >> whomever else is willing to help with an effort that is in the spirit of >> my third proposal but has a few adjustments. >> >> The idea will be fleshed out in more detail as it progresses, but the >> basic concept is to create an (experimental) ndmasked object in NumPy 1.7 >> and leave the actual ndarray object unchanged. 
While the details need >> to be worked out here, a goal is to have the C-API work with both >> ndmasked arrays and arrayobjects (possibly by defining a base-class >> C-level structure that both ndarrays inherit from). This might also >> be a good way for Dag to experiment with his ideas as well but that is >> not an explicit goal. >> >> One way this could work, for example is to have PyArrayObject * be the >> base-class array (essentially the same C-structure we have now with a >> HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject >> * as well but add more members to the C-structure. I think this is >> the easiest thing to do and requires the least amount of code-change. >> It is also possible to define an abstract base-class PyArrayObject * >> that both ndarray and ndmasked inherit from. That way ndarray and >> ndmasked are siblings even though the ndarray would essentially *be* the >> PyArrayObject * --- just with a different type-hierarchy on the python >> side. >> >> This work will take some time and, therefore, I don't expect 1.7 to be >> released prior to SciPy Austin with an end of June target date. The >> timing will largely depend on what time is available from people >> interested in resolving the situation. Mark and I will have some >> availability for this work in June but not a great deal (about 2 >> man-weeks total between us). If there are others who can step in and >> help, it will help accelerate the process. >> >> Best regards, >> >> -Travis >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ben.root at ou.edu Fri May 18 21:10:24 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 18 May 2012 21:10:24 -0400 Subject: [Numpy-discussion] python import question In-Reply-To: References: Message-ID: On Friday, May 18, 2012, Chao YUE wrote: > Dear all, > > This is only a small python import question. I think I'm right but just > want some confirmation. > > Previously I have installed numpy 1.5.1. 
and then I used pip install > --upgrade numpy > to install numpy 1.6.1 > > But when I try to import numpy as np within ipython shell, I still get the > version 1.5.1 > > then I checked my sys.path: > > In [21]: sys.path > Out[21]: > ['', > '/usr/local/bin', > '/usr/local/lib/python2.7/dist-packages/pupynere-1.0.15-py2.7.egg', > '/usr/lib/pymodules/python2.7', > > '/usr/local/lib/python2.7/dist-packages/scikits.statsmodels-0.3.1-py2.7.egg', > > '/usr/local/lib/python2.7/dist-packages/Shapely-1.2.13-py2.7-linux-i686.egg', > > '/usr/local/lib/python2.7/dist-packages/pandas-0.7.3-py2.7-linux-i686.egg', > '/home/chaoyue/python/python_lib', > '/usr/lib/python2.7', > '/usr/lib/python2.7/plat-linux2', > '/usr/lib/python2.7/lib-tk', > '/usr/lib/python2.7/lib-old', > '/usr/lib/python2.7/lib-dynload', > '/usr/local/lib/python2.7/dist-packages', > '/usr/lib/python2.7/dist-packages', > '/usr/lib/python2.7/dist-packages/PIL', > '/usr/lib/pymodules/python2.7/gtk-2.0', > '/usr/lib/python2.7/dist-packages/gst-0.10', > '/usr/lib/python2.7/dist-packages/gtk-2.0', > '/usr/lib/pymodules/python2.7/ubuntuone-client', > '/usr/lib/pymodules/python2.7/ubuntuone-control-panel', > '/usr/lib/pymodules/python2.7/ubuntuone-storage-protocol', > '/usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode', > '/usr/local/lib/python2.7/dist-packages/IPython/extensions'] > > Actually I found I have numpy 1.5.1 in /usr/lib/pymodules/python2.7 > > and numpy 1.6.1 in /usr/local/lib/python2.7/dist-packages/numpy/ > > but because the first path is before the second one in sys.path, so > ipython imports only the first one and ignore the second one. > Then I delete the directory of /usr/lib/pymodules/python2.7/numpy and redo > the import, I get the version 1.6.1 > > This means that import will try to find the first occurrence of the module > and will ignore the ones with same name in later occurrences? > > cheers, > > Chao > > Yes. This is actually very common. The $PATH environment variable works the same way for finding executables. Ben Root > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim at cerazone.net Sat May 19 01:16:55 2012 From: tim at cerazone.net (Tim Cera) Date: Sat, 19 May 2012 01:16:55 -0400 Subject: [Numpy-discussion] python import question In-Reply-To: References: Message-ID: On Fri, May 18, 2012 at 5:49 PM, Chao YUE wrote: > Previously I have installed numpy 1.5.1. and then I used pip install > --upgrade numpy > to install numpy 1.6.1 > Why was the old 1.5.1 installation in /usr/lib/pymodules/python2.7? I have in the past used 'pip uninstall package' a couple of times in a row in order to remove old versions of packages, then run 'pip install package'. Old packages and pip seem to be a problem if you have used 'easy_install' to install the older packages. Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Sat May 19 04:52:20 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sat, 19 May 2012 10:52:20 +0200 Subject: [Numpy-discussion] python import question In-Reply-To: References: Message-ID: I forgot whether I installed numpy 1.5.1 by esay_install or manually. But anyway, I had the same issue with you that I cannot use pip uninstall numpy to remove 1.5.1. chao 2012/5/19 Tim Cera > > > On Fri, May 18, 2012 at 5:49 PM, Chao YUE wrote: > >> Previously I have installed numpy 1.5.1. 
and then I used pip install >> --upgrade numpy >> to install numpy 1.6.1 >> > > Why was the old 1.5.1 installation in /usr/lib/pymodules/python2.7? > > I have in the past used 'pip uninstall package' a couple of times in a row > in order to remove old versions of packages, then run 'pip install > package'. Old packages and pip seem to be a problem if you have used > 'easy_install' to install the older packages. > > Kindest regards, > Tim > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Sat May 19 08:21:05 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 19 May 2012 14:21:05 +0200 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: Travis Oliphant wrote: >Hey all, > >After reading all the discussion around masked arrays and getting input >from as many people as possible, it is clear that there is still >disagreement about what to do, but there have been some fruitful >discussions that ensued. > >This isn't really new as there was significant disagreement about what >to do when the masked array code was initially checked in to master. >So, in order to move forward, Mark and I are going to work together >with whomever else is willing to help with an effort that is in the >spirit of my third proposal but has a few adjustments. > >The idea will be fleshed out in more detail as it progresses, but the >basic concept is to create an (experimental) ndmasked object in NumPy >1.7 and leave the actual ndarray object unchanged. While the details >need to be worked out here, a goal is to have the C-API work with both >ndmasked arrays and arrayobjects (possibly by defining a base-class >C-level structure that both ndarrays inherit from). This might also >be a good way for Dag to experiment with his ideas as well but that is >not an explicit goal. > Yes, I'm sufficiently geared up about that that I'd like to try to do something; perhaps try to put the support in the C API for masks in a vtable (so that the functions can take happily take PyObject* rather than PyArrayObject*, and thus be used for other ways of associating masks with array data down the line, so that I could for instance make a class outside of numpy with a 'shared mask'). My available time is an extremely stochastic quantity at the moment though; so I really can't say how relevant that is to 1.7 yet. You'll see. (My discussion was really for the long term, as in, I'd like to do something over the coming year...) Dag >One way this could work, for example is to have PyArrayObject * be the >base-class array (essentially the same C-structure we have now with a >HASMASK flag). Then, the ndmasked object could inherit from >PyArrayObject * as well but add more members to the C-structure. I >think this is the easiest thing to do and requires the least amount of >code-change. 
It is also possible to define an abstract base-class >PyArrayObject * that both ndarray and ndmasked inherit from. That >way ndarray and ndmasked are siblings even though the ndarray would >essentially *be* the PyArrayObject * --- just with a different >type-hierarchy on the python side. > >This work will take some time and, therefore, I don't expect 1.7 to be >released prior to SciPy Austin with an end of June target date. The >timing will largely depend on what time is available from people >interested in resolving the situation. Mark and I will have some >availability for this work in June but not a great deal (about 2 >man-weeks total between us). If there are others who can step in and >help, it will help accelerate the process. > >Best regards, > >-Travis > > > >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From charlesr.harris at gmail.com Sat May 19 10:17:04 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 19 May 2012 08:17:04 -0600 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant wrote: > Hey all, > > After reading all the discussion around masked arrays and getting input > from as many people as possible, it is clear that there is still > disagreement about what to do, but there have been some fruitful > discussions that ensued. > > This isn't really new as there was significant disagreement about what to > do when the masked array code was initially checked in to master. So, in > order to move forward, Mark and I are going to work together with whomever > else is willing to help with an effort that is in the spirit of my third > proposal but has a few adjustments. > > The idea will be fleshed out in more detail as it progresses, but the > basic concept is to create an (experimental) ndmasked object in NumPy 1.7 > and leave the actual ndarray object unchanged. While the details need to > be worked out here, a goal is to have the C-API work with both ndmasked > arrays and arrayobjects (possibly by defining a base-class C-level > structure that both ndarrays inherit from). This might also be a good > way for Dag to experiment with his ideas as well but that is not an > explicit goal. > > One way this could work, for example is to have PyArrayObject * be the > base-class array (essentially the same C-structure we have now with a > HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * > as well but add more members to the C-structure. I think this is the > easiest thing to do and requires the least amount of code-change. It > is also possible to define an abstract base-class PyArrayObject * that both > ndarray and ndmasked inherit from. That way ndarray and ndmasked are > siblings even though the ndarray would essentially *be* the PyArrayObject * > --- just with a different type-hierarchy on the python side. > > This work will take some time and, therefore, I don't expect 1.7 to be > released prior to SciPy Austin with an end of June target date. The > timing will largely depend on what time is available from people interested > in resolving the situation. 
Mark and I will have some availability for > this work in June but not a great deal (about 2 man-weeks total between > us). If there are others who can step in and help, it will help > accelerate the process. > > This will be a difficult thing for others to help with since the concept is vague, the design decisions seem to be in your and Mark's hands, and you say you don't have much time. It looks to me like 1.7 will keep slipping and I don't think that is a good thing. Why not go for option 2, which will get 1.7 out there and push the new masked array work in to 1.8? Breaking the flow of development and release has consequences, few of them good. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat May 19 11:00:41 2012 From: cournape at gmail.com (David Cournapeau) Date: Sat, 19 May 2012 16:00:41 +0100 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: On Sat, May 19, 2012 at 3:17 PM, Charles R Harris wrote: > > > On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant wrote: > >> Hey all, >> >> After reading all the discussion around masked arrays and getting input >> from as many people as possible, it is clear that there is still >> disagreement about what to do, but there have been some fruitful >> discussions that ensued. >> >> This isn't really new as there was significant disagreement about what to >> do when the masked array code was initially checked in to master. So, in >> order to move forward, Mark and I are going to work together with whomever >> else is willing to help with an effort that is in the spirit of my third >> proposal but has a few adjustments. >> >> The idea will be fleshed out in more detail as it progresses, but the >> basic concept is to create an (experimental) ndmasked object in NumPy 1.7 >> and leave the actual ndarray object unchanged. While the details need to >> be worked out here, a goal is to have the C-API work with both ndmasked >> arrays and arrayobjects (possibly by defining a base-class C-level >> structure that both ndarrays inherit from). This might also be a good >> way for Dag to experiment with his ideas as well but that is not an >> explicit goal. >> >> One way this could work, for example is to have PyArrayObject * be the >> base-class array (essentially the same C-structure we have now with a >> HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * >> as well but add more members to the C-structure. I think this is the >> easiest thing to do and requires the least amount of code-change. It >> is also possible to define an abstract base-class PyArrayObject * that both >> ndarray and ndmasked inherit from. That way ndarray and ndmasked are >> siblings even though the ndarray would essentially *be* the PyArrayObject * >> --- just with a different type-hierarchy on the python side. >> >> This work will take some time and, therefore, I don't expect 1.7 to be >> released prior to SciPy Austin with an end of June target date. The >> timing will largely depend on what time is available from people interested >> in resolving the situation. Mark and I will have some availability for >> this work in June but not a great deal (about 2 man-weeks total between >> us). If there are others who can step in and help, it will help >> accelerate the process. 
>> >> > This will be a difficult thing for others to help with since the concept > is vague, the design decisions seem to be in your and Mark's hands, and you > say you don't have much time. It looks to me like 1.7 will keep slipping > and I don't think that is a good thing. Why not go for option 2, which will > get 1.7 out there and push the new masked array work in to 1.8? Breaking > the flow of development and release has consequences, few of them good. > Agreed. 1.6.0 was released one year ago already, let's focus on polishing what's in there *now*. I have not followed closely what the decision was for a LTS release, but if 1.7 is supposed to be it, that's another argument about changing anything there for 1.7. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat May 19 11:21:32 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 19 May 2012 10:21:32 -0500 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: On Sat, May 19, 2012 at 10:00 AM, David Cournapeau wrote: > On Sat, May 19, 2012 at 3:17 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant wrote: >> >>> Hey all, >>> >>> After reading all the discussion around masked arrays and getting input >>> from as many people as possible, it is clear that there is still >>> disagreement about what to do, but there have been some fruitful >>> discussions that ensued. >>> >>> This isn't really new as there was significant disagreement about what >>> to do when the masked array code was initially checked in to master. So, >>> in order to move forward, Mark and I are going to work together with >>> whomever else is willing to help with an effort that is in the spirit of my >>> third proposal but has a few adjustments. >>> >>> The idea will be fleshed out in more detail as it progresses, but the >>> basic concept is to create an (experimental) ndmasked object in NumPy 1.7 >>> and leave the actual ndarray object unchanged. While the details need to >>> be worked out here, a goal is to have the C-API work with both ndmasked >>> arrays and arrayobjects (possibly by defining a base-class C-level >>> structure that both ndarrays inherit from). This might also be a good >>> way for Dag to experiment with his ideas as well but that is not an >>> explicit goal. >>> >>> One way this could work, for example is to have PyArrayObject * be the >>> base-class array (essentially the same C-structure we have now with a >>> HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * >>> as well but add more members to the C-structure. I think this is the >>> easiest thing to do and requires the least amount of code-change. It >>> is also possible to define an abstract base-class PyArrayObject * that both >>> ndarray and ndmasked inherit from. That way ndarray and ndmasked are >>> siblings even though the ndarray would essentially *be* the PyArrayObject * >>> --- just with a different type-hierarchy on the python side. >>> >>> This work will take some time and, therefore, I don't expect 1.7 to be >>> released prior to SciPy Austin with an end of June target date. The >>> timing will largely depend on what time is available from people interested >>> in resolving the situation. Mark and I will have some availability for >>> this work in June but not a great deal (about 2 man-weeks total between >>> us). 
If there are others who can step in and help, it will help >>> accelerate the process. >>> >>> >> This will be a difficult thing for others to help with since the concept >> is vague, the design decisions seem to be in your and Mark's hands, and you >> say you don't have much time. It looks to me like 1.7 will keep slipping >> and I don't think that is a good thing. Why not go for option 2, which will >> get 1.7 out there and push the new masked array work in to 1.8? Breaking >> the flow of development and release has consequences, few of them good. >> > > Agreed. 1.6.0 was released one year ago already, let's focus on polishing > what's in there *now*. I have not followed closely what the decision was > for a LTS release, but if 1.7 is supposed to be it, that's another argument > about changing anything there for 1.7. > The motivation behind splitting the mask out into a separate ndmasked is primarily so that pre-existing code will not silently function on NA-masked arrays and produce incorrect results. This centres around using PyArray_DATA to get at the data after manually checking flags, instead of calling PyArray_FromAny. Maybe a reasonable solution is to tweak the behavior of PyArray_DATA? It could work as follows: - If an ndarray has no mask, PyArray_DATA returns the data pointer as it does currently. - If the ndarray has an NA-mask, PyArray_DATA sets an exception and returns NULL - Create a new accessor, PyArray_DATAPTR or PyArray_RAWDATA, which returns the array data under all circumstances. This way, code which currently uses the data pointer through PyArray_DATA will fail instead of silently working with the wrong interpretation of the data. What do people feel about this idea? -Mark > David > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat May 19 12:01:45 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 19 May 2012 17:01:45 +0100 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: On Sat, May 19, 2012 at 4:21 PM, Mark Wiebe wrote: > The motivation behind splitting the mask out into a separate ndmasked is > primarily so that pre-existing code will not silently function on NA-masked > arrays and produce incorrect results. This centres around using PyArray_DATA > to get at the data after manually checking flags, instead of calling > PyArray_FromAny. Maybe a reasonable solution is to tweak the behavior of > PyArray_DATA? It could work as follows: > > - If an ndarray has no mask, PyArray_DATA returns the data pointer as it > does currently. > - If the ndarray has an NA-mask, PyArray_DATA sets an exception and returns > NULL > - Create a new accessor, PyArray_DATAPTR or PyArray_RAWDATA, which returns > the array data under all circumstances. > > This way, code which currently uses the data pointer through PyArray_DATA > will fail instead of silently working with the wrong interpretation of the > data. What do people feel about this idea? By "fail" you mean, specifically, segfault, right? PyArray_DATA currently cannot fail or return NULL, so I doubt any existing code checks for exceptions before dereferencing the pointer it returns. 
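To make that concrete, here is a minimal C sketch of the split Mark proposes, together with the caller pattern at issue. PyArray_HASMASKNA is the predicate from the NA-mask branch, and the helper names are invented for illustration; none of this is an agreed API.

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Sketch only: a data accessor that refuses NA-masked arrays, along the
     * lines proposed above.  PyArray_HASMASKNA comes from the NA-mask branch
     * and checked_data() is an invented name, not an existing NumPy API. */
    static void *
    checked_data(PyArrayObject *arr)
    {
        if (PyArray_HASMASKNA(arr)) {
            PyErr_SetString(PyExc_ValueError,
                            "raw data access is not allowed on an NA-masked "
                            "array; use a mask-aware API instead");
            return NULL;
        }
        return PyArray_DATA(arr);    /* unmasked arrays keep current behaviour */
    }

    /* The caller pattern being pointed at: code written when PyArray_DATA
     * could not fail never checks for NULL, so a failing PyArray_DATA would
     * turn a wrong answer into a segfault rather than a Python exception. */
    static void
    fill_first(PyArrayObject *arr)
    {
        double *data = (double *)PyArray_DATA(arr);  /* no NULL check anywhere */
        data[0] = 1.0;
    }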
- N From charlesr.harris at gmail.com Sat May 19 12:02:23 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 19 May 2012 10:02:23 -0600 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: On Sat, May 19, 2012 at 9:21 AM, Mark Wiebe wrote: > On Sat, May 19, 2012 at 10:00 AM, David Cournapeau wrote: > >> On Sat, May 19, 2012 at 3:17 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant wrote: >>> >>>> Hey all, >>>> >>>> After reading all the discussion around masked arrays and getting input >>>> from as many people as possible, it is clear that there is still >>>> disagreement about what to do, but there have been some fruitful >>>> discussions that ensued. >>>> >>>> This isn't really new as there was significant disagreement about what >>>> to do when the masked array code was initially checked in to master. So, >>>> in order to move forward, Mark and I are going to work together with >>>> whomever else is willing to help with an effort that is in the spirit of my >>>> third proposal but has a few adjustments. >>>> >>>> The idea will be fleshed out in more detail as it progresses, but the >>>> basic concept is to create an (experimental) ndmasked object in NumPy 1.7 >>>> and leave the actual ndarray object unchanged. While the details need to >>>> be worked out here, a goal is to have the C-API work with both ndmasked >>>> arrays and arrayobjects (possibly by defining a base-class C-level >>>> structure that both ndarrays inherit from). This might also be a good >>>> way for Dag to experiment with his ideas as well but that is not an >>>> explicit goal. >>>> >>>> One way this could work, for example is to have PyArrayObject * be the >>>> base-class array (essentially the same C-structure we have now with a >>>> HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * >>>> as well but add more members to the C-structure. I think this is the >>>> easiest thing to do and requires the least amount of code-change. It >>>> is also possible to define an abstract base-class PyArrayObject * that both >>>> ndarray and ndmasked inherit from. That way ndarray and ndmasked are >>>> siblings even though the ndarray would essentially *be* the PyArrayObject * >>>> --- just with a different type-hierarchy on the python side. >>>> >>>> This work will take some time and, therefore, I don't expect 1.7 to be >>>> released prior to SciPy Austin with an end of June target date. The >>>> timing will largely depend on what time is available from people interested >>>> in resolving the situation. Mark and I will have some availability for >>>> this work in June but not a great deal (about 2 man-weeks total between >>>> us). If there are others who can step in and help, it will help >>>> accelerate the process. >>>> >>>> >>> This will be a difficult thing for others to help with since the concept >>> is vague, the design decisions seem to be in your and Mark's hands, and you >>> say you don't have much time. It looks to me like 1.7 will keep slipping >>> and I don't think that is a good thing. Why not go for option 2, which will >>> get 1.7 out there and push the new masked array work in to 1.8? Breaking >>> the flow of development and release has consequences, few of them good. >>> >> >> Agreed. 1.6.0 was released one year ago already, let's focus on polishing >> what's in there *now*. 
I have not followed closely what the decision was >> for a LTS release, but if 1.7 is supposed to be it, that's another argument >> about changing anything there for 1.7. >> > > The motivation behind splitting the mask out into a separate ndmasked is > primarily so that pre-existing code will not silently function on NA-masked > arrays and produce incorrect results. This centres around using > PyArray_DATA to get at the data after manually checking flags, instead of > calling PyArray_FromAny. Maybe a reasonable solution is to tweak the > behavior of PyArray_DATA? It could work as follows: > > - If an ndarray has no mask, PyArray_DATA returns the data pointer as it > does currently. > - If the ndarray has an NA-mask, PyArray_DATA sets an exception and > returns NULL > - Create a new accessor, PyArray_DATAPTR or PyArray_RAWDATA, which returns > the array data under all circumstances. > > This way, code which currently uses the data pointer through PyArray_DATA > will fail instead of silently working with the wrong interpretation of the > data. What do people feel about this idea? > > Code working with the wrong interpretation of the data doesn't bother me much at this point in development. Long term it matters, but in the short term we can't expect code not explicitly written to work with masked arrays to do the right thing. I think we are looking at a period of several years before things settle out and get accepted. First, the implementation and its interface needs to get close to final form, and then the long slow process of adoption into things like matplotlib needs to take place. I'd quess three to five years for that process. That said, my main concern is to move forward and not spend the next year waiting. I see splitting the masked code out as rather like the python types having pointers to sequence/numerical/etc methods, i.e., ndarray then looks something like an abstract class. I don't have a problem with that and it does avoid base object bloat. As to having PyArray_DATA fail for masked arrays and provide new functions for unrestricted access, I'd be tempted to have PyArray_DATA continue to behave as it does and let the new functions return the error for masked arrays. Making third party applications fail for masked arrays is going make masked arrays very unpopular. Most likely no one would use them and third party applications would feel no pressure to support them. Another possibility might be to have a compile flag that determines whether of not PyArray_Data returns an error for masked arrays, something like we do now for deprecating old macros. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
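For reference, the compile-flag variant mentioned at the end of the message above could look roughly like the following, modeled on the NPY_NO_DEPRECATED_API opt-out used for deprecated macros. The NPY_STRICT_MASKED_DATA flag and the strict accessor are hypothetical names, not anything NumPy defines.

    /* How an extension opts out of deprecated NumPy APIs today: */
    #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
    #include <numpy/arrayobject.h>

    /* A hypothetical analogue for masked arrays: extensions that define the
     * flag get a strict accessor that may fail on NA-masked arrays, while
     * everyone else keeps the current never-fails PyArray_DATA.  Both names
     * below are invented for illustration. */
    #ifdef NPY_STRICT_MASKED_DATA
    #define EXT_GETDATA(arr) PyArray_DATA_strict(arr)  /* may set an exception */
    #else
    #define EXT_GETDATA(arr) PyArray_DATA(arr)         /* unchanged behaviour  */
    #endif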
URL: From charlesr.harris at gmail.com Sat May 19 12:45:03 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 19 May 2012 10:45:03 -0600 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: On Sat, May 19, 2012 at 10:02 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Sat, May 19, 2012 at 9:21 AM, Mark Wiebe wrote: > >> On Sat, May 19, 2012 at 10:00 AM, David Cournapeau wrote: >> >>> On Sat, May 19, 2012 at 3:17 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant wrote: >>>> >>>>> Hey all, >>>>> >>>>> After reading all the discussion around masked arrays and getting >>>>> input from as many people as possible, it is clear that there is still >>>>> disagreement about what to do, but there have been some fruitful >>>>> discussions that ensued. >>>>> >>>>> This isn't really new as there was significant disagreement about what >>>>> to do when the masked array code was initially checked in to master. So, >>>>> in order to move forward, Mark and I are going to work together with >>>>> whomever else is willing to help with an effort that is in the spirit of my >>>>> third proposal but has a few adjustments. >>>>> >>>>> The idea will be fleshed out in more detail as it progresses, but the >>>>> basic concept is to create an (experimental) ndmasked object in NumPy 1.7 >>>>> and leave the actual ndarray object unchanged. While the details need to >>>>> be worked out here, a goal is to have the C-API work with both ndmasked >>>>> arrays and arrayobjects (possibly by defining a base-class C-level >>>>> structure that both ndarrays inherit from). This might also be a good >>>>> way for Dag to experiment with his ideas as well but that is not an >>>>> explicit goal. >>>>> >>>>> One way this could work, for example is to have PyArrayObject * be the >>>>> base-class array (essentially the same C-structure we have now with a >>>>> HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * >>>>> as well but add more members to the C-structure. I think this is the >>>>> easiest thing to do and requires the least amount of code-change. It >>>>> is also possible to define an abstract base-class PyArrayObject * that both >>>>> ndarray and ndmasked inherit from. That way ndarray and ndmasked are >>>>> siblings even though the ndarray would essentially *be* the PyArrayObject * >>>>> --- just with a different type-hierarchy on the python side. >>>>> >>>>> This work will take some time and, therefore, I don't expect 1.7 to be >>>>> released prior to SciPy Austin with an end of June target date. The >>>>> timing will largely depend on what time is available from people interested >>>>> in resolving the situation. Mark and I will have some availability for >>>>> this work in June but not a great deal (about 2 man-weeks total between >>>>> us). If there are others who can step in and help, it will help >>>>> accelerate the process. >>>>> >>>>> >>>> This will be a difficult thing for others to help with since the >>>> concept is vague, the design decisions seem to be in your and Mark's hands, >>>> and you say you don't have much time. It looks to me like 1.7 will keep >>>> slipping and I don't think that is a good thing. Why not go for option 2, >>>> which will get 1.7 out there and push the new masked array work in to 1.8? 
>>>> Breaking the flow of development and release has consequences, few of them >>>> good. >>>> >>> >>> Agreed. 1.6.0 was released one year ago already, let's focus on >>> polishing what's in there *now*. I have not followed closely what the >>> decision was for a LTS release, but if 1.7 is supposed to be it, that's >>> another argument about changing anything there for 1.7. >>> >> >> The motivation behind splitting the mask out into a separate ndmasked is >> primarily so that pre-existing code will not silently function on NA-masked >> arrays and produce incorrect results. This centres around using >> PyArray_DATA to get at the data after manually checking flags, instead of >> calling PyArray_FromAny. Maybe a reasonable solution is to tweak the >> behavior of PyArray_DATA? It could work as follows: >> >> - If an ndarray has no mask, PyArray_DATA returns the data pointer as it >> does currently. >> - If the ndarray has an NA-mask, PyArray_DATA sets an exception and >> returns NULL >> - Create a new accessor, PyArray_DATAPTR or PyArray_RAWDATA, which >> returns the array data under all circumstances. >> >> This way, code which currently uses the data pointer through PyArray_DATA >> will fail instead of silently working with the wrong interpretation of the >> data. What do people feel about this idea? >> >> > Code working with the wrong interpretation of the data doesn't bother me > much at this point in development. Long term it matters, but in the short > term we can't expect code not explicitly written to work with masked arrays > to do the right thing. I think we are looking at a period of several years > before things settle out and get accepted. First, the implementation and > its interface needs to get close to final form, and then the long slow > process of adoption into things like matplotlib needs to take place. I'd > quess three to five years for that process. > > That said, my main concern is to move forward and not spend the next year > waiting. I see splitting the masked code out as rather like the python > types having pointers to sequence/numerical/etc methods, i.e., ndarray then > looks something like an abstract class. I don't have a problem with that > and it does avoid base object bloat. As to having PyArray_DATA fail for > masked arrays and provide new functions for unrestricted access, I'd be > tempted to have PyArray_DATA continue to behave as it does and let the new > functions return the error for masked arrays. Making third party > applications fail for masked arrays is going make masked arrays very > unpopular. Most likely no one would use them and third party applications > would feel no pressure to support them. Another possibility might be to > have a compile flag that determines whether of not PyArray_Data returns an > error for masked arrays, something like we do now for deprecating old > macros. > > My own plan for the near term would be as follows: 1) Put in the experimental option and get the 1.7 release out. This gets us through the next couple of months and keeps things moving. 2) Look at what hooks/low level functions would let us reimplement np.ma. Because there are so many different mask uses out there, this would be a good way to discover what low level support is likely to provide a good basis for others to build on. 3) Revisit the idea of making all ndarrays masked by default, but do so with the experience and feedback from current mask users. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sat May 19 14:23:16 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 19 May 2012 19:23:16 +0100 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: On Sat, May 19, 2012 at 5:45 PM, Charles R Harris wrote: > > > On Sat, May 19, 2012 at 10:02 AM, Charles R Harris > wrote: >> >> >> >> On Sat, May 19, 2012 at 9:21 AM, Mark Wiebe wrote: >>> >>> On Sat, May 19, 2012 at 10:00 AM, David Cournapeau >>> wrote: >>>> >>>> On Sat, May 19, 2012 at 3:17 PM, Charles R Harris >>>> wrote: >>>>> >>>>> On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant >>>>> wrote: >>>>>> >>>>>> Hey all, >>>>>> >>>>>> After reading all the discussion around masked arrays and getting >>>>>> input from as many people as possible, it is clear that there is still >>>>>> disagreement about what to do, but there have been some fruitful discussions >>>>>> that ensued. >>>>>> >>>>>> This isn't really new as there was significant disagreement about what >>>>>> to do when the masked array code was initially checked in to master. ? So, >>>>>> in order to move forward, Mark and I are going to work together with >>>>>> whomever else is willing to help with an effort that is in the spirit of my >>>>>> third proposal but has a few adjustments. >>>>>> >>>>>> The idea will be fleshed out in more detail as it progresses, but the >>>>>> basic concept is to create an (experimental) ndmasked object in NumPy 1.7 >>>>>> and leave the actual ndarray object unchanged. ? While the details need to >>>>>> be worked out here, ?a goal is to have the C-API work with both ndmasked >>>>>> arrays and arrayobjects (possibly by defining a base-class C-level structure >>>>>> that both ndarrays inherit from). ? ? This might also be a good way for Dag >>>>>> to experiment with his ideas as well but that is not an explicit goal. >>>>>> >>>>>> One way this could work, for example is to have PyArrayObject * be the >>>>>> base-class array (essentially the same C-structure we have now with a >>>>>> HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * >>>>>> as well but add more members to the C-structure. ? ? I think this is the >>>>>> easiest thing to do and requires the least amount of code-change. ? ? ?It is >>>>>> also possible to define an abstract base-class PyArrayObject * that both >>>>>> ndarray and ndmasked inherit from. ? ? That way ndarray and ndmasked are >>>>>> siblings even though the ndarray would essentially *be* the PyArrayObject * >>>>>> --- just with a different type-hierarchy on the python side. >>>>>> >>>>>> This work will take some time and, therefore, I don't expect 1.7 to be >>>>>> released prior to SciPy Austin with an end of June target date. ? The timing >>>>>> will largely depend on what time is available from people interested in >>>>>> resolving the situation. ? Mark and I will have some availability for this >>>>>> work in June but not a great deal (about 2 man-weeks total between us). >>>>>> ?If there are others who can step in and help, it will help accelerate the >>>>>> process. >>>>>> >>>>> >>>>> This will be a difficult thing for others to help with since the >>>>> concept is vague, the design decisions seem to be in your and Mark's hands, >>>>> and you say you don't have much time. It looks to me like 1.7 will keep >>>>> slipping and I don't think that is a good thing. Why not go for option 2, >>>>> which will get 1.7 out there and push the new masked array work in to 1.8? 
>>>>> Breaking the flow of development and release has consequences, few of them >>>>> good. >>>> >>>> >>>> Agreed. 1.6.0 was released one year ago already, let's focus on >>>> polishing what's in there *now*. I have not followed closely what the >>>> decision was for a LTS release, but if 1.7 is supposed to be it, that's >>>> another argument about changing anything there for 1.7. >>> >>> >>> The motivation behind splitting the mask out into a separate ndmasked is >>> primarily so that pre-existing code will not silently function on NA-masked >>> arrays and produce incorrect results. This centres around using PyArray_DATA >>> to get at the data after manually checking flags, instead of calling >>> PyArray_FromAny. Maybe a reasonable solution is to tweak the behavior of >>> PyArray_DATA? It could work as follows: >>> >>> - If an ndarray has no mask, PyArray_DATA returns the data pointer as it >>> does currently. >>> - If the ndarray has an NA-mask, PyArray_DATA sets an exception and >>> returns NULL >>> - Create a new accessor, PyArray_DATAPTR or PyArray_RAWDATA, which >>> returns the array data under all circumstances. >>> >>> This way, code which currently uses the data pointer through PyArray_DATA >>> will fail instead of silently working with the wrong interpretation of the >>> data. What do people feel about this idea? >>> >> >> Code working with the wrong interpretation of the data doesn't bother me >> much at this point in development. Long term it matters, but in the short >> term we can't expect code not explicitly written to work with masked arrays >> to do the right thing. I think we are looking at a period of several years >> before things settle out and get accepted. First, the implementation and its >> interface needs to get close to final form, and then the long slow process >> of adoption into things like matplotlib needs to take place. I'd quess three >> to five years for that process. >> >> That said, my main concern is to move forward and not spend the next year >> waiting. I see splitting the masked code out as rather like the python types >> having pointers to sequence/numerical/etc methods, i.e., ndarray then looks >> something like an abstract class. I don't have a problem with that and it >> does avoid base object bloat. As to having PyArray_DATA fail for masked >> arrays and provide new functions for unrestricted access, I'd be tempted to >> have PyArray_DATA continue to behave as it does and let the new functions >> return the error for masked arrays. Making third party applications fail for >> masked arrays is going make masked arrays very unpopular. Most likely no one >> would use them and third party applications would feel no pressure to >> support them. Another possibility might be to have a compile flag that >> determines whether of not PyArray_Data returns an error for masked arrays, >> something like we do now for deprecating old macros. >> > > My own plan for the near term would be as follows: > > 1) Put in the experimental option and get the 1.7 release out. This gets us > through the next couple of months and keeps things moving. +1 on not blocking the release while we invent+implement yet another experimental API. > 2) Look at what hooks/low level functions would let us reimplement np.ma. > Because there are so many different mask uses out there, this would be a > good way to discover what low level support is likely to provide a good > basis for others to build on. 
> > 3) Revisit the idea of making all ndarrays masked by default, but do so with > the experience and feedback from current mask users. I like this plan. -- Nathaniel From njs at pobox.com Sat May 19 16:54:38 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 19 May 2012 21:54:38 +0100 Subject: [Numpy-discussion] Separating out the maskna code Message-ID: Hi all, Since Mark's original missingdata branch made so many changes, I figured it would be a useful exercise to figure out what code in master is actually related to masked arrays, and which isn't. The easiest way seemed to be to delete the new fields, then keep removing any code that depended on them until everything built again, which gives this: https://github.com/njsmith/numpy/commits/separate-maskna Possible uses: - Use the diff as a checklist for going through to change the API - Use the diff as a checklist for adding an experimental-API-only flag - Merge into master, then use as a reference to cherrypick the pieces that we want to save (there is some questionable stuff in here -- e.g. copy/paste code, hand-wavy half-implementation of "multi-NA" support, and PyArray_ReduceWrapper, see below) - Merge into master and then immediately 'git revert' the changes on a branch, which would effectively 'move it aside' so we can release 1.7 while Mark and Travis to continue hacking on it at their own pace This is a huge patch, but I was pretty careful not to cause any accidental non-maskna-related regressions. The whole PyArray_Diagonal thread actually happened because I noticed that test_maskna had the only tests for np.diagonal, so I wanted to write proper ones that would be independent of the maskna code. Also I ran the following tests: - numpy.test("full") - scipy v0.10.1, test("full") - matplotlib current master - pandas v0.7.3 and everything looks good. The other complicated thing to handle was the new PyArray_ReduceWrapper function that was added to the public multiarray API. Conceptually, this function has only a glancing relationship to masked arrays per se. But, it has its own problems, and I don't think it should be exposed in 1.7. Partly because its signature necessarily changes depending on whether maskna support exists. Partly because it's just kind of ugly (it has 15 arguments, after I removed some[1]). But mostly because it gives us two independent generic implementations of functions that do array->scalar operations, which seems like something we absolutely don't want to commit to supporting indefinitely. And the "generalized ufunc" alternative seems a lot more promising. So, that branch also has a followup patch that does the necessary hacking to get it out of the public API. Unfortunately, this patch is dependent on the previous one -- I'm not sure how to untangle PyArray_ReduceWrapper while keeping the maskna support in, which makes the "global experimental flag" idea for 1.7 hard to implement (assuming that others agree about PyArray_ReduceWrapper being unready for public exposure). At this point, it might easiest to just merge this branch to master, immediately revert it on a branch for Mark and Travis to work on, and then release 1.7. Ralf, IIUC merging this and my other outstanding PRs would leave the datetime issues on python3/win32 as the only outstanding blocker? 
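Going back to the removal strategy described above: the commit message quoted further down gives "if (PyArray_HASMASK(a)) { ... }" as the canonical case. Spelled out as a sketch (only the predicate comes from the maskna code; the function and its helper are made up):

    /* Typical shape of a block this branch deletes.  PyArray_HASMASK is the
     * maskna predicate in master; count_valid and count_unmasked_items are
     * invented for illustration. */
    static npy_intp
    count_valid(PyArrayObject *arr)
    {
        if (PyArray_HASMASK(arr)) {
            /* mask-aware path, reachable only if an array carries a mask */
            return count_unmasked_items(arr);
        }
        return PyArray_SIZE(arr);
    }

    /* With the mask fields removed from the array struct, no array can
     * satisfy PyArray_HASMASK, so the whole branch is dead code and gets
     * deleted, leaving just the plain PyArray_SIZE() path. */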
- N [1] http://www-pu.informatik.uni-tuebingen.de/users/klaeren/epigrams.html , number 11 ------------ Commit messages follow for reference/discussion The main change is commit 4c16813c23b20: Remove maskna API from ndarray, and all (and only) the code supporting it The original masked-NA-NEP branch contained a large number of changes in addition to the core NA support. For example: - ufunc.__call__ support for where= argument - nditer support for arbitrary masks (in support of where=) - ufunc.reduce support for simultaneous reduction over multiple axes - a new "array assignment API" - ndarray.diagonal() returning a view in all cases - bug-fixes in __array_priority__ handling - datetime test changes etc. There's no consensus yet on what should be done with the maskna-related part of this branch, but the rest is generally useful and uncontroversial, so the goal of this branch is to identify exactly which code changes are involved in maskna support. The basic strategy used to create this patch was: - Remove the new masking-related fields from ndarray, so no arrays are masked - Go through and remove all the code that this makes dead/inaccessible/irrelevant, in a largely mechanical fashion. So for example, if I saw 'if (PyArray_HASMASK(a)) { ... }' then that whole block was obviously just dead code if no arrays have masks, and I removed it. Likewise for function arguments like skipna that are useless if there aren't any NAs to skip. This changed the signature of a number of functions that were newly exposed in the numpy public API. I've removed all such functions from the public API, since releasing them with the NA-less signature in 1.7 would create pointless compatibility hassles later if and when we add back the NA-related functionality. Most such functions are removed by this commit; the exception is PyArray_ReduceWrapper, which requires more extensive surgery, and will be handled in followup commits. I also removed the new ndarray.setasflat method. Reason: a comment noted that the only reason this was added was to allow easier testing of one branch of PyArray_CopyAsFlat. That branch is now the main branch, so that isn't an issue. Nonetheless this function is arguably useful, so perhaps it should have remained, but I judged that since numpy's API is already hairier than we would like, it's not a good idea to add extra hair "just in case". (Also AFAICT the test for this method in test_maskna was actually incorrect, as noted here: https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py so I'm not confident that it ever worked in master, though I haven't had a chance to follow-up on this.) I also removed numpy.count_reduce_items, since without skipna it became trivial. I believe that these are the only exceptions to the "remove dead code" strategy. The ReduceWrapper untangling is a001fb29c9: Remove PyArray_ReduceWrapper from public API There are two reasons to want to keep PyArray_ReduceWrapper out of the public multiarray API: - Its signature is likely to change if/when masked arrays are added - It is essentially a wrapper for array->scalar transformations (*not* just reductions as its name implies -- the whole reason it is in multiarray.so in the first place is to support count_nonzero, which is not actually a reduction!). It provides some nice conveniences (like making it easy to apply such functions to multiple axes simultaneously), but, we already have a general mechanism for writing array->scalar transformations -- generalized ufuncs. 
We do not want to have two independent, redundant implementations of this functionality, one in multiarray and one in umath! So in the long run we should add these nice features to the generalized ufunc machinery. And in the short run, we shouldn't add it to the public API and commit ourselves to supporting it. However, simply removing it from numpy_api.py is not easy, because this code was used in both multiarray and umath. This commit: - Moves ReduceWrapper and supporting code to umath/, and makes appropriate changes (e.g. renaming it to PyUFunc_ReduceWrapper and cleaning up the header files). - Reverts numpy.count_nonzero to its previous implementation, so that it loses the new axis= and keepdims= arguments. This is unfortunate, but this change isn't so urgent that it's worth tying our APIs in knots forever. (Perhaps in the future it can become a generalized ufunc.) From tim at cerazone.net Sat May 19 18:04:43 2012 From: tim at cerazone.net (Tim Cera) Date: Sat, 19 May 2012 18:04:43 -0400 Subject: [Numpy-discussion] Internationalization of numpy/scipy docstrings... In-Reply-To: References: Message-ID: I have thought for a long time that it would be nice to have numpy/scipy docs in multiple languages. I didn't have any idea how to do it until I saw http://sphinx.pocoo.org/intl.html. The gettext builder which is a requirement to make this happen is relatively new to sphinx. Outline of above applied to numpy/scipy... 1. pydocweb would use the new gettext builder to convert *.rst to *.pot files. 2. Translators would use pootle to edit the *.pot files to *.po files pydocweb or pootle would use mgsfmt to create *.mo files 3. From here can choose either: a. Have pydocweb use sphinx-build to create new, translated *.rst files from the *.mo files. (my favorite since we would have *.rst files) b. OR use gettext in Python to translate docstring on-the-fly from the *.mo files. A user would then install a language kit, maybe something like scikits and access the translated docstring with a new 'np.info'. As near as I can figure, Python 'help' command can't be replaced by something else, so 'help' would always display the English docstring. I have pydocweb and pootle setup locally and working. Ran into a problem though with sphinx-build creating the initial *.pot files. It seems to be a problem with numpydoc. It fails on 'function' and 'auto*' directives. I tried to look at numpydoc and it is a bit of very intense coding and I frankly have not been able to find my way around. I am willing to put in some work for this to happen. My block right now is getting the initial *.pot files. Any interest? You can see the problem directly by changing into the numpy/doc directory and use the following command: sphinx-build -b gettext -P source/ gettext/ Once sphinx-build is working, then the target build directory (which I called 'gettext' above) would be in a location accessible to pootle. Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat May 19 20:16:21 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 20 May 2012 01:16:21 +0100 Subject: [Numpy-discussion] Internationalization of numpy/scipy docstrings... In-Reply-To: References: Message-ID: On May 19, 2012 11:04 PM, "Tim Cera" wrote: > A user would then install a language kit, maybe something like scikits and access the translated docstring with a new 'np.info'. 
As near as I can figure, Python 'help' command can't be replaced by something else, so 'help' would always display the English docstring. help() just returns the __doc__ attribute, but a large number of numpy's __doc__ attributes are set up by code at import time, so in principle even these could be run through gettext pretty easily. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim at cerazone.net Sat May 19 22:35:00 2012 From: tim at cerazone.net (Tim Cera) Date: Sat, 19 May 2012 22:35:00 -0400 Subject: [Numpy-discussion] Internationalization of numpy/scipy docstrings... In-Reply-To: References: Message-ID: On Sat, May 19, 2012 at 8:16 PM, Nathaniel Smith wrote: > help() just returns the __doc__ attribute, but a large number of numpy's > __doc__ attributes are set up by code at import time, so in principle even > these could be run through gettext pretty easily. > I didn't know that. I suggested modifying np.info since I suspect that a new np.info would be easier since changes to support i18n would be contained to one command. Of course if there is something easier/better, let's go with that. Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sun May 20 00:37:07 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 19 May 2012 23:37:07 -0500 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: On May 19, 2012, at 9:17 AM, Charles R Harris wrote: > > > On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant wrote: > Hey all, > > After reading all the discussion around masked arrays and getting input from as many people as possible, it is clear that there is still disagreement about what to do, but there have been some fruitful discussions that ensued. > > This isn't really new as there was significant disagreement about what to do when the masked array code was initially checked in to master. So, in order to move forward, Mark and I are going to work together with whomever else is willing to help with an effort that is in the spirit of my third proposal but has a few adjustments. > > The idea will be fleshed out in more detail as it progresses, but the basic concept is to create an (experimental) ndmasked object in NumPy 1.7 and leave the actual ndarray object unchanged. While the details need to be worked out here, a goal is to have the C-API work with both ndmasked arrays and arrayobjects (possibly by defining a base-class C-level structure that both ndarrays inherit from). This might also be a good way for Dag to experiment with his ideas as well but that is not an explicit goal. > > One way this could work, for example is to have PyArrayObject * be the base-class array (essentially the same C-structure we have now with a HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * as well but add more members to the C-structure. I think this is the easiest thing to do and requires the least amount of code-change. It is also possible to define an abstract base-class PyArrayObject * that both ndarray and ndmasked inherit from. That way ndarray and ndmasked are siblings even though the ndarray would essentially *be* the PyArrayObject * --- just with a different type-hierarchy on the python side. > > This work will take some time and, therefore, I don't expect 1.7 to be released prior to SciPy Austin with an end of June target date. 
The timing will largely depend on what time is available from people interested in resolving the situation. Mark and I will have some availability for this work in June but not a great deal (about 2 man-weeks total between us). If there are others who can step in and help, it will help accelerate the process. > > > This will be a difficult thing for others to help with since the concept is vague, the design decisions seem to be in your and Mark's hands, and you say you don't have much time. It looks to me like 1.7 will keep slipping and I don't think that is a good thing. Why not go for option 2, which will get 1.7 out there and push the new masked array work in to 1.8? Breaking the flow of development and release has consequences, few of them good. > I don't see how option 2 gets 1.7 out the door any easier as it does not address the actual problems with the changes to the ndarray object that are currently in master and it introduces a new experimental flag concept in a hurried way just to get a release out. This helps the Python side of things, but does nothing to address the C-side of things. Fundamentally, I don't see how we can put 1.7 out with the masked array fields on the ndarray object itself. This is clear from everything I've seen. -Travis > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sun May 20 00:40:30 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 19 May 2012 23:40:30 -0500 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: <99B1D0E2-4A03-49DF-AB31-7912649EAEDA@continuum.io> On May 19, 2012, at 10:00 AM, David Cournapeau wrote: > > > On Sat, May 19, 2012 at 3:17 PM, Charles R Harris wrote: > > > On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant wrote: > Hey all, > > After reading all the discussion around masked arrays and getting input from as many people as possible, it is clear that there is still disagreement about what to do, but there have been some fruitful discussions that ensued. > > This isn't really new as there was significant disagreement about what to do when the masked array code was initially checked in to master. So, in order to move forward, Mark and I are going to work together with whomever else is willing to help with an effort that is in the spirit of my third proposal but has a few adjustments. > > The idea will be fleshed out in more detail as it progresses, but the basic concept is to create an (experimental) ndmasked object in NumPy 1.7 and leave the actual ndarray object unchanged. While the details need to be worked out here, a goal is to have the C-API work with both ndmasked arrays and arrayobjects (possibly by defining a base-class C-level structure that both ndarrays inherit from). This might also be a good way for Dag to experiment with his ideas as well but that is not an explicit goal. > > One way this could work, for example is to have PyArrayObject * be the base-class array (essentially the same C-structure we have now with a HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * as well but add more members to the C-structure. I think this is the easiest thing to do and requires the least amount of code-change. 
It is also possible to define an abstract base-class PyArrayObject * that both ndarray and ndmasked inherit from. That way ndarray and ndmasked are siblings even though the ndarray would essentially *be* the PyArrayObject * --- just with a different type-hierarchy on the python side. > > This work will take some time and, therefore, I don't expect 1.7 to be released prior to SciPy Austin with an end of June target date. The timing will largely depend on what time is available from people interested in resolving the situation. Mark and I will have some availability for this work in June but not a great deal (about 2 man-weeks total between us). If there are others who can step in and help, it will help accelerate the process. > > > This will be a difficult thing for others to help with since the concept is vague, the design decisions seem to be in your and Mark's hands, and you say you don't have much time. It looks to me like 1.7 will keep slipping and I don't think that is a good thing. Why not go for option 2, which will get 1.7 out there and push the new masked array work in to 1.8? Breaking the flow of development and release has consequences, few of them good. > > Agreed. 1.6.0 was released one year ago already, let's focus on polishing what's in there *now*. I have not followed closely what the decision was for a LTS release, but if 1.7 is supposed to be it, that's another argument about changing anything there for 1.7. This won't work, because what is in there *now* really shouldn't be there. The only way this would work is if we remove the masked array from the ndarray object. I think I saw that Nathaniel has a patch for this, but I have not had time to review this patch. There has been no agreement on an LTS release. The best candidate for that is 1.6.2, but I don't think it makes sense to talk about an LTS release until the masked array questions are resolved from what is there in master right now. -Travis > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sun May 20 00:48:20 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 19 May 2012 23:48:20 -0500 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: <9D1F36AF-E082-4085-B5DB-B4FCC0DC2A11@continuum.io> > > My own plan for the near term would be as follows: > > 1) Put in the experimental option and get the 1.7 release out. This gets us through the next couple of months and keeps things moving. > The "experimental" option does not solve the problem which is that the ndarray object now has masked fields which changes the fundamental nature of an ndarray for a lot of downstream users that really have no idea what has just happened. I don't see how this has been addressed by any proposal except for the one I have suggested which allows a masked array object and a regular ndarray to co-exist for a time. I doubt that the proposal actually helps get 1.7 out any faster either as there are multiple experimental APIs that would have to be created to pull it off on both the C and Python level. 
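As a rough picture of the kind of separation being asked for, the C-level layout Travis sketches earlier in the thread could look something like the following. The PyMaskedArrayObject name and the maskna_* fields are placeholders, not an agreed design.

    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Illustrative only: ndarray's PyArrayObject stays exactly as it is, and
     * a separate ndmasked type embeds it as its first member, so an ndmasked
     * object can be handled through the common PyArrayObject * base while
     * the mask-specific members live only on the subtype. */
    typedef struct {
        PyArrayObject base;                    /* plain ndarray layout, unchanged */
        PyArray_Descr *maskna_dtype;           /* dtype describing the mask       */
        char *maskna_data;                     /* pointer to the mask buffer      */
        npy_intp maskna_strides[NPY_MAXDIMS];  /* strides for the mask buffer     */
    } PyMaskedArrayObject;

    /* On the Python side, ndmasked would then be exposed either as a subclass
     * of ndarray or as a sibling under an abstract base, matching the two
     * type-hierarchy options in the proposal. */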
From travis at continuum.io Sun May 20 00:50:03 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 19 May 2012 23:50:03 -0500 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: <4B8BAC54-1B98-43FB-ABDD-48E059351643@continuum.io> >> My own plan for the near term would be as follows: >> >> 1) Put in the experimental option and get the 1.7 release out. This gets us >> through the next couple of months and keeps things moving. > > +1 on not blocking the release while we invent+implement yet another > experimental API. Nobody has suggested inventing or implementing yet another anything. All I want to do is separate out the ndmasked array as a separate object and leave the ndarray object alone. > >> 2) Look at what hooks/low level functions would let us reimplement np.ma. >> Because there are so many different mask uses out there, this would be a >> good way to discover what low level support is likely to provide a good >> basis for others to build on. >> >> 3) Revisit the idea of making all ndarrays masked by default, but do so with >> the experience and feedback from current mask users. > > I like this plan. > I think points 2 and 3 are fine as long as the 1.7 release does not have masked array notions attached to all ndarray objects. -Travis > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Sun May 20 01:02:25 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 20 May 2012 00:02:25 -0500 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> Message-ID: <022089D6-9B56-4CD5-BCAB-3A0F107D2177@continuum.io> On May 19, 2012, at 10:21 AM, Mark Wiebe wrote: > On Sat, May 19, 2012 at 10:00 AM, David Cournapeau wrote: > On Sat, May 19, 2012 at 3:17 PM, Charles R Harris wrote: > On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant wrote: > Hey all, > > After reading all the discussion around masked arrays and getting input from as many people as possible, it is clear that there is still disagreement about what to do, but there have been some fruitful discussions that ensued. > > This isn't really new as there was significant disagreement about what to do when the masked array code was initially checked in to master. So, in order to move forward, Mark and I are going to work together with whomever else is willing to help with an effort that is in the spirit of my third proposal but has a few adjustments. > > The idea will be fleshed out in more detail as it progresses, but the basic concept is to create an (experimental) ndmasked object in NumPy 1.7 and leave the actual ndarray object unchanged. While the details need to be worked out here, a goal is to have the C-API work with both ndmasked arrays and arrayobjects (possibly by defining a base-class C-level structure that both ndarrays inherit from). This might also be a good way for Dag to experiment with his ideas as well but that is not an explicit goal. > > One way this could work, for example is to have PyArrayObject * be the base-class array (essentially the same C-structure we have now with a HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * as well but add more members to the C-structure. 
I think this is the easiest thing to do and requires the least amount of code-change. It is also possible to define an abstract base-class PyArrayObject * that both ndarray and ndmasked inherit from. That way ndarray and ndmasked are siblings even though the ndarray would essentially *be* the PyArrayObject * --- just with a different type-hierarchy on the python side. > > This work will take some time and, therefore, I don't expect 1.7 to be released prior to SciPy Austin with an end of June target date. The timing will largely depend on what time is available from people interested in resolving the situation. Mark and I will have some availability for this work in June but not a great deal (about 2 man-weeks total between us). If there are others who can step in and help, it will help accelerate the process. > > > This will be a difficult thing for others to help with since the concept is vague, the design decisions seem to be in your and Mark's hands, and you say you don't have much time. It looks to me like 1.7 will keep slipping and I don't think that is a good thing. Why not go for option 2, which will get 1.7 out there and push the new masked array work in to 1.8? Breaking the flow of development and release has consequences, few of them good. > > Agreed. 1.6.0 was released one year ago already, let's focus on polishing what's in there *now*. I have not followed closely what the decision was for a LTS release, but if 1.7 is supposed to be it, that's another argument about changing anything there for 1.7. > > The motivation behind splitting the mask out into a separate ndmasked is primarily so that pre-existing code will not silently function on NA-masked arrays and produce incorrect results. This centres around using PyArray_DATA to get at the data after manually checking flags, instead of calling PyArray_FromAny. Maybe a reasonable solution is to tweak the behavior of PyArray_DATA? It could work as follows: > > - If an ndarray has no mask, PyArray_DATA returns the data pointer as it does currently. > - If the ndarray has an NA-mask, PyArray_DATA sets an exception and returns NULL > - Create a new accessor, PyArray_DATAPTR or PyArray_RAWDATA, which returns the array data under all circumstances. > > This way, code which currently uses the data pointer through PyArray_DATA will fail instead of silently working with the wrong interpretation of the data. What do people feel about this idea? The problem with this is that PyArray_DATA calls typically don't do error checking as the API could not fail before. I could see introducing an API that did fail and then encouraging use of this API. Ultimately, the motivation to split the mask out is because the idea of *all* arrays being masked arrays at their core is a very new idea for NumPy and one that has downstream consequences and needs to be phased in more slowly. I'd rather not be stuck supporting the changes to PyArrayObject that it implies when it is not clear how masked arrays should really be handled or if it's appropriate that *all* NumPy arrays should be secretly masked arrays. Another thing I've been wondering... I presume that any accessors to the masked fields in the current NumPy code base are going through function calls. Does this mean that we could in 1.8 re-purpose those fields for some other feature and still have code compiled against 1.7 work with 1.8? If those calls are inlined in an extension module that uses the NumPy C-API, doesn't this mean that it's effectively the same (from an ABI perspective) of a macro access? 
If so, then I don't see how inlined function-calls are any better from an ABI perspective than a macro access. The only benefit seems to be the tendency to have fewer pre-processor inspired bugs. But, ultimately, it seems a point of style rather than function. -Travis > > -Mark > > David > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sun May 20 01:08:27 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 20 May 2012 00:08:27 -0500 Subject: [Numpy-discussion] Separating out the maskna code In-Reply-To: References: Message-ID: <5432E28E-A6B0-4B3B-8932-14B2114ABDD1@continuum.io> Wow, Nathaniel. This looks like a nice piece of tedious work. I have not reviewed it in detail, but in general I would be very supportive of your plan to commit this to master, make a 1.7 release (without the ReduceWrapper) function and then work on the masked array / ndarray separation plan for 1.8 Of course, first I would want to hear from Mark, to hear his comments about what was removed. -Travis On May 19, 2012, at 3:54 PM, Nathaniel Smith wrote: > Hi all, > > Since Mark's original missingdata branch made so many changes, I > figured it would be a useful exercise to figure out what code in > master is actually related to masked arrays, and which isn't. The > easiest way seemed to be to delete the new fields, then keep removing > any code that depended on them until everything built again, which > gives this: > https://github.com/njsmith/numpy/commits/separate-maskna > > Possible uses: > - Use the diff as a checklist for going through to change the API > - Use the diff as a checklist for adding an experimental-API-only flag > - Merge into master, then use as a reference to cherrypick the > pieces that we want to save (there is some questionable stuff in here > -- e.g. copy/paste code, hand-wavy half-implementation of "multi-NA" > support, and PyArray_ReduceWrapper, see below) > - Merge into master and then immediately 'git revert' the changes on > a branch, which would effectively 'move it aside' so we can release > 1.7 while Mark and Travis to continue hacking on it at their own pace > > This is a huge patch, but I was pretty careful not to cause any > accidental non-maskna-related regressions. The whole PyArray_Diagonal > thread actually happened because I noticed that test_maskna had the > only tests for np.diagonal, so I wanted to write proper ones that > would be independent of the maskna code. Also I ran the following > tests: > - numpy.test("full") > - scipy v0.10.1, test("full") > - matplotlib current master > - pandas v0.7.3 > and everything looks good. > > The other complicated thing to handle was the new > PyArray_ReduceWrapper function that was added to the public multiarray > API. Conceptually, this function has only a glancing relationship to > masked arrays per se. But, it has its own problems, and I don't think > it should be exposed in 1.7. Partly because its signature necessarily > changes depending on whether maskna support exists. Partly because > it's just kind of ugly (it has 15 arguments, after I removed some[1]). 
> But mostly because it gives us two independent generic implementations > of functions that do array->scalar operations, which seems like > something we absolutely don't want to commit to supporting > indefinitely. And the "generalized ufunc" alternative seems a lot more > promising. So, that branch also has a followup patch that does the > necessary hacking to get it out of the public API. > > Unfortunately, this patch is dependent on the previous one -- I'm not > sure how to untangle PyArray_ReduceWrapper while keeping the maskna > support in, which makes the "global experimental flag" idea for 1.7 > hard to implement (assuming that others agree about > PyArray_ReduceWrapper being unready for public exposure). > > At this point, it might easiest to just merge this branch to master, > immediately revert it on a branch for Mark and Travis to work on, and > then release 1.7. > > Ralf, IIUC merging this and my other outstanding PRs would leave the > datetime issues on python3/win32 as the only outstanding blocker? > > - N > > [1] http://www-pu.informatik.uni-tuebingen.de/users/klaeren/epigrams.html > , number 11 > > ------------ Commit messages follow for reference/discussion > > The main change is commit 4c16813c23b20: > Remove maskna API from ndarray, and all (and only) the code supporting it > > The original masked-NA-NEP branch contained a large number of changes > in addition to the core NA support. For example: > - ufunc.__call__ support for where= argument > - nditer support for arbitrary masks (in support of where=) > - ufunc.reduce support for simultaneous reduction over multiple axes > - a new "array assignment API" > - ndarray.diagonal() returning a view in all cases > - bug-fixes in __array_priority__ handling > - datetime test changes > etc. There's no consensus yet on what should be done with the > maskna-related part of this branch, but the rest is generally useful > and uncontroversial, so the goal of this branch is to identify exactly > which code changes are involved in maskna support. > > The basic strategy used to create this patch was: > - Remove the new masking-related fields from ndarray, so no arrays > are masked > - Go through and remove all the code that this makes > dead/inaccessible/irrelevant, in a largely mechanical fashion. So > for example, if I saw 'if (PyArray_HASMASK(a)) { ... }' then that > whole block was obviously just dead code if no arrays have masks, > and I removed it. Likewise for function arguments like skipna that > are useless if there aren't any NAs to skip. > > This changed the signature of a number of functions that were newly > exposed in the numpy public API. I've removed all such functions from > the public API, since releasing them with the NA-less signature in 1.7 > would create pointless compatibility hassles later if and when we add > back the NA-related functionality. Most such functions are removed by > this commit; the exception is PyArray_ReduceWrapper, which requires > more extensive surgery, and will be handled in followup commits. > > I also removed the new ndarray.setasflat method. Reason: a comment > noted that the only reason this was added was to allow easier testing > of one branch of PyArray_CopyAsFlat. That branch is now the main > branch, so that isn't an issue. Nonetheless this function is arguably > useful, so perhaps it should have remained, but I judged that since > numpy's API is already hairier than we would like, it's not a good > idea to add extra hair "just in case". 
(Also AFAICT the test for this > method in test_maskna was actually incorrect, as noted here: > https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py > so I'm not confident that it ever worked in master, though I haven't > had a chance to follow-up on this.) > > I also removed numpy.count_reduce_items, since without skipna it > became trivial. > > I believe that these are the only exceptions to the "remove dead code" > strategy. > > The ReduceWrapper untangling is a001fb29c9: > Remove PyArray_ReduceWrapper from public API > > There are two reasons to want to keep PyArray_ReduceWrapper out of the > public multiarray API: > - Its signature is likely to change if/when masked arrays are added > - It is essentially a wrapper for array->scalar transformations > (*not* just reductions as its name implies -- the whole reason it > is in multiarray.so in the first place is to support count_nonzero, > which is not actually a reduction!). It provides some nice > conveniences (like making it easy to apply such functions to > multiple axes simultaneously), but, we already have a general > mechanism for writing array->scalar transformations -- generalized > ufuncs. We do not want to have two independent, redundant > implementations of this functionality, one in multiarray and one in > umath! So in the long run we should add these nice features to the > generalized ufunc machinery. And in the short run, we shouldn't add > it to the public API and commit ourselves to supporting it. > > However, simply removing it from numpy_api.py is not easy, because > this code was used in both multiarray and umath. This commit: > - Moves ReduceWrapper and supporting code to umath/, and makes > appropriate changes (e.g. renaming it to PyUFunc_ReduceWrapper and > cleaning up the header files). > - Reverts numpy.count_nonzero to its previous implementation, so that > it loses the new axis= and keepdims= arguments. This is > unfortunate, but this change isn't so urgent that it's worth tying > our APIs in knots forever. (Perhaps in the future it can become a > generalized ufunc.) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Sun May 20 01:15:32 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 19 May 2012 23:15:32 -0600 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: <9D1F36AF-E082-4085-B5DB-B4FCC0DC2A11@continuum.io> References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> <9D1F36AF-E082-4085-B5DB-B4FCC0DC2A11@continuum.io> Message-ID: On Sat, May 19, 2012 at 10:48 PM, Travis Oliphant wrote: > > > > My own plan for the near term would be as follows: > > > > 1) Put in the experimental option and get the 1.7 release out. This gets > us through the next couple of months and keeps things moving. > > > > The "experimental" option does not solve the problem which is that the > ndarray object now has masked fields which changes the fundamental nature > of an ndarray for a lot of downstream users that really have no idea what > has just happened. I don't see how this has been addressed by any > proposal except for the one I have suggested which allows a masked array > object and a regular ndarray to co-exist for a time. I doubt that the > proposal actually helps get 1.7 out any faster either as there are multiple > experimental APIs that would have to be created to pull it off on both the > C and Python level. 
> So, remove them in 1.8 and try something else. With experimental (say in site.cfg), the base array could even be different. I don't see the problem here. Think big. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sun May 20 01:38:31 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 20 May 2012 00:38:31 -0500 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> <9D1F36AF-E082-4085-B5DB-B4FCC0DC2A11@continuum.io> Message-ID: On May 20, 2012, at 12:15 AM, Charles R Harris wrote: > > > On Sat, May 19, 2012 at 10:48 PM, Travis Oliphant wrote: > > > > My own plan for the near term would be as follows: > > > > 1) Put in the experimental option and get the 1.7 release out. This gets us through the next couple of months and keeps things moving. > > > > The "experimental" option does not solve the problem which is that the ndarray object now has masked fields which changes the fundamental nature of an ndarray for a lot of downstream users that really have no idea what has just happened. I don't see how this has been addressed by any proposal except for the one I have suggested which allows a masked array object and a regular ndarray to co-exist for a time. I doubt that the proposal actually helps get 1.7 out any faster either as there are multiple experimental APIs that would have to be created to pull it off on both the C and Python level. > > So, remove them in 1.8 and try something else. With experimental (say in site.cfg), the base array could even be different. I don't see the problem here. Think big. I don't think I understand your mental model of this. Are you saying add an experimental flag at the C-level (essentially a #define that eliminates any code involving masked arrays unless the define is made at compile time?) It seems like just applying Nathaniel's patch would be a better approach. -Travis > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Sun May 20 03:21:39 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sun, 20 May 2012 09:21:39 +0200 Subject: [Numpy-discussion] why not zerodivision error? Message-ID: Dear all, could anybody give one sentence about this? why in the loop I didn't get zerodivision error by when I explicitly do this, I get a zerodivision error? thanks. In [7]: for i in np.arange(-10,10): print 1./i ...: -0.1 -0.111111111111 -0.125 -0.142857142857 -0.166666666667 -0.2 -0.25 -0.333333333333 -0.5 -1.0 inf 1.0 0.5 0.333333333333 0.25 0.2 0.166666666667 0.142857142857 0.125 0.111111111111 In [8]: 1/0. --------------------------------------------------------------------------- ZeroDivisionError Traceback (most recent call last) /mnt/f/data/DROUGTH/ in () ----> 1 1/0. ZeroDivisionError: float division by zero In [9]: 1./0. --------------------------------------------------------------------------- ZeroDivisionError Traceback (most recent call last) /mnt/f/data/DROUGTH/ in () ----> 1 1./0. 
ZeroDivisionError: float division by zero In [10]: 1./0 --------------------------------------------------------------------------- ZeroDivisionError Traceback (most recent call last) /mnt/f/data/DROUGTH/ in () ----> 1 1./0 ZeroDivisionError: float division by zero Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Sun May 20 03:47:42 2012 From: e.antero.tammi at gmail.com (eat) Date: Sun, 20 May 2012 10:47:42 +0300 Subject: [Numpy-discussion] why not zerodivision error? In-Reply-To: References: Message-ID: Hi, On Sun, May 20, 2012 at 10:21 AM, Chao YUE wrote: > Dear all, > > could anybody give one sentence about this? why in the loop I didn't get > zerodivision error by when I explicitly do this, I get a zerodivision > error? thanks. > > In [7]: for i in np.arange(-10,10): > print 1./i > ...: > -0.1 > -0.111111111111 > -0.125 > -0.142857142857 > -0.166666666667 > -0.2 > -0.25 > -0.333333333333 > -0.5 > -1.0 > inf > 1.0 > 0.5 > 0.333333333333 > 0.25 > 0.2 > 0.166666666667 > 0.142857142857 > 0.125 > 0.111111111111 > > In [8]: 1/0. > --------------------------------------------------------------------------- > ZeroDivisionError Traceback (most recent call last) > /mnt/f/data/DROUGTH/ in () > ----> 1 1/0. > > ZeroDivisionError: float division by zero > > In [9]: 1./0. > --------------------------------------------------------------------------- > ZeroDivisionError Traceback (most recent call last) > /mnt/f/data/DROUGTH/ in () > ----> 1 1./0. > > ZeroDivisionError: float division by zero > > In [10]: 1./0 > --------------------------------------------------------------------------- > ZeroDivisionError Traceback (most recent call last) > /mnt/f/data/DROUGTH/ in () > ----> 1 1./0 > > ZeroDivisionError: float division by zero You may like to read more on here http://docs.scipy.org/doc/numpy/reference/generated/numpy.seterr.html#numpy.seterr So, for your specific example: In []: a= arange(-10, 10) In []: 1./ a Out[]: array([-0.1 , -0.11111111, -0.125 , -0.14285714, -0.16666667, -0.2 , -0.25 , -0.33333333, -0.5 , -1. , inf, 1. , 0.5 , 0.33333333, 0.25 , 0.2 , 0.16666667, 0.14285714, 0.125 , 0.11111111]) In []: seterr(divide= 'raise') In []: 1./ a ------------------------------------------------------------ Traceback (most recent call last): File "", line 1, in FloatingPointError: divide by zero encountered in divide My 2 cents, -eat > > > > Chao > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sun May 20 04:48:55 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 20 May 2012 02:48:55 -0600 Subject: [Numpy-discussion] Masked Array for NumPy 1.7 In-Reply-To: References: <81D136DD-9769-41D5-A6E3-B2A9ED100024@continuum.io> <9D1F36AF-E082-4085-B5DB-B4FCC0DC2A11@continuum.io> Message-ID: On Sat, May 19, 2012 at 11:38 PM, Travis Oliphant wrote: > > On May 20, 2012, at 12:15 AM, Charles R Harris wrote: > > > > On Sat, May 19, 2012 at 10:48 PM, Travis Oliphant wrote: > >> > >> > My own plan for the near term would be as follows: >> > >> > 1) Put in the experimental option and get the 1.7 release out. This >> gets us through the next couple of months and keeps things moving. >> > >> >> The "experimental" option does not solve the problem which is that the >> ndarray object now has masked fields which changes the fundamental nature >> of an ndarray for a lot of downstream users that really have no idea what >> has just happened. I don't see how this has been addressed by any >> proposal except for the one I have suggested which allows a masked array >> object and a regular ndarray to co-exist for a time. I doubt that the >> proposal actually helps get 1.7 out any faster either as there are multiple >> experimental APIs that would have to be created to pull it off on both the >> C and Python level. >> > > So, remove them in 1.8 and try something else. With experimental (say in > site.cfg), the base array could even be different. I don't see the problem > here. Think big. > > > I don't think I understand your mental model of this. Are you saying > add an experimental flag at the C-level (essentially a #define that > eliminates any code involving masked arrays unless the define is made at > compile time?) > > It seems like just applying Nathaniel's patch would be a better approach. > Do so then. Otherwise I am going to fork. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun May 20 05:34:46 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 20 May 2012 11:34:46 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 released Message-ID: Hi, I'm pleased to announce the availability of NumPy 1.6.2. This is a maintenance release. Due to the delay of the NumPy 1.7.0, this release contains far more fixes than a regular NumPy bugfix release. It also includes a number of documentation and build improvements. Sources and binary installers can be found at http://sourceforge.net/projects/numpy/files/NumPy/1.6.2/, release notes are copied below. Thanks to everyone who contributed to this release. Enjoy, The NumPy developers ========================= NumPy 1.6.2 Release Notes ========================= This is a bugfix release in the 1.6.x series. Due to the delay of the NumPy 1.7.0 release, this release contains far more fixes than a regular NumPy bugfix release. It also includes a number of documentation and build improvements. 
``numpy.core`` issues fixed --------------------------- #2063 make unique() return consistent index #1138 allow creating arrays from empty buffers or empty slices #1446 correct note about correspondence vstack and concatenate #1149 make argmin() work for datetime #1672 fix allclose() to work for scalar inf #1747 make np.median() work for 0-D arrays #1776 make complex division by zero to yield inf properly #1675 add scalar support for the format() function #1905 explicitly check for NaNs in allclose() #1952 allow floating ddof in std() and var() #1948 fix regression for indexing chararrays with empty list #2017 fix type hashing #2046 deleting array attributes causes segfault #2033 a**2.0 has incorrect type #2045 make attribute/iterator_element deletions not segfault #2021 fix segfault in searchsorted() #2073 fix float16 __array_interface__ bug ``numpy.lib`` issues fixed -------------------------- #2048 break reference cycle in NpzFile #1573 savetxt() now handles complex arrays #1387 allow bincount() to accept empty arrays #1899 fixed histogramdd() bug with empty inputs #1793 fix failing npyio test under py3k #1936 fix extra nesting for subarray dtypes #1848 make tril/triu return the same dtype as the original array #1918 use Py_TYPE to access ob_type, so it works also on Py3 ``numpy.f2py`` changes ---------------------- ENH: Introduce new options extra_f77_compiler_args and extra_f90_compiler_args BLD: Improve reporting of fcompiler value BUG: Fix f2py test_kind.py test ``numpy.poly`` changes ---------------------- ENH: Add some tests for polynomial printing ENH: Add companion matrix functions DOC: Rearrange the polynomial documents BUG: Fix up links to classes DOC: Add version added to some of the polynomial package modules DOC: Document xxxfit functions in the polynomial package modules BUG: The polynomial convenience classes let different types interact DOC: Document the use of the polynomial convenience classes DOC: Improve numpy reference documentation of polynomial classes ENH: Improve the computation of polynomials from roots STY: Code cleanup in polynomial [*]fromroots functions DOC: Remove references to cast and NA, which were added in 1.7 ``numpy.distutils`` issues fixed ------------------------------- #1261 change compile flag on AIX from -O5 to -O3 #1377 update HP compiler flags #1383 provide better support for C++ code on HPUX #1857 fix build for py3k + pip BLD: raise a clearer warning in case of building without cleaning up first BLD: follow build_ext coding convention in build_clib BLD: fix up detection of Intel CPU on OS X in system_info.py BLD: add support for the new X11 directory structure on Ubuntu & co. BLD: add ufsparse to the libraries search path. BLD: add 'pgfortran' as a valid compiler in the Portland Group BLD: update version match regexp for IBM AIX Fortran compilers. ``numpy.random`` issues fixed ----------------------------- BUG: Use npy_intp instead of long in mtrand -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun May 20 06:00:39 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 20 May 2012 04:00:39 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.6.2 released In-Reply-To: References: Message-ID: On Sun, May 20, 2012 at 3:34 AM, Ralf Gommers wrote: > Hi, > > I'm pleased to announce the availability of NumPy 1.6.2. This is a > maintenance release. Due to the delay of the NumPy 1.7.0, this release > contains far more fixes than a regular NumPy bugfix release. 
It also > includes a number of documentation and build improvements. > > Sources and binary installers can be found at > > http://sourceforge.net/projects/numpy/files/NumPy/1.6.2/, release notes > are copied below. > > Thanks to everyone who contributed to this release. > > Enjoy, > The NumPy developers > > > > ========================= > NumPy 1.6.2 Release Notes > ========================= > > This is a bugfix release in the 1.6.x series. Due to the delay of the > NumPy > 1.7.0 release, this release contains far more fixes than a regular NumPy > bugfix > release. It also includes a number of documentation and build > improvements. > > > ``numpy.core`` issues fixed > --------------------------- > > #2063 make unique() return consistent index > #1138 allow creating arrays from empty buffers or empty slices > #1446 correct note about correspondence vstack and concatenate > #1149 make argmin() work for datetime > #1672 fix allclose() to work for scalar inf > #1747 make np.median() work for 0-D arrays > #1776 make complex division by zero to yield inf properly > #1675 add scalar support for the format() function > #1905 explicitly check for NaNs in allclose() > #1952 allow floating ddof in std() and var() > #1948 fix regression for indexing chararrays with empty list > #2017 fix type hashing > #2046 deleting array attributes causes segfault > #2033 a**2.0 has incorrect type > #2045 make attribute/iterator_element deletions not segfault > #2021 fix segfault in searchsorted() > #2073 fix float16 __array_interface__ bug > > > ``numpy.lib`` issues fixed > -------------------------- > > #2048 break reference cycle in NpzFile > #1573 savetxt() now handles complex arrays > #1387 allow bincount() to accept empty arrays > #1899 fixed histogramdd() bug with empty inputs > #1793 fix failing npyio test under py3k > #1936 fix extra nesting for subarray dtypes > #1848 make tril/triu return the same dtype as the original array > #1918 use Py_TYPE to access ob_type, so it works also on Py3 > > > ``numpy.f2py`` changes > ---------------------- > > ENH: Introduce new options extra_f77_compiler_args and > extra_f90_compiler_args > BLD: Improve reporting of fcompiler value > BUG: Fix f2py test_kind.py test > > > ``numpy.poly`` changes > ---------------------- > > ENH: Add some tests for polynomial printing > ENH: Add companion matrix functions > DOC: Rearrange the polynomial documents > BUG: Fix up links to classes > DOC: Add version added to some of the polynomial package modules > DOC: Document xxxfit functions in the polynomial package modules > BUG: The polynomial convenience classes let different types interact > DOC: Document the use of the polynomial convenience classes > DOC: Improve numpy reference documentation of polynomial classes > ENH: Improve the computation of polynomials from roots > STY: Code cleanup in polynomial [*]fromroots functions > DOC: Remove references to cast and NA, which were added in 1.7 > > > ``numpy.distutils`` issues fixed > ------------------------------- > > #1261 change compile flag on AIX from -O5 to -O3 > #1377 update HP compiler flags > #1383 provide better support for C++ code on HPUX > #1857 fix build for py3k + pip > BLD: raise a clearer warning in case of building without cleaning up > first > BLD: follow build_ext coding convention in build_clib > BLD: fix up detection of Intel CPU on OS X in system_info.py > BLD: add support for the new X11 directory structure on Ubuntu & co. > BLD: add ufsparse to the libraries search path. 
> BLD: add 'pgfortran' as a valid compiler in the Portland Group > BLD: update version match regexp for IBM AIX Fortran compilers. > > > ``numpy.random`` issues fixed > ----------------------------- > > BUG: Use npy_intp instead of long in mtrand > > > Thanks Ralf. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun May 20 06:35:19 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 20 May 2012 12:35:19 +0200 Subject: [Numpy-discussion] Separating out the maskna code In-Reply-To: References: Message-ID: On Sat, May 19, 2012 at 10:54 PM, Nathaniel Smith wrote: > > Ralf, IIUC merging this and my other outstanding PRs would leave the > datetime issues on python3/win32 as the only outstanding blocker? > Yes. There are some more open tickets for 1.7 (see http://projects.scipy.org/numpy/report/3), but I don't consider any of those blockers. Although #2078 should probably be fixed, since it's a regression. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From diehose at freenet.de Sun May 20 07:45:03 2012 From: diehose at freenet.de (bmu) Date: Sun, 20 May 2012 13:45:03 +0200 Subject: [Numpy-discussion] Named dtype array: Difference between a[0]['name'] and a['name'][0]? Message-ID: <2043971.x9Wcnqnjnd@wohnzimmer> I came acroos a question on stackoverflow (http://stackoverflow.com/q/9470604) and I am wondering if this is a bug import numpy as np dt = np.dtype([('tuple', (int, 2))]) a = np.zeros(3, dt) type(a['tuple'][0]) # ndarray type(a[0]['tuple']) # ndarray a['tuple'][0] = (1,2) # ok a[0]['tuple'] = (1,2) # ValueError: shape-mismatch on array construction Could somebody explain this behaviour (either in this mailing list or on stackoverflow)? bmu -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Sun May 20 08:19:45 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sun, 20 May 2012 14:19:45 +0200 Subject: [Numpy-discussion] why not zerodivision error? In-Reply-To: References: Message-ID: thanks for this information. Chao 2012/5/20 eat > Hi, > > On Sun, May 20, 2012 at 10:21 AM, Chao YUE wrote: > >> Dear all, >> >> could anybody give one sentence about this? why in the loop I didn't get >> zerodivision error by when I explicitly do this, I get a zerodivision >> error? thanks. >> >> In [7]: for i in np.arange(-10,10): >> print 1./i >> ...: >> -0.1 >> -0.111111111111 >> -0.125 >> -0.142857142857 >> -0.166666666667 >> -0.2 >> -0.25 >> -0.333333333333 >> -0.5 >> -1.0 >> inf >> 1.0 >> 0.5 >> 0.333333333333 >> 0.25 >> 0.2 >> 0.166666666667 >> 0.142857142857 >> 0.125 >> 0.111111111111 >> >> In [8]: 1/0. >> >> --------------------------------------------------------------------------- >> ZeroDivisionError Traceback (most recent call >> last) >> /mnt/f/data/DROUGTH/ in () >> ----> 1 1/0. >> >> ZeroDivisionError: float division by zero >> >> In [9]: 1./0. >> >> --------------------------------------------------------------------------- >> ZeroDivisionError Traceback (most recent call >> last) >> /mnt/f/data/DROUGTH/ in () >> ----> 1 1./0. 
>> >> ZeroDivisionError: float division by zero >> >> In [10]: 1./0 >> >> --------------------------------------------------------------------------- >> ZeroDivisionError Traceback (most recent call >> last) >> /mnt/f/data/DROUGTH/ in () >> ----> 1 1./0 >> >> ZeroDivisionError: float division by zero > > You may like to read more on here > http://docs.scipy.org/doc/numpy/reference/generated/numpy.seterr.html#numpy.seterr > > So, for your specific example: > In []: a= arange(-10, 10) > In []: 1./ a > Out[]: > array([-0.1 , -0.11111111, -0.125 , -0.14285714, -0.16666667, > -0.2 , -0.25 , -0.33333333, -0.5 , -1. , > inf, 1. , 0.5 , 0.33333333, 0.25 , > 0.2 , 0.16666667, 0.14285714, 0.125 , 0.11111111]) > In []: seterr(divide= 'raise') > In []: 1./ a > ------------------------------------------------------------ > Traceback (most recent call last): > File "", line 1, in > FloatingPointError: divide by zero encountered in divide > > > My 2 cents, > -eat > >> >> >> >> Chao >> >> -- >> >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> >> ************************************************************************************ >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at hilboll.de Sun May 20 11:09:30 2012 From: lists at hilboll.de (Andreas Hilboll) Date: Sun, 20 May 2012 17:09:30 +0200 Subject: [Numpy-discussion] why two versions of polyfit? In-Reply-To: References: Message-ID: <4FB9092A.705@hilboll.de> Hi, I just noticed that there's two polyfit functions, one in numpy.lib.polynomial, and one in numpy.polynomial. What's the reason for this? The calling signatures aren't identical (the numpy.polynomial version supports weights), and I couldn't find a notice on why two versions exist. Puzzled greetings, Andreas. From charlesr.harris at gmail.com Sun May 20 11:37:52 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 20 May 2012 09:37:52 -0600 Subject: [Numpy-discussion] why two versions of polyfit? In-Reply-To: <4FB9092A.705@hilboll.de> References: <4FB9092A.705@hilboll.de> Message-ID: On Sun, May 20, 2012 at 9:09 AM, Andreas Hilboll wrote: > Hi, > > I just noticed that there's two polyfit functions, one in > numpy.lib.polynomial, and one in numpy.polynomial. What's the reason for > this? The calling signatures aren't identical (the numpy.polynomial > version supports weights), and I couldn't find a notice on why two > versions exist. > > There are two different polynomial objects, Polynomial and poly1d. 
The Polynomial object is part of a newer group that also contains Lengendre, Chebyshev, etc., and doesn't have some of the problems that poly1d has. Poly1d is an older implementation. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun May 20 11:41:56 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 20 May 2012 09:41:56 -0600 Subject: [Numpy-discussion] why two versions of polyfit? In-Reply-To: References: <4FB9092A.705@hilboll.de> Message-ID: On Sun, May 20, 2012 at 9:37 AM, Charles R Harris wrote: > > > On Sun, May 20, 2012 at 9:09 AM, Andreas Hilboll wrote: > >> Hi, >> >> I just noticed that there's two polyfit functions, one in >> numpy.lib.polynomial, and one in numpy.polynomial. What's the reason for >> this? The calling signatures aren't identical (the numpy.polynomial >> version supports weights), and I couldn't find a notice on why two >> versions exist. >> >> > There are two different polynomial objects, Polynomial and poly1d. The > Polynomial object is part of a newer group that also contains Lengendre, > Chebyshev, etc., and doesn't have some of the problems that poly1d has. > Poly1d is an older implementation. > Oh, and the polyfit function in polynomial.polynomial isn't meant to be used directly, it is mostly there to support the fit class function of Polynomial. See the documentation here . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun May 20 11:44:40 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 20 May 2012 09:44:40 -0600 Subject: [Numpy-discussion] why two versions of polyfit? In-Reply-To: References: <4FB9092A.705@hilboll.de> Message-ID: On Sun, May 20, 2012 at 9:41 AM, Charles R Harris wrote: > > > On Sun, May 20, 2012 at 9:37 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sun, May 20, 2012 at 9:09 AM, Andreas Hilboll wrote: >> >>> Hi, >>> >>> I just noticed that there's two polyfit functions, one in >>> numpy.lib.polynomial, and one in numpy.polynomial. What's the reason for >>> this? The calling signatures aren't identical (the numpy.polynomial >>> version supports weights), and I couldn't find a notice on why two >>> versions exist. >>> >>> >> There are two different polynomial objects, Polynomial and poly1d. The >> Polynomial object is part of a newer group that also contains Lengendre, >> Chebyshev, etc., and doesn't have some of the problems that poly1d has. >> Poly1d is an older implementation. >> > > Oh, and the polyfit function in polynomial.polynomial isn't meant to be > used directly, it is mostly there to support the fit class function of > Polynomial. See the documentation here > . > > Better link here . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at hilboll.de Sun May 20 11:53:52 2012 From: lists at hilboll.de (Andreas Hilboll) Date: Sun, 20 May 2012 17:53:52 +0200 Subject: [Numpy-discussion] why two versions of polyfit? In-Reply-To: References: <4FB9092A.705@hilboll.de> Message-ID: <4FB91390.5030802@hilboll.de> > On Sun, May 20, 2012 at 9:37 AM, Charles R Harris > > wrote: > > > > On Sun, May 20, 2012 at 9:09 AM, Andreas Hilboll > wrote: > > Hi, > > I just noticed that there's two polyfit functions, one in > numpy.lib.polynomial, and one in numpy.polynomial. What's the > reason for > this? 
The calling signatures aren't identical (the numpy.polynomial > version supports weights), and I couldn't find a notice on why two > versions exist. > > > There are two different polynomial objects, Polynomial and poly1d. > The Polynomial object is part of a newer group that also contains > Lengendre, Chebyshev, etc., and doesn't have some of the problems > that poly1d has. Poly1d is an older implementation. I think it would be beneficial for the user if this fact was noted somewhere in the docstring of the Poly1d implementation. Especially since numpy.polyfit is pointing to that old implementation. When I saw the polyfit function in the numpy namespace, I didn't bother checking if there's anything more sophisticated. I could add the appropriate links in the "see also" sections of the Poly1d docstrings, if you guys agree. > Oh, and the polyfit function in polynomial.polynomial isn't meant to be > used directly, it is mostly there to support the fit class function of > Polynomial. See the documentation here . Ah, okay. Thanks for that. Cheers, Andreas. From charlesr.harris at gmail.com Sun May 20 12:11:39 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 20 May 2012 10:11:39 -0600 Subject: [Numpy-discussion] why two versions of polyfit? In-Reply-To: <4FB91390.5030802@hilboll.de> References: <4FB9092A.705@hilboll.de> <4FB91390.5030802@hilboll.de> Message-ID: On Sun, May 20, 2012 at 9:53 AM, Andreas Hilboll wrote: > > On Sun, May 20, 2012 at 9:37 AM, Charles R Harris > > > wrote: > > > > > > > > On Sun, May 20, 2012 at 9:09 AM, Andreas Hilboll > > wrote: > > > > Hi, > > > > I just noticed that there's two polyfit functions, one in > > numpy.lib.polynomial, and one in numpy.polynomial. What's the > > reason for > > this? The calling signatures aren't identical (the > numpy.polynomial > > version supports weights), and I couldn't find a notice on why > two > > versions exist. > > > > > > There are two different polynomial objects, Polynomial and poly1d. > > The Polynomial object is part of a newer group that also contains > > Lengendre, Chebyshev, etc., and doesn't have some of the problems > > that poly1d has. Poly1d is an older implementation. > > I think it would be beneficial for the user if this fact was noted > somewhere in the docstring of the Poly1d implementation. Especially > since numpy.polyfit is pointing to that old implementation. When I saw > the polyfit function in the numpy namespace, I didn't bother checking if > there's anything more sophisticated. > > I could add the appropriate links in the "see also" sections of the > Poly1d docstrings, if you guys agree. > That would be useful. > > > Oh, and the polyfit function in polynomial.polynomial isn't meant to be > > used directly, it is mostly there to support the fit class function of > > Polynomial. See the documentation here < > http://preview.tinyurl.com/8289gfs>. > > Ah, okay. Thanks for that. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun May 20 12:39:06 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 20 May 2012 18:39:06 +0200 Subject: [Numpy-discussion] Internationalization of numpy/scipy docstrings... In-Reply-To: References: Message-ID: On Sun, May 20, 2012 at 12:04 AM, Tim Cera wrote: > I have thought for a long time that it would be nice to have numpy/scipy > docs in multiple languages. I didn't have any idea how to do it until I > saw http://sphinx.pocoo.org/intl.html. 
The gettext builder which is a > requirement to make this happen is relatively new to sphinx. > > > Outline of above applied to numpy/scipy... > > > 1. pydocweb would use the new gettext builder to convert *.rst to *.pot > files. > > 2. Translators would use pootle to edit the *.pot files to *.po files > > pydocweb or pootle would use mgsfmt to create *.mo files > > 3. From here can choose either: > > a. Have pydocweb use sphinx-build to create new, > > translated *.rst files from the *.mo files. > > (my favorite since we would have *.rst files) > > b. OR use gettext in Python to translate docstring > > on-the-fly from the *.mo files. > > > A user would then install a language kit, maybe something like scikits > and access the translated docstring with a new 'np.info'. As near as I > can figure, Python 'help' command can't be replaced by something else, so > 'help' would always display the English docstring. > > > I have pydocweb and pootle setup locally and working. Ran into a problem > though with sphinx-build creating the initial *.pot files. It seems to be a > problem with numpydoc. It fails on 'function' and 'auto*' directives. I > tried to look at numpydoc and it is a bit of very intense coding and I > frankly have not been able to find my way around. > > > I am willing to put in some work for this to happen. My block right now is > getting the initial *.pot files. > > > Any interest? > Are you thinking only about documentation in .rst files (like the tutorials), or also the docstrings themselves? The former may be feasible, the latter I think will be difficult. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From tsyu80 at gmail.com Sun May 20 12:55:49 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Sun, 20 May 2012 12:55:49 -0400 Subject: [Numpy-discussion] why not zerodivision error? In-Reply-To: References: Message-ID: On Sun, May 20, 2012 at 3:47 AM, eat wrote: > Hi, > > On Sun, May 20, 2012 at 10:21 AM, Chao YUE wrote: > >> Dear all, >> >> could anybody give one sentence about this? why in the loop I didn't get >> zerodivision error by when I explicitly do this, I get a zerodivision >> error? thanks. >> >> In [7]: for i in np.arange(-10,10): >> print 1./i >> ...: >> -0.1 >> -0.111111111111 >> -0.125 >> -0.142857142857 >> -0.166666666667 >> -0.2 >> -0.25 >> -0.333333333333 >> -0.5 >> -1.0 >> inf >> 1.0 >> 0.5 >> 0.333333333333 >> 0.25 >> 0.2 >> 0.166666666667 >> 0.142857142857 >> 0.125 >> 0.111111111111 >> >> In [8]: 1/0. >> >> --------------------------------------------------------------------------- >> ZeroDivisionError Traceback (most recent call >> last) >> /mnt/f/data/DROUGTH/ in () >> ----> 1 1/0. >> > [snip] > You may like to read more on here > http://docs.scipy.org/doc/numpy/reference/generated/numpy.seterr.html#numpy.seterr > [snip] > My 2 cents, > -eat > >> >> Also, note that the original errors were raised when working with pure Python types (ints and floats), while in the loop you were dividing by numpy scalars, which handles division-by-zero differently. Best, -Tony -------------- next part -------------- An HTML attachment was scrubbed... 
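To make that difference concrete, here is a small runnable sketch (illustrative only; whether NumPy also prints a RuntimeWarning for the scalar case depends on the current np.seterr settings):

import numpy as np

i = np.arange(-10, 10)[10]        # a NumPy integer scalar equal to 0, not the Python int 0
print(1. / i)                     # -> inf, handled by NumPy's floating-point error machinery

try:
    1. / 0                        # dividing plain Python numbers always raises
except ZeroDivisionError:
    print("Python division raised ZeroDivisionError")

with np.errstate(divide='raise'):
    try:
        1. / i                    # with divide='raise', NumPy raises instead of returning inf
    except FloatingPointError:
        print("NumPy raised FloatingPointError")

So the loop goes through NumPy's error handling, which by default returns inf, while the interactive examples divide plain Python numbers, which always raise.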
URL: From njs at pobox.com Sun May 20 13:59:07 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 20 May 2012 18:59:07 +0100 Subject: [Numpy-discussion] Separating out the maskna code In-Reply-To: <5432E28E-A6B0-4B3B-8932-14B2114ABDD1@continuum.io> References: <5432E28E-A6B0-4B3B-8932-14B2114ABDD1@continuum.io> Message-ID: On Sun, May 20, 2012 at 6:08 AM, Travis Oliphant wrote: > Wow, Nathaniel. ? This looks like a nice piece of tedious work. Honestly, it only took a few hours -- M-x grep is awesome. Would still have been better if it'd been separated in the first place, but so it goes. > I have not reviewed it in detail, but in general I would be very supportive of your plan to commit this to master, make a 1.7 release (without the ReduceWrapper) function and then work on the masked array / ndarray separation plan for 1.8 > > Of course, first I would want to hear from Mark, to hear his comments about what was removed. Definitely. I'm pretty sure I didn't accidentally sweep up anything else in my net besides what it says in the commit messages (simply because it's hard to do that when all you're doing is grepping for HASMASKNA and friends), but he knows this code better than I do. NB for anyone who may have pulled, I did just rebase this (and push --force it) to fix a trivial issue (I forgot to remove the signature for PyArray_CountNonZero when I removed the function itself). So you'll want to throw away any local branches if you have them. - N From njs at pobox.com Sun May 20 14:02:46 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 20 May 2012 19:02:46 +0100 Subject: [Numpy-discussion] Separating out the maskna code In-Reply-To: References: Message-ID: On Sun, May 20, 2012 at 11:35 AM, Ralf Gommers wrote: > > > On Sat, May 19, 2012 at 10:54 PM, Nathaniel Smith wrote: >> >> >> Ralf, IIUC merging this and my other outstanding PRs would leave the >> datetime issues on python3/win32 as the only outstanding blocker? > > > Yes. There are some more open tickets for 1.7 (see > http://projects.scipy.org/numpy/report/3), but I don't consider any of those > blockers. Although #2078 should probably be fixed, since it's a regression. I took a look at #2078 actually, and it's easy to fix: https://github.com/njsmith/numpy/commit/a0448b83e6b2efd981ef6ad6689b8b2fafebb834 But, because of all the churn in the reduction code that the separate-maskna branch has, I made that patch against separate-maskna instead of master. So, I'll wait to see how that shakes out before making a pull request for #2078. -- Nathaniel From njs at pobox.com Sun May 20 14:06:37 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 20 May 2012 19:06:37 +0100 Subject: [Numpy-discussion] Separating out the maskna code In-Reply-To: References: <5432E28E-A6B0-4B3B-8932-14B2114ABDD1@continuum.io> Message-ID: On Sun, May 20, 2012 at 6:59 PM, Nathaniel Smith wrote: >> I have not reviewed it in detail, but in general I would be very supportive of your plan to commit this to master, make a 1.7 release (without the ReduceWrapper) function and then work on the masked array / ndarray separation plan for 1.8 >> >> Of course, first I would want to hear from Mark, to hear his comments about what was removed. > > Definitely. I'm pretty sure I didn't accidentally sweep up anything > else in my net besides what it says in the commit messages (simply > because it's hard to do that when all you're doing is grepping for > HASMASKNA and friends), but he knows this code better than I do. 
Also on that note, if someone can merge the PyArray_Diagonal PR then I can sort out the conflicts and then make a PR for this, to make review easier... - N From tim at cerazone.net Sun May 20 17:59:54 2012 From: tim at cerazone.net (Tim Cera) Date: Sun, 20 May 2012 17:59:54 -0400 Subject: [Numpy-discussion] Internationalization of numpy/scipy docstrings... In-Reply-To: References: Message-ID: > > Are you thinking only about documentation in .rst files (like the > tutorials), or also the docstrings themselves? The former may be feasible, > the latter I think will be difficult. > > Everything. Within the documentation editor the RST docstrings are parsed from the functions, so instead of only storing them in the database for Django/doceditor to work with, can save them to *.rst files. I don't know how integrated we could/would make the documentation editor/sphinx/pootle combination, so I think the easiest would be integration through files. Your question points out a detail (and some small refinements) that I should have put in the outline from my first message: 0.5. As the pydocweb editor works on docstrings, up-to-date RST files are also saved to the file system, and triggers... 1. The new gettext builder to convert *.rst to *.pot files. 1.5. (OPTIONAL) Can make a preliminary, automatic translation. Pootle currently supports Google Translate (now costs $) or Apertium. 2. Translators would use pootle to edit the *.pot files to *.po files 2.5. Use mgsfmt to create *.mo files 3. From here can choose either: a. Use sphinx-build to create new, translated *.rst files from the *.mo files. (my favorite since we would have *.rst files) b. OR use gettext in Python to translate docstring on-the-fly from the *.mo files. At this point we would need to have an environment variable or other configuration mechanism to set the desired locale, which np.info would use to find the correct directory/rst file. Lets just say for sake of my example that the configuration is handled by a np.locale function. np.info(np.array) # display English docstring as it currently does np.locale('fr') np.info(np.array) # display the French docstring Reference links: sphinx based translation http://sphinx.pocoo.org/latest/intl.html http://www.slideshare.net/lehmannro/sphinxi18n-the-true-story Pootle: http://translate.sourceforge.net/wiki/pootle/index (You have to get the development versions of translate and pootle to work with Django 1.4.) Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sun May 20 22:43:22 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 20 May 2012 21:43:22 -0500 Subject: [Numpy-discussion] why two versions of polyfit? In-Reply-To: <4FB91390.5030802@hilboll.de> References: <4FB9092A.705@hilboll.de> <4FB91390.5030802@hilboll.de> Message-ID: Documentation helps are always welcome. Please make sure to advertise widely, though, that the new Polynomial class changes the ordering convention of the coefficients away from the Matlab standard. I think this will be a point of confusion unless it is carefully documented. It's also why poly1d can't disappear (even though it would be nice to make it just a wrapper on top of the other Polynomial classes). 
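A short sketch of the two conventions, for anyone hitting this for the first time (the values in the comments are the approximate fit results for an exact quadratic):

import numpy as np
from numpy.polynomial import Polynomial, polynomial as P

x = np.linspace(0, 2, 50)
y = 1.0 + 2.0*x + 3.0*x**2

np.polyfit(x, y, 2)        # old poly1d/Matlab convention: highest power first, ~[3., 2., 1.]
P.polyfit(x, y, 2)         # new convention: lowest power first, ~[1., 2., 3.]

p = Polynomial.fit(x, y, 2)   # the preferred interface; fits in a scaled domain internally
p.convert().coef              # coefficients in the unscaled variable, again ~[1., 2., 3.]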
-Travis On May 20, 2012, at 10:53 AM, Andreas Hilboll wrote: >> On Sun, May 20, 2012 at 9:37 AM, Charles R Harris >> > wrote: >> >> >> >> On Sun, May 20, 2012 at 9:09 AM, Andreas Hilboll > > wrote: >> >> Hi, >> >> I just noticed that there's two polyfit functions, one in >> numpy.lib.polynomial, and one in numpy.polynomial. What's the >> reason for >> this? The calling signatures aren't identical (the numpy.polynomial >> version supports weights), and I couldn't find a notice on why two >> versions exist. >> >> >> There are two different polynomial objects, Polynomial and poly1d. >> The Polynomial object is part of a newer group that also contains >> Lengendre, Chebyshev, etc., and doesn't have some of the problems >> that poly1d has. Poly1d is an older implementation. > > I think it would be beneficial for the user if this fact was noted > somewhere in the docstring of the Poly1d implementation. Especially > since numpy.polyfit is pointing to that old implementation. When I saw > the polyfit function in the numpy namespace, I didn't bother checking if > there's anything more sophisticated. > > I could add the appropriate links in the "see also" sections of the > Poly1d docstrings, if you guys agree. > >> Oh, and the polyfit function in polynomial.polynomial isn't meant to be >> used directly, it is mostly there to support the fit class function of >> Polynomial. See the documentation here . > > Ah, okay. Thanks for that. > > Cheers, > Andreas. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From thouis at gmail.com Mon May 21 13:32:34 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Mon, 21 May 2012 19:32:34 +0200 Subject: [Numpy-discussion] tracing numpy data allocation with python callbacks In-Reply-To: References: Message-ID: I submitted a PR for this functionality after cleaning it up a bit: https://github.com/numpy/numpy/pull/284 I've attached an example that produces HTML for a a sortable table tracking allocations while running through numpy.test(). Ray Jones -------------- next part -------------- A non-text attachment was scrubbed... Name: sorttable.js Type: application/x-javascript Size: 16917 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: track_allocation.py Type: application/octet-stream Size: 4459 bytes Desc: not available URL: From aldcroft at head.cfa.harvard.edu Mon May 21 13:47:56 2012 From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft) Date: Mon, 21 May 2012 13:47:56 -0400 Subject: [Numpy-discussion] subclassing ndarray subtleties?? Message-ID: Over on the scipy-user mailing list there was a question about subclassing ndarray and I was interested to see two responses that seemed to imply that subclassing should be avoided. >From Dag and Nathaniel, respectively: "Subclassing ndarray is a very tricky business -- I did it once and regretted having done it for years, because there's so much you can't do etc.. You're almost certainly better off with embedding an array as an attribute, and then forward properties etc. to it." "Yes, it's almost always the wrong thing..." So my question is whether there are subtleties or issues that are not covered in the standard NumPy documents on subclassing ndarray. What are the "things you can't do etc"? I'm working on a project that relies heavily on an ndarray subclass which just adds a few attributes and slightly tweaks __getitem__. 
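For concreteness, the usual attribute-carrying pattern from the NumPy subclassing docs looks roughly like this (a sketch only; the class and attribute names are made up for illustration and are not the actual project code):

import numpy as np

class InfoArray(np.ndarray):
    """Sketch of an ndarray subclass that carries one extra attribute."""

    def __new__(cls, input_array, info=None):
        # view-cast the input to this class and attach the extra attribute
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # called on explicit construction, view casting and new-from-template,
        # so the attribute survives slicing, transposing, ufunc output, etc.
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

a = InfoArray(np.arange(5), info='calibration run')
a[1:3].info   # 'calibration run' -- preserved through slicing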
It seems fine and I really like that the class is an ndarray with all the built-in methods already there. Am I missing anything? From the scipy thread I already learned that one should also override __getslice__ in addition to __getitem__ to be safe. Thanks, Tom From diehose at freenet.de Mon May 21 14:50:25 2012 From: diehose at freenet.de (bmu) Date: Mon, 21 May 2012 20:50:25 +0200 Subject: [Numpy-discussion] Fwd: Named dtype array: Difference between a[0]['name'] and a['name'][0]? Message-ID: <5946941.oVP5cPdKPZ@wohnzimmer> Dear all, can anybody tell me why nobody is answering this question? Is this the wrong place to ask? Or does nobody know an answer? Björn -------------- next part -------------- An embedded message was scrubbed... From: bmu Subject: Named dtype array: Difference between a[0]['name'] and a['name'][0]? Date: Sun, 20 May 2012 13:45:03 +0200 Size: 3118 URL: From thouis at gmail.com Mon May 21 15:39:21 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Mon, 21 May 2012 21:39:21 +0200 Subject: [Numpy-discussion] tracing numpy data allocation with python callbacks In-Reply-To: References: Message-ID: On Mon, May 21, 2012 at 7:32 PM, Thouis (Ray) Jones wrote: > I submitted a PR for this functionality after cleaning it up a bit: > https://github.com/numpy/numpy/pull/284 I meant to ask here (and Travis reminded me in the PR): Currently, the trace_data_allocations() function (the only one added by this change) is in numpy.core.multiarray. Should it be somewhere else, such as numpy.core? I put it in multiarray only because it was close to the functions doing the actual allocation. Ray Jones From travis at continuum.io Mon May 21 16:37:36 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 21 May 2012 15:37:36 -0500 Subject: [Numpy-discussion] Fwd: Named dtype array: Difference between a[0]['name'] and a['name'][0]? In-Reply-To: <5946941.oVP5cPdKPZ@wohnzimmer> References: <5946941.oVP5cPdKPZ@wohnzimmer> Message-ID: <249AA706-CE3F-49A5-924C-6B101637EAA1@continuum.io> This is the right place to ask, it's just that it can take time to get an answer because people who might know the answer may not have the time to respond immediately. The short answer is that this is not really a "normal" bug, but it could be considered a "design" bug (although the issues may not be straightforward to resolve). What that means is that it may not be changed in the short term --- and you should just use the first spelling. Structured arrays can be a confusing area of NumPy for several reasons. You've constructed an example that touches on several of them. You have a data-type that is a "structure" array with one member ("tuple"). That member contains a 2-vector of integers. First of all, it is important to remember that with Python, doing a['tuple'][0] = (1,2) is equivalent to b = a['tuple']; b[0] = (1,2). In like manner, a[0]['tuple'] = (1,2) is equivalent to b = a[0]; b['tuple'] = (1,2). To understand the behavior, we need to dissect both code paths and what happens. You built a (3,) array of those elements in 'a'.
On the other hand, when you type: b = a[0] you are getting back an array-scalar which is a particularly interesting kind of array scalar that can hold records. This new object is formally of type numpy.void and it holds a "scalar representation" of anything that fits under the "VOID" basic dtype. For some reason: b['tuple'] = [1,2] is not working. On my system I'm getting a different error: TypeError: object of type 'int' has no len() I think this should be filed as a bug on the issue tracker which is for the time being here: http://projects.scipy.org/numpy The problem is ultimately the void->copyswap function being called in voidtype_setfields if someone wants to investigate. I think this behavior should work. -Travis On May 21, 2012, at 1:50 PM, bmu wrote: > dear all, > > can anybody tell me, why nobody is answering this question? is this the wrong > place to ask? or does nobody know an answer? > > bj?rn > From: bmu > Subject: Named dtype array: Difference between a[0]['name'] and a['name'][0]? > Date: May 20, 2012 6:45:03 AM CDT > To: numpy-discussion at scipy.org > > > I came acroos a question on stackoverflow (http://stackoverflow.com/q/9470604) and I am wondering if this is a bug > > import numpy as np > dt = np.dtype([('tuple', (int, 2))]) > a = np.zeros(3, dt) > type(a['tuple'][0]) # ndarray > type(a[0]['tuple']) # ndarray > > a['tuple'][0] = (1,2) # ok > a[0]['tuple'] = (1,2) # ValueError: shape-mismatch on array construction > Could somebody explain this behaviour (either in this mailing list or on stackoverflow)? > > bmu > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon May 21 16:53:05 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 21 May 2012 16:53:05 -0400 Subject: [Numpy-discussion] Fwd: Named dtype array: Difference between a[0]['name'] and a['name'][0]? In-Reply-To: <249AA706-CE3F-49A5-924C-6B101637EAA1@continuum.io> References: <5946941.oVP5cPdKPZ@wohnzimmer> <249AA706-CE3F-49A5-924C-6B101637EAA1@continuum.io> Message-ID: On Mon, May 21, 2012 at 4:37 PM, Travis Oliphant wrote: > This is the right place to ask, it's just that it can take time to get an > answer because people who might know the answer may not have the time to > respond immediately. > > The short answer is that this is not really a "normal" bug, but it could be > considered a "design" bug (although the issues may not be straightforward to > resolve). ? ?What that means is that it may not be changed in the short term > --- and you should just use the first spelling. > > Structured arrays can be a confusing area of NumPy for several of reasons. > You've constructed an example that touches on several of them. ? You have a > data-type that is a "structure" array with one member ("tuple"). ?That > member contains a 2-vector of integers. > > First of all, it is important to remember that with Python, doing > a['tuple'][0] = (1,2) is equivalent to b = a['tuple']; b[0] = (1,2). ? In > like manner, a[0]['tuple'] = (1,2) is equivalent to b = a[0]; b['tuple'] = > (1,2). > > To understand the behavior, we need to dissect both code paths and what > happens. ? You built a (3,) array of those elements in 'a'. 
> When you write b = a['tuple'] you should probably be getting a (3,) array
> of (2,)-integers, but as there is currently no formal dtype support for
> (n,)-integers as a general dtype in NumPy, you get back a (3,2) array of
> integers which is the closest thing that NumPy can give you. Setting the
> [0] row of this object via
>
> a['tuple'][0] = (1,2)
>
> works just fine and does what you would expect.
>
> On the other hand, when you type:
>
> b = a[0]
>
> you are getting back an array-scalar which is a particularly interesting
> kind of array scalar that can hold records. This new object is formally
> of type numpy.void and it holds a "scalar representation" of anything that
> fits under the "VOID" basic dtype.
>
> For some reason:
>
> b['tuple'] = [1,2]
>
> is not working. On my system I'm getting a different error: TypeError:
> object of type 'int' has no len()
>
> I think this should be filed as a bug on the issue tracker which is for the
> time being here: http://projects.scipy.org/numpy
>
> The problem is ultimately the void->copyswap function being called in
> voidtype_setfields if someone wants to investigate. I think this behavior
> should work.
>

Just playing around I found this to be odd, though I guess it makes some sense given your comments

[~/]
[12]: b['tuple'] = [(1,2)]

[~/]
[13]: b
[13]: ([1, 0],)

[~/]
[14]: a
[14]: array([([1, 0],), ([0, 0],), ([0, 0],)], dtype=[('tuple', '<i8', (2,))])

From ralf.gommers at googlemail.com Mon May 21 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Mon, 21 May 2012
Subject: [Numpy-discussion] Internationalization of numpy/scipy docstrings...
In-Reply-To:
References:
Message-ID:

On Sun, May 20, 2012 at 11:59 PM, Tim Cera wrote:

> Are you thinking only about documentation in .rst files (like the
>> tutorials), or also the docstrings themselves? The former may be feasible,
>> the latter I think will be difficult.
>
> Everything. Within the documentation editor the RST docstrings are parsed
> from the functions, so instead of only storing them in the database for
> Django/doceditor to work with, can save them to *.rst files.
>
> I don't know how integrated we could/would make the documentation
> editor/sphinx/pootle combination, so I think the easiest would be
> integration through files. Your question points out a detail (and some
> small refinements) that I should have put in the outline from my first
> message:
>
> 0.5. As the pydocweb editor works on docstrings, up-to-date RST files
> are also saved to the file system, and triggers...
>
> 1. The new gettext builder to convert *.rst to *.pot files.
>
> 1.5. (OPTIONAL) Can make a preliminary, automatic translation. Pootle
> currently supports Google Translate (now costs $) or Apertium.
>
> 2. Translators would use pootle to edit the *.pot files to *.po files
>
> 2.5. Use msgfmt to create *.mo files
>
> 3. From here can choose either:
> a. Use sphinx-build to create new,
> translated *.rst files from the *.mo files.
> (my favorite since we would have *.rst files)
> b. OR use gettext in Python to translate docstring
> on-the-fly from the *.mo files.
>

Docstrings are not stored in .rst files but in the numpy sources, so there are some non-trivial technical and workflow details missing here. But besides that, I think translating everything (even into a single language) is a massive amount of work, and it's not at all clear if there's enough people willing to help out with this. So I'd think it would be better to start with just the high-level docs (numpy user guide, scipy tutorial) to see how it goes.

Thinking about what languages to translate into would also make sense, since having a bunch of partial translations lying around doesn't help anyone.
First thought: Spanish, Chinese. Ralf > At this point we would need to have an environment variable or other > configuration mechanism to set the desired locale, which np.info would > use to find the correct directory/rst file. Lets just say for sake of my > example that the configuration is handled by a np.locale function. > > > np.info(np.array) > > # display English docstring as it currently does > > > np.locale('fr') > > np.info(np.array) > > # display the French docstring > > > Reference links: > > sphinx based translation > > http://sphinx.pocoo.org/latest/intl.html > > http://www.slideshare.net/lehmannro/sphinxi18n-the-true-story > > Pootle: > > http://translate.sourceforge.net/wiki/pootle/index > > (You have to get the development versions of translate and pootle to > work with Django 1.4.) > > > Kindest regards, > > Tim > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon May 21 18:19:26 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 21 May 2012 23:19:26 +0100 Subject: [Numpy-discussion] Internationalization of numpy/scipy docstrings... In-Reply-To: References: Message-ID: On Mon, May 21, 2012 at 10:44 PM, Ralf Gommers wrote: > Thinking about what languages to translate into would also make sense, since > having a bunch of partial translations lying around doesn't help anyone. > First thought: Spanish, Chinese. It's not like one can tell two translator volunteers to start speaking the same language so as to better pool their efforts... they kind of speak whatever they speak. But there is quite a bit of translator volunteer person-power available out there across many languages. If Tim gets the infrastructure worked out then advertising on some of the big translation project mailing lists will probably get a lot of eyeballs. - N From njs at pobox.com Mon May 21 18:48:34 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 21 May 2012 23:48:34 +0100 Subject: [Numpy-discussion] Trivial pull request: tox support Message-ID: I got tired of juggling virtualenvs, and probably everyone else would soon get tired of pointing out Python 2.4 incompatibilities I'd forgotten to test, so: https://github.com/numpy/numpy/pull/285 - N From travis at continuum.io Tue May 22 00:34:51 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 21 May 2012 23:34:51 -0500 Subject: [Numpy-discussion] Separating out the maskna code In-Reply-To: References: <5432E28E-A6B0-4B3B-8932-14B2114ABDD1@continuum.io> Message-ID: Just to be clear. Are we waiting for the conclusion of the PyArray_Diagonal PR before proceeding with this one? -Travis On May 20, 2012, at 1:06 PM, Nathaniel Smith wrote: > On Sun, May 20, 2012 at 6:59 PM, Nathaniel Smith wrote: >>> I have not reviewed it in detail, but in general I would be very supportive of your plan to commit this to master, make a 1.7 release (without the ReduceWrapper) function and then work on the masked array / ndarray separation plan for 1.8 >>> >>> Of course, first I would want to hear from Mark, to hear his comments about what was removed. >> >> Definitely. I'm pretty sure I didn't accidentally sweep up anything >> else in my net besides what it says in the commit messages (simply >> because it's hard to do that when all you're doing is grepping for >> HASMASKNA and friends), but he knows this code better than I do. 
> > Also on that note, if someone can merge the PyArray_Diagonal PR then I > can sort out the conflicts and then make a PR for this, to make review > easier... > > - N > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Tue May 22 04:27:31 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 22 May 2012 09:27:31 +0100 Subject: [Numpy-discussion] un-silencing Numpy's deprecation warnings Message-ID: So starting in Python 2.7 and 3.2, the Python developers have made DeprecationWarnings invisible by default: http://docs.python.org/whatsnew/2.7.html#the-future-for-python-2-x http://mail.python.org/pipermail/stdlib-sig/2009-November/000789.html http://bugs.python.org/issue7319 The only way to see them is to explicitly request them by running Python with -Wd. The logic seems to be that between the end-of-development for 2.7 and the moratorium on 3.2 changes, there were a *lot* of added deprecations that were annoying people, and deprecations in the Python stdlib mean "this code is probably sub-optimal but it will still continue to work indefinitely". So they consider that deprecation warnings are like a lint tool for conscientious developers who remember to test their code with -Wd, but not something to bother users with. In Numpy, the majority of our users are actually (relatively unsophisticated) developers, and we don't plan to support deprecated features indefinitely. Our deprecations seem to better match what Python calls a "FutureWarning": "warnings about constructs that will change semantically in the future." http://docs.python.org/library/warnings.html#warning-categories FutureWarning is displayed by default, and available in all versions of Python. So maybe we should change all our DeprecationWarnings into FutureWarnings (or at least the ones that we actually plan to follow through on). Thoughts? - N From njs at pobox.com Tue May 22 04:30:22 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 22 May 2012 09:30:22 +0100 Subject: [Numpy-discussion] Separating out the maskna code In-Reply-To: References: <5432E28E-A6B0-4B3B-8932-14B2114ABDD1@continuum.io> Message-ID: On Tue, May 22, 2012 at 5:34 AM, Travis Oliphant wrote: > Just to be clear. ? Are we waiting for the conclusion of the PyArray_Diagonal PR before proceeding with this one? We can talk about this one and everyone's welcome to look at the patch, of course. (In fact it'd be useful if anyone catches any issues now, so I can roll them into the final rebase.) But I'll rebase it again after the PyArray_diagonal thing has been sorted to sort out conflicts, and also fix some docs that I missed, so I don't want to create an actual PR yet. -- Nathaniel > On May 20, 2012, at 1:06 PM, Nathaniel Smith wrote: > >> On Sun, May 20, 2012 at 6:59 PM, Nathaniel Smith wrote: >>>> I have not reviewed it in detail, but in general I would be very supportive of your plan to commit this to master, make a 1.7 release (without the ReduceWrapper) function and then work on the masked array / ndarray separation plan for 1.8 >>>> >>>> Of course, first I would want to hear from Mark, to hear his comments about what was removed. >>> >>> Definitely. I'm pretty sure I didn't accidentally sweep up anything >>> else in my net besides what it says in the commit messages (simply >>> because it's hard to do that when all you're doing is grepping for >>> HASMASKNA and friends), but he knows this code better than I do. 
>> >> Also on that note, if someone can merge the PyArray_Diagonal PR then I >> can sort out the conflicts and then make a PR for this, to make review >> easier... >> >> - N >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From numpy-discussion at maubp.freeserve.co.uk Tue May 22 05:50:38 2012 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Tue, 22 May 2012 10:50:38 +0100 Subject: [Numpy-discussion] un-silencing Numpy's deprecation warnings In-Reply-To: References: Message-ID: On Tue, May 22, 2012 at 9:27 AM, Nathaniel Smith wrote: > So starting in Python 2.7 and 3.2, the Python developers have made > DeprecationWarnings invisible by default: > ?http://docs.python.org/whatsnew/2.7.html#the-future-for-python-2-x > ?http://mail.python.org/pipermail/stdlib-sig/2009-November/000789.html > ?http://bugs.python.org/issue7319 > The only way to see them is to explicitly request them by running > Python with -Wd. > > The logic seems to be that between the end-of-development for 2.7 and > the moratorium on 3.2 changes, there were a *lot* of added > deprecations that were annoying people, and deprecations in the Python > stdlib mean "this code is probably sub-optimal but it will still > continue to work indefinitely". So they consider that deprecation > warnings are like a lint tool for conscientious developers who > remember to test their code with -Wd, but not something to bother > users with. > > In Numpy, the majority of our users are actually (relatively > unsophisticated) developers, and we don't plan to support deprecated > features indefinitely. Our deprecations seem to better match what > Python calls a "FutureWarning": "warnings about constructs that will > change semantically in the future." > ?http://docs.python.org/library/warnings.html#warning-categories > FutureWarning is displayed by default, and available in all versions of Python. > > So maybe we should change all our DeprecationWarnings into > FutureWarnings (or at least the ones that we actually plan to follow > through on). Thoughts? > > - N We had the same discussion for Biopython two years ago, and introduced our own warning class to avoid our deprecations being silent (and thus almost pointless). It is just a subclass of Warning (originally we used a subclass of UserWarning). From robert.kern at gmail.com Tue May 22 06:06:50 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 22 May 2012 11:06:50 +0100 Subject: [Numpy-discussion] un-silencing Numpy's deprecation warnings In-Reply-To: References: Message-ID: On Tue, May 22, 2012 at 9:27 AM, Nathaniel Smith wrote: > So starting in Python 2.7 and 3.2, the Python developers have made > DeprecationWarnings invisible by default: > ?http://docs.python.org/whatsnew/2.7.html#the-future-for-python-2-x > ?http://mail.python.org/pipermail/stdlib-sig/2009-November/000789.html > ?http://bugs.python.org/issue7319 > The only way to see them is to explicitly request them by running > Python with -Wd. 
> > The logic seems to be that between the end-of-development for 2.7 and > the moratorium on 3.2 changes, there were a *lot* of added > deprecations that were annoying people, and deprecations in the Python > stdlib mean "this code is probably sub-optimal but it will still > continue to work indefinitely". That's not quite it, I think, since this change was also made in Python 3.2 and will remain for all future versions of Python. DeprecationWarning *is* used for things that will definitely be going away, not just things that are no longer recommended but will continue to live. Note that the 3.2 moratorium was for changes to the language proper. The point was to encourage stdlib development, including the removal of deprecated code. It was not a moratorium on removing deprecated things. The silencing discussion just came up first in a discussion on the moratorium. The main problem they were running into was that the people who saw these warnings the most were not people directly using the deprecated features; they were users of packages written by third parties that used the deprecated features. Those people can't do anything to fix the problem, and many of them think that something is broken when they see the warning (I don't know why people do this, but they do). This problem is exacerbated by the standard library's position as a standard library. It's at the base of everyone's stack so these indirect effects are quite frequent, quite possibly the majority case. Users would use a newer version of Python library than the third party developer tested on and see these errors instead of the developer. I think this concern is fairly general and applies to numpy nearly as much as it does the standard library. It is at the bottom of many people's stacks. Someone calling matplotlib.pyplot.plot() should not see a DeprecationWarning from numpy. > So they consider that deprecation > warnings are like a lint tool for conscientious developers who > remember to test their code with -Wd, but not something to bother > users with. > > In Numpy, the majority of our users are actually (relatively > unsophisticated) developers, Whether they sometimes wear a developer hat or not isn't the relevant distinction. The question to ask is, "Are they the ones writing the code that directly uses the deprecated features?" > and we don't plan to support deprecated > features indefinitely. Again, this is not relevant. The silencing of DeprecationWarnings was not driven by this. > Our deprecations seem to better match what > Python calls a "FutureWarning": "warnings about constructs that will > change semantically in the future." > ?http://docs.python.org/library/warnings.html#warning-categories > FutureWarning is displayed by default, and available in all versions of Python. > > So maybe we should change all our DeprecationWarnings into > FutureWarnings (or at least the ones that we actually plan to follow > through on). Thoughts? Using FutureWarning for deprecated functions (i.e. functions that will disappear in future releases) is an abuse of the semantics. FutureWarning is for things like the numpy.histogram() changes from a few years ago: changes in default arguments that will change the semantics of a given function call. Some of our DeprecationWarnings possibly should be FutureWarnings, but most shouldn't I don't think. I can see a case being made for using a custom non-silenced exception for some cases that really probably show up mostly in true end-user scenarios, e.g. genfromtxt(). 
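A minimal sketch of what such a non-silenced warning class could look like, along the lines of what Peter described for Biopython (the class name here is made up for illustration, not an existing numpy name):

import warnings

class VisibleDeprecationWarning(UserWarning):
    """Deprecation warning that is shown by default, unlike DeprecationWarning."""
    pass

warnings.warn("this keyword is deprecated and will be removed in a future release",
              VisibleDeprecationWarning)

Since it derives from UserWarning rather than DeprecationWarning, the default warning filters will still print it.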
But there are many other cases where we should continue to use DeprecationWarning, e.g. _array2string(). But on the whole, I would just leave the DeprecationWarnings as they are. -- Robert Kern From d.s.seljebotn at astro.uio.no Tue May 22 06:14:56 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 22 May 2012 12:14:56 +0200 Subject: [Numpy-discussion] un-silencing Numpy's deprecation warnings In-Reply-To: References: Message-ID: <4FBB6720.1000508@astro.uio.no> On 05/22/2012 12:06 PM, Robert Kern wrote: > On Tue, May 22, 2012 at 9:27 AM, Nathaniel Smith wrote: >> So starting in Python 2.7 and 3.2, the Python developers have made >> DeprecationWarnings invisible by default: >> http://docs.python.org/whatsnew/2.7.html#the-future-for-python-2-x >> http://mail.python.org/pipermail/stdlib-sig/2009-November/000789.html >> http://bugs.python.org/issue7319 >> The only way to see them is to explicitly request them by running >> Python with -Wd. >> >> The logic seems to be that between the end-of-development for 2.7 and >> the moratorium on 3.2 changes, there were a *lot* of added >> deprecations that were annoying people, and deprecations in the Python >> stdlib mean "this code is probably sub-optimal but it will still >> continue to work indefinitely". > > That's not quite it, I think, since this change was also made in > Python 3.2 and will remain for all future versions of Python. > DeprecationWarning *is* used for things that will definitely be going > away, not just things that are no longer recommended but will continue > to live. Note that the 3.2 moratorium was for changes to the language > proper. The point was to encourage stdlib development, including the > removal of deprecated code. It was not a moratorium on removing > deprecated things. The silencing discussion just came up first in a > discussion on the moratorium. > > The main problem they were running into was that the people who saw > these warnings the most were not people directly using the deprecated > features; they were users of packages written by third parties that > used the deprecated features. Those people can't do anything to fix > the problem, and many of them think that something is broken when they > see the warning (I don't know why people do this, but they do). This > problem is exacerbated by the standard library's position as a > standard library. It's at the base of everyone's stack so these > indirect effects are quite frequent, quite possibly the majority case. > Users would use a newer version of Python library than the third party > developer tested on and see these errors instead of the developer. > > I think this concern is fairly general and applies to numpy nearly as > much as it does the standard library. It is at the bottom of many > people's stacks. Someone calling matplotlib.pyplot.plot() should not > see a DeprecationWarning from numpy. > >> So they consider that deprecation >> warnings are like a lint tool for conscientious developers who >> remember to test their code with -Wd, but not something to bother >> users with. >> >> In Numpy, the majority of our users are actually (relatively >> unsophisticated) developers, > > Whether they sometimes wear a developer hat or not isn't the relevant > distinction. The question to ask is, "Are they the ones writing the > code that directly uses the deprecated features?" > >> and we don't plan to support deprecated >> features indefinitely. > > Again, this is not relevant. The silencing of DeprecationWarnings was > not driven by this. 
> >> Our deprecations seem to better match what >> Python calls a "FutureWarning": "warnings about constructs that will >> change semantically in the future." >> http://docs.python.org/library/warnings.html#warning-categories >> FutureWarning is displayed by default, and available in all versions of Python. >> >> So maybe we should change all our DeprecationWarnings into >> FutureWarnings (or at least the ones that we actually plan to follow >> through on). Thoughts? > > Using FutureWarning for deprecated functions (i.e. functions that will > disappear in future releases) is an abuse of the semantics. > FutureWarning is for things like the numpy.histogram() changes from a > few years ago: changes in default arguments that will change the > semantics of a given function call. Some of our DeprecationWarnings > possibly should be FutureWarnings, but most shouldn't I don't think. I guess the diagonal() change would at least be a FutureWarning then? (When you write to the result?) Dag > > I can see a case being made for using a custom non-silenced exception > for some cases that really probably show up mostly in true end-user > scenarios, e.g. genfromtxt(). But there are many other cases where we > should continue to use DeprecationWarning, e.g. _array2string(). But > on the whole, I would just leave the DeprecationWarnings as they are. > From robert.kern at gmail.com Tue May 22 06:39:44 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 22 May 2012 11:39:44 +0100 Subject: [Numpy-discussion] un-silencing Numpy's deprecation warnings In-Reply-To: <4FBB6720.1000508@astro.uio.no> References: <4FBB6720.1000508@astro.uio.no> Message-ID: On Tue, May 22, 2012 at 11:14 AM, Dag Sverre Seljebotn wrote: > On 05/22/2012 12:06 PM, Robert Kern wrote: >> On Tue, May 22, 2012 at 9:27 AM, Nathaniel Smith ?wrote: >>> So maybe we should change all our DeprecationWarnings into >>> FutureWarnings (or at least the ones that we actually plan to follow >>> through on). Thoughts? >> >> Using FutureWarning for deprecated functions (i.e. functions that will >> disappear in future releases) is an abuse of the semantics. >> FutureWarning is for things like the numpy.histogram() changes from a >> few years ago: changes in default arguments that will change the >> semantics of a given function call. Some of our DeprecationWarnings >> possibly should be FutureWarnings, but most shouldn't I don't think. > > I guess the diagonal() change would at least be a FutureWarning then? > (When you write to the result?) Sure. -- Robert Kern From chaoyuejoy at gmail.com Tue May 22 09:33:21 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 22 May 2012 15:33:21 +0200 Subject: [Numpy-discussion] assign a float number to a member of integer array always return integer Message-ID: Dear all, Just in case some one didn't know this. Assign a float number to an integer array element will always return integer. In [4]: a=np.arange(2,11,2) In [5]: a Out[5]: array([ 2, 4, 6, 8, 10]) In [6]: a[1]=4.5 In [7]: a Out[7]: array([ 2, 4, 6, 8, 10]) so I would always do this if I expected a transfer from integer to float? In [18]: b=a.astype(float) In [19]: b Out[19]: array([ 2., 4., 6., 8., 10.]) In [20]: b[1]=4.5 In [21]: b Out[21]: array([ 2. , 4.5, 6. , 8. , 10. 
]) thanks et cheers, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldcroft at head.cfa.harvard.edu Tue May 22 09:39:41 2012 From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft) Date: Tue, 22 May 2012 09:39:41 -0400 Subject: [Numpy-discussion] Problem with str.format() and np.recarray Message-ID: I came across this problem which appears to be new in numpy 1.6.2 (vs. 1.6.1): In [17]: a = np.array([(1, )], dtype=[('a', 'i4')]) In [18]: ra = a.view(np.recarray) In [19]: '{}'.format(ra[0]) --------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) /data/baffin/tom/git/eng_archive/ in () ----> 1 '{}'.format(ra[0]) RuntimeError: maximum recursion depth exceeded while calling a Python object In [20]: str(ra[0]) Out[20]: '(1,)' In [21]: ra[0] Out[21]: (1,) There are obvious workarounds but it seems something is not right. I'm running Python 2.7 on linux x86_64. Cheers, Tom From njs at pobox.com Tue May 22 09:45:59 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 22 May 2012 14:45:59 +0100 Subject: [Numpy-discussion] un-silencing Numpy's deprecation warnings In-Reply-To: References: Message-ID: On Tue, May 22, 2012 at 11:06 AM, Robert Kern wrote: > On Tue, May 22, 2012 at 9:27 AM, Nathaniel Smith wrote: >> So starting in Python 2.7 and 3.2, the Python developers have made >> DeprecationWarnings invisible by default: >> ?http://docs.python.org/whatsnew/2.7.html#the-future-for-python-2-x >> ?http://mail.python.org/pipermail/stdlib-sig/2009-November/000789.html >> ?http://bugs.python.org/issue7319 >> The only way to see them is to explicitly request them by running >> Python with -Wd. >> >> The logic seems to be that between the end-of-development for 2.7 and >> the moratorium on 3.2 changes, there were a *lot* of added >> deprecations that were annoying people, and deprecations in the Python >> stdlib mean "this code is probably sub-optimal but it will still >> continue to work indefinitely". > > That's not quite it, I think, since this change was also made in > Python 3.2 and will remain for all future versions of Python. > DeprecationWarning *is* used for things that will definitely be going > away, not just things that are no longer recommended but will continue > to live. Note that the 3.2 moratorium was for changes to the language > proper. The point was to encourage stdlib development, including the > removal of deprecated code. It was not a moratorium on removing > deprecated things. The silencing discussion just came up first in a > discussion on the moratorium. > > The main problem they were running into was that the people who saw > these warnings the most were not people directly using the deprecated > features; they were users of packages written by third parties that > used the deprecated features. Those people can't do anything to fix > the problem, and many of them think that something is broken when they > see the warning (I don't know why people do this, but they do). This > problem is exacerbated by the standard library's position as a > standard library. 
It's at the base of everyone's stack so these > indirect effects are quite frequent, quite possibly the majority case. > Users would use a newer version of Python library than the third party > developer tested on and see these errors instead of the developer. > > I think this concern is fairly general and applies to numpy nearly as > much as it does the standard library. It is at the bottom of many > people's stacks. Someone calling matplotlib.pyplot.plot() should not > see a DeprecationWarning from numpy. Yes, good points -- though I think there is a also real cost/benefit trade-off that depends on the details of how often these warnings are issued, the specific user base, etc. Compared to stdlib, a *much* higher proportion of numpy-using code consists of scripts whose only users are their authors, who didn't think very carefully about error handling, and who will continue to use these scripts for long periods of time (i.e. over multiple releases). So I feel like we should have a higher threshold for making warnings silent by default. OTOH, the distinction you suggest does make sense. I would summarize it as: - If a function or similar will just disappear in a future release, causing obvious failures in any code that depends on it, then DeprecationWarning is fine. People's code will unexpectedly break from time to time, but in safe ways, and anyway downgrading is easy. - Otherwise FutureWarning is preferred Does that sound like a reasonable rule of thumb? -- Nathaniel From robert.kern at gmail.com Tue May 22 09:49:46 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 22 May 2012 14:49:46 +0100 Subject: [Numpy-discussion] un-silencing Numpy's deprecation warnings In-Reply-To: References: Message-ID: On Tue, May 22, 2012 at 2:45 PM, Nathaniel Smith wrote: > On Tue, May 22, 2012 at 11:06 AM, Robert Kern wrote: >> On Tue, May 22, 2012 at 9:27 AM, Nathaniel Smith wrote: >>> So starting in Python 2.7 and 3.2, the Python developers have made >>> DeprecationWarnings invisible by default: >>> ?http://docs.python.org/whatsnew/2.7.html#the-future-for-python-2-x >>> ?http://mail.python.org/pipermail/stdlib-sig/2009-November/000789.html >>> ?http://bugs.python.org/issue7319 >>> The only way to see them is to explicitly request them by running >>> Python with -Wd. >>> >>> The logic seems to be that between the end-of-development for 2.7 and >>> the moratorium on 3.2 changes, there were a *lot* of added >>> deprecations that were annoying people, and deprecations in the Python >>> stdlib mean "this code is probably sub-optimal but it will still >>> continue to work indefinitely". >> >> That's not quite it, I think, since this change was also made in >> Python 3.2 and will remain for all future versions of Python. >> DeprecationWarning *is* used for things that will definitely be going >> away, not just things that are no longer recommended but will continue >> to live. Note that the 3.2 moratorium was for changes to the language >> proper. The point was to encourage stdlib development, including the >> removal of deprecated code. It was not a moratorium on removing >> deprecated things. The silencing discussion just came up first in a >> discussion on the moratorium. >> >> The main problem they were running into was that the people who saw >> these warnings the most were not people directly using the deprecated >> features; they were users of packages written by third parties that >> used the deprecated features. 
Those people can't do anything to fix >> the problem, and many of them think that something is broken when they >> see the warning (I don't know why people do this, but they do). This >> problem is exacerbated by the standard library's position as a >> standard library. It's at the base of everyone's stack so these >> indirect effects are quite frequent, quite possibly the majority case. >> Users would use a newer version of Python library than the third party >> developer tested on and see these errors instead of the developer. >> >> I think this concern is fairly general and applies to numpy nearly as >> much as it does the standard library. It is at the bottom of many >> people's stacks. Someone calling matplotlib.pyplot.plot() should not >> see a DeprecationWarning from numpy. > > Yes, good points -- though I think there is a also real cost/benefit > trade-off that depends on the details of how often these warnings are > issued, the specific user base, etc. > > Compared to stdlib, a *much* higher proportion of numpy-using code > consists of scripts whose only users are their authors, who didn't > think very carefully about error handling, and who will continue to > use these scripts for long periods of time (i.e. over multiple > releases). So I feel like we should have a higher threshold for making > warnings silent by default. > > OTOH, the distinction you suggest does make sense. I would summarize it as: > > ?- If a function or similar will just disappear in a future release, > causing obvious failures in any code that depends on it, then > DeprecationWarning is fine. People's code will unexpectedly break from > time to time, but in safe ways, and anyway downgrading is easy. > - Otherwise FutureWarning is preferred > > Does that sound like a reasonable rule of thumb? Sure. -- Robert Kern From f_magician at mac.com Tue May 22 09:52:59 2012 From: f_magician at mac.com (Magician) Date: Tue, 22 May 2012 22:52:59 +0900 Subject: [Numpy-discussion] Building error with ATLAS Message-ID: Hi all, I'm now trying to build NumPy with ATLAS on CentOS 6.2. I'm going to use them with SciPy. My CentOS is installed as "Software Development Workstation" on my Virtual Machine (VMware Fusion 4, Mac OS 10.7.4). I already installed Python 2.7.3 on /usr/local/python-2.7.3 from sources, and no other tools are installed yet. I tried but failed to build ATLAS, so I got ATLAS by using yum command: > yum install blas-devel lapack-devel atlas-devel Then I tried to build NumPy. But I got these errors: > building 'numpy.core._sort' extension > compiling C sources > C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC > > compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/usr/local/python-2.7.3/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c' > gcc: build/src.linux-x86_64-2.7/numpy/core/src/_sortmodule.c > gcc -pthread -shared build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core/src/_sortmodule.o -L. 
-Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/_sort.so > /usr/bin/ld: cannot find -lpython2.7 > collect2: ld returned 1 exit status > /usr/bin/ld: cannot find -lpython2.7 > collect2: ld returned 1 exit status > error: Command "gcc -pthread -shared build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core/src/_sortmodule.o -L. -Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/_sort.so" failed with exit status 1 Before I built NumPy, I uncommented and modified site.cfg as below: > [DEFAULT] > library_dirs = /usr/local/python-2.7.3/lib:/usr/lib64:/usr/lib64/atlas > include_dirs = /usr/local/python-2.7.3/include:/usr/include:/usr/include/atlas > > [blas_opt] > libraries = f77blas, cblas, atlas > > [lapack_opt] > libraries = lapack, f77blas, cblas, atlas And "setup.py config" dumped these messages: > Running from numpy source directory.F2PY Version 2 > blas_opt_info: > blas_mkl_info: > libraries mkl,vml,guide not found in /usr/local/python-2.7.3/lib > libraries mkl,vml,guide not found in /usr/lib64 > libraries mkl,vml,guide not found in /usr/lib64/atlas > NOT AVAILABLE > > atlas_blas_threads_info: > Setting PTATLAS=ATLAS > libraries ptf77blas,ptcblas,atlas not found in /usr/local/python-2.7.3/lib > Setting PTATLAS=ATLAS > customize GnuFCompiler > Could not locate executable g77 > Could not locate executable f77 > customize IntelFCompiler > Could not locate executable ifort > Could not locate executable ifc > customize LaheyFCompiler > Could not locate executable lf95 > customize PGroupFCompiler > Could not locate executable pgf90 > Could not locate executable pgf77 > customize AbsoftFCompiler > Could not locate executable f90 > customize NAGFCompiler > Found executable /usr/bin/f95 > customize VastFCompiler > customize CompaqFCompiler > Could not locate executable fort > customize IntelItaniumFCompiler > Could not locate executable efort > Could not locate executable efc > customize IntelEM64TFCompiler > customize Gnu95FCompiler > Found executable /usr/bin/gfortran > customize Gnu95FCompiler > customize Gnu95FCompiler using config > compiling '_configtest.c': > > /* This file is generated from numpy/distutils/system_info.py */ > void ATL_buildinfo(void); > int main(void) { > ATL_buildinfo(); > return 0; > } > > C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC > > compile options: '-c' > gcc: _configtest.c > gcc -pthread _configtest.o -L/usr/lib64/atlas -lptf77blas -lptcblas -latlas -o _configtest > ATLAS version 3.8.4 built by mockbuild on Wed Dec 7 18:04:21 GMT 2011: > UNAME : Linux c6b5.bsys.dev.centos.org 2.6.32-44.2.el6.x86_64 #1 SMP Wed Jul 21 12:48:32 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux > INSTFLG : -1 0 -a 1 > ARCHDEFS : -DATL_OS_Linux -DATL_ARCH_PII -DATL_CPUMHZ=2261 -DATL_SSE2 -DATL_SSE1 -DATL_USE64BITS -DATL_GAS_x8664 > F2CDEFS : -DAdd_ -DF77_INTEGER=int -DStringSunStyle > CACHEEDGE: 8388608 > F77 : gfortran, version GNU Fortran (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3) > F77FLAGS : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -g -Wa,--noexecstack -fPIC -m64 > SMC : gcc, version gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3) > SMCFLAGS : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -g -Wa,--noexecstack -fPIC -m64 > SKC : gcc, version gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3) > SKCFLAGS : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -g -Wa,--noexecstack -fPIC -m64 > success! 
> removing: _configtest.c _configtest.o _configtest > Setting PTATLAS=ATLAS > FOUND: > libraries = ['ptf77blas', 'ptcblas', 'atlas'] > library_dirs = ['/usr/lib64/atlas'] > language = c > define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')] > include_dirs = ['/usr/include'] > > FOUND: > libraries = ['ptf77blas', 'ptcblas', 'atlas'] > library_dirs = ['/usr/lib64/atlas'] > language = c > define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')] > include_dirs = ['/usr/include'] > > lapack_opt_info: > lapack_mkl_info: > mkl_info: > libraries mkl,vml,guide not found in /usr/local/python-2.7.3/lib > libraries mkl,vml,guide not found in /usr/lib64 > libraries mkl,vml,guide not found in /usr/lib64/atlas > NOT AVAILABLE > > NOT AVAILABLE > > atlas_threads_info: > Setting PTATLAS=ATLAS > libraries ptf77blas,ptcblas,atlas not found in /usr/local/python-2.7.3/lib > libraries lapack_atlas not found in /usr/local/python-2.7.3/lib > libraries lapack_atlas not found in /usr/lib64/atlas > numpy.distutils.system_info.atlas_threads_info > Setting PTATLAS=ATLAS > Setting PTATLAS=ATLAS > FOUND: > libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas'] > library_dirs = ['/usr/lib64/atlas'] > language = f77 > define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')] > include_dirs = ['/usr/include'] > > FOUND: > libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas'] > library_dirs = ['/usr/lib64/atlas'] > language = f77 > define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')] > include_dirs = ['/usr/include'] > > running config When I built NumPy before "yum install", it was successfully finished. I didn't use site.cfg, but I had to export CFLAGS="-L/usr/local/python-2.7.3/lib". How can I build and install NumPy with yum-installed ATLAS? Magician From massimo.dipierro at gmail.com Tue May 22 10:25:25 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Tue, 22 May 2012 09:25:25 -0500 Subject: [Numpy-discussion] question about in-place operations Message-ID: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> hello everybody, first of all thanks to the developed for bumpy which is very useful. I am building a software that uses numpy+pyopencl for lattice qcd computations. One problem that I am facing is that I need to perform most operations on arrays in place and I must avoid creating temporary arrays (because my arrays are many gigabyte large). One typical operation is this a[i] += const * b[i] What is the efficient way to do is when a and b are arbitrary arrays? const is usually a complex number. a and b have the same shape but are not necessarily uni-dimensional. Massimo From d.s.seljebotn at astro.uio.no Tue May 22 10:32:07 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 22 May 2012 16:32:07 +0200 Subject: [Numpy-discussion] question about in-place operations In-Reply-To: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> Message-ID: <4FBBA367.8060406@astro.uio.no> On 05/22/2012 04:25 PM, Massimo DiPierro wrote: > hello everybody, > > first of all thanks to the developed for bumpy which is very useful. I am building a software that uses numpy+pyopencl for lattice qcd computations. One problem that I am facing is that I need to perform most operations on arrays in place and I must avoid creating temporary arrays (because my arrays are many gigabyte large). > > One typical operation is this > > a[i] += const * b[i] > > What is the efficient way to do is when a and b are arbitrary arrays? const is usually a complex number. 
> a and b have the same shape but are not necessarily uni-dimensional. I don't think NumPy support this; if you can't modify b[i] in-place, I think your only option will be one of numexpr/Theano/Cython. Dag From jniehof at lanl.gov Tue May 22 10:39:53 2012 From: jniehof at lanl.gov (Jonathan T. Niehof) Date: Tue, 22 May 2012 08:39:53 -0600 Subject: [Numpy-discussion] un-silencing Numpy's deprecation warnings In-Reply-To: References: Message-ID: <4FBBA539.9010400@lanl.gov> On 05/22/2012 03:50 AM, Peter wrote: > We had the same discussion for Biopython two years ago, and > introduced our own warning class to avoid our deprecations being > silent (and thus almost pointless). It is just a subclass of Warning > (originally we used a subclass of UserWarning). For SpacePy we took a similar but slightly different approach; this is in the top-level __init__: if config['enable_deprecation_warning']: warnings.filterwarnings('default', '', DeprecationWarning, '^spacepy', 0, False) enable_deprecation_warning is True by default, but can be changed in the user's config file. This keeps everything as DeprecationWarning but only fiddles with the filter for spacepy (and it's set to default, not always.) -- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof at lanl.gov Correspondence / Technical data or Software Publicly Available From massimo.dipierro at gmail.com Tue May 22 10:47:35 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Tue, 22 May 2012 09:47:35 -0500 Subject: [Numpy-discussion] question about in-place operations In-Reply-To: <4FBBA367.8060406@astro.uio.no> References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no> Message-ID: Thank you. I will look into numexpr. Anyway, I do not need arbitrary expressions. If there were something like numpy.add_scaled(a,scale,b) with support for scale in int, float, complex, this would be sufficient for me. Massimo On May 22, 2012, at 9:32 AM, Dag Sverre Seljebotn wrote: > On 05/22/2012 04:25 PM, Massimo DiPierro wrote: >> hello everybody, >> >> first of all thanks to the developed for bumpy which is very useful. I am building a software that uses numpy+pyopencl for lattice qcd computations. One problem that I am facing is that I need to perform most operations on arrays in place and I must avoid creating temporary arrays (because my arrays are many gigabyte large). >> >> One typical operation is this >> >> a[i] += const * b[i] >> >> What is the efficient way to do is when a and b are arbitrary arrays? const is usually a complex number. >> a and b have the same shape but are not necessarily uni-dimensional. > > I don't think NumPy support this; if you can't modify b[i] in-place, I > think your only option will be one of numexpr/Theano/Cython. > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From tim at cerazone.net Tue May 22 10:51:22 2012 From: tim at cerazone.net (Tim Cera) Date: Tue, 22 May 2012 10:51:22 -0400 Subject: [Numpy-discussion] Internationalization of numpy/scipy docstrings... In-Reply-To: References: Message-ID: > > >> Docstrings are not stored in .rst files but in the numpy sources, so > there are some non-trivial technical and workflow details missing here. 
But > besides that, I think translating everything (even into a single language) > is a massive amount of work, and it's not at all clear if there's enough > people willing to help out with this. So I'd think it would be better to > start with just the high-level docs (numpy user guide, scipy tutorial) to > see how it goes. > I understand that this is non-trivial, for me anyway, because I can't figure out how to make my way around numpydoc, and documentation editor code (not quite true, as Pauli accepted a couple of my pull requests, but I definitely can't make it dance). This is why I asked for interest and help on the mailing list. I think for the people that worked on the documentation editor, or know Django, or are cleverer than I, the required changes to the documentation editor might by mid-trivial. That is my hope anyway. Would probably have the high-level docs separate from the docstring processing anyway since the high-level docs are already in a sphinx source directory. So I agree that the high-level docs would be the best place to start and in-fact that is what I was working with and found the problem with the sphinx gettext builder mentioned in the original post. I do want to defend and clarify the docstring processing though. Docstrings, in the code, will always be English. The documentation editor is the fulcrum. The documentation editor will work with the in the code docstrings *exactly *as it does now. The documentation editor would be changed so that when it writes the ReST formatted docstring back into the code, it *also *writes a *.rst file to a separate sphinx source directory. These *.rst files would not be part of the numpy source code directory, but an interim file for the documentation editor and sphinx to extract strings to make *.po files, pootle + hordes of translators :-) gives *.pot files, *.pot -> *.mo -> *.rst (translated). The English *.rst, *.po, *.pot, *.mo files are all interim products behind the scenes. The translated *.rst files would NOT be part of the numpy source code, but packaged separately. I must admit that I did hope that there would be more interest. Maybe I should have figured out how to put 'maskna' or '1.7' in the subject? In defense of there not be much interest is that the people who would possibly benefit, aren't reading English mailing lists. Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.dipierro at gmail.com Tue May 22 10:54:40 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Tue, 22 May 2012 09:54:40 -0500 Subject: [Numpy-discussion] question about in-place operations In-Reply-To: <4FBBA367.8060406@astro.uio.no> References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no> Message-ID: For now I will be doing this: import numpy import time a=numpy.zeros(2000000) b=numpy.zeros(2000000) c=1.0 # naive solution t0 = time.time() for i in xrange(len(a)): a[i] += c*b[i] print time.time()-t0 # possible solution n=100000 t0 = time.time() for i in xrange(0,len(a),n): a[i:i+n] += c*b[i:i+n] print time.time()-t0 the second "possible" solution appears 1000x faster then the former in my tests and uses little extra memory. It is only 2x slower than b*=c. Any reason not to do it? On May 22, 2012, at 9:32 AM, Dag Sverre Seljebotn wrote: > On 05/22/2012 04:25 PM, Massimo DiPierro wrote: >> hello everybody, >> >> first of all thanks to the developed for bumpy which is very useful. 
I am building a software that uses numpy+pyopencl for lattice qcd computations. One problem that I am facing is that I need to perform most operations on arrays in place and I must avoid creating temporary arrays (because my arrays are many gigabyte large). >> >> One typical operation is this >> >> a[i] += const * b[i] >> >> What is the efficient way to do is when a and b are arbitrary arrays? const is usually a complex number. >> a and b have the same shape but are not necessarily uni-dimensional. > > I don't think NumPy support this; if you can't modify b[i] in-place, I > think your only option will be one of numexpr/Theano/Cython. > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Tue May 22 10:59:31 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 22 May 2012 15:59:31 +0100 Subject: [Numpy-discussion] question about in-place operations In-Reply-To: References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no> Message-ID: On Tue, May 22, 2012 at 3:47 PM, Massimo DiPierro wrote: > Thank you. I will look into numexpr. > > Anyway, I do not need arbitrary expressions. If there were something like > > numpy.add_scaled(a,scale,b) > > with support for scale in int, float, complex, this would be sufficient for me. BLAS has the xAXPY functions, which will do this for float and complex. import numpy as np from scipy.linalg import fblas def add_scaled_inplace(a, scale, b): if np.issubdtype(a.dtype, complex): fblas.zaxpy(b, a, a=scale) else: fblas.daxpy(b, a, a=scale) -- Robert Kern From massimo.dipierro at gmail.com Tue May 22 11:06:31 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Tue, 22 May 2012 10:06:31 -0500 Subject: [Numpy-discussion] question about in-place operations In-Reply-To: References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no> Message-ID: <493333C0-AB3F-46F5-BD8B-A2A7FBF32D01@gmail.com> Thank you this does it. On May 22, 2012, at 9:59 AM, Robert Kern wrote: > On Tue, May 22, 2012 at 3:47 PM, Massimo DiPierro > wrote: >> Thank you. I will look into numexpr. >> >> Anyway, I do not need arbitrary expressions. If there were something like >> >> numpy.add_scaled(a,scale,b) >> >> with support for scale in int, float, complex, this would be sufficient for me. > > BLAS has the xAXPY functions, which will do this for float and complex. > > import numpy as np > from scipy.linalg import fblas > > def add_scaled_inplace(a, scale, b): > if np.issubdtype(a.dtype, complex): > fblas.zaxpy(b, a, a=scale) > else: > fblas.daxpy(b, a, a=scale) > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From massimo.dipierro at gmail.com Tue May 22 11:09:18 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Tue, 22 May 2012 10:09:18 -0500 Subject: [Numpy-discussion] how to avoid re-shaping In-Reply-To: References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no> Message-ID: One more questions (since this list is very useful. ;-) If I have a numpy array of arbitrary shape, is there are a way to sequentially loop over its elements without reshaping it into a 1D array? 
I am trying to simplify this: n=product(data.shape) oldshape = data.shape newshape = (n,) data.reshape(newshape) for i in xrange(n): do_something_with(data[i]) data.reshape(oldshape) Massimo From robert.kern at gmail.com Tue May 22 11:12:32 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 22 May 2012 16:12:32 +0100 Subject: [Numpy-discussion] how to avoid re-shaping In-Reply-To: References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no> Message-ID: On Tue, May 22, 2012 at 4:09 PM, Massimo DiPierro wrote: > One more questions (since this list is very useful. ;-) > > If I have a numpy array of arbitrary shape, is there are a way to sequentially loop over its elements without reshaping it into a 1D array? > > I am trying to simplify this: > > n=product(data.shape) > oldshape = data.shape > newshape = (n,) > data.reshape(newshape) Note that the .reshape() method does not work in-place. It just returns a new ndarray object viewing the same data using the different shape. That said, just iterate over data.flat, if you must iterate manually. -- Robert Kern From massimo.dipierro at gmail.com Tue May 22 11:21:26 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Tue, 22 May 2012 10:21:26 -0500 Subject: [Numpy-discussion] how to avoid re-shaping In-Reply-To: References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no> Message-ID: On May 22, 2012, at 10:12 AM, Robert Kern wrote: > On Tue, May 22, 2012 at 4:09 PM, Massimo DiPierro > wrote: >> One more questions (since this list is very useful. ;-) >> >> If I have a numpy array of arbitrary shape, is there are a way to sequentially loop over its elements without reshaping it into a 1D array? >> >> I am trying to simplify this: >> >> n=product(data.shape) >> oldshape = data.shape >> newshape = (n,) >> data.reshape(newshape) > > Note that the .reshape() method does not work in-place. It just > returns a new ndarray object viewing the same data using the different > shape. > > That said, just iterate over data.flat, if you must iterate manually. That's what I was looking for. Thank you. > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chris.barker at noaa.gov Tue May 22 12:18:45 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 22 May 2012 09:18:45 -0700 Subject: [Numpy-discussion] assign a float number to a member of integer array always return integer In-Reply-To: References: Message-ID: On Tue, May 22, 2012 at 6:33 AM, Chao YUE wrote: > Just in case some one didn't know this. Assign a float number to an integer > array element will always return integer. right -- numpy arrays are typed -- that's one of the points of them -- you wouldn't want the entire array up-cast with a single assignment -- particularly since there are only python literals for a subset of the numpy types. > so I would always do this if I expected a transfer from integer to float? > In [18]: b=a.astype(float) yes -- but that's an odd way of thinking about -- what you want to do is think about what type you need your array to be before you create it, then create it the way you need it: In [87]: np.arange(5, dtype=np.float) Out[87]: array([ 0., 1., 2., 3., 4.]) or better: In [91]: np.linspace(0,5,6) Out[91]: array([ 0., 1., 2., 3., 4., 5.]) note that most (all?) numpy array constructors take a "dtype" argument. 
-Chris > In [19]: b > Out[19]: array([? 2.,?? 4.,?? 6.,?? 8.,? 10.]) > > In [20]: b[1]=4.5 > > In [21]: b > Out[21]: array([? 2. ,?? 4.5,?? 6. ,?? 8. ,? 10. ]) > > thanks et cheers, > > Chao > -- > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > ************************************************************************************ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From njs at pobox.com Tue May 22 12:20:48 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 22 May 2012 17:20:48 +0100 Subject: [Numpy-discussion] subclassing ndarray subtleties?? In-Reply-To: References: Message-ID: On Mon, May 21, 2012 at 6:47 PM, Tom Aldcroft wrote: > Over on the scipy-user mailing list there was a question about > subclassing ndarray and I was interested to see two responses that > seemed to imply that subclassing should be avoided. > > >From Dag and Nathaniel, respectively: > > "Subclassing ndarray is a very tricky business -- I did it once and > regretted having done it for years, because there's so much you can't > do etc.. You're almost certainly better off with embedding an array as > an attribute, and then forward properties etc. to it." > > "Yes, it's almost always the wrong thing..." > > So my question is whether there are subtleties or issues that are not > covered in the standard NumPy documents on subclassing ndarray. ?What > are the "things you can't do etc"? ?I'm working on a project that > relies heavily on an ndarray subclass which just adds a few attributes > and slightly tweaks __getitem__. ?It seems fine and I really like that > the class is an ndarray with all the built-in methods already there. > Am I missing anything? > > >From the scipy thread I did already learn that one should also > override __getslice__ in addition to __getitem__ to be safe. I don't know of anything that the docs are lacking in particular. It's just that subclassing in general is basically a special form of monkey-patching: you have this ecosystem of cooperating methods, and then you're inserting some arbitrary changes in the middle of it. Making it all work in general requires that you carefully think through how all the different pieces of the ndarray API interact, and the ndarray API is very large and complicated. The __getslice__ thing is one example of this. For another: does your __getitem__ properly handle *all* the cases that regular ndarray.__getitem__ handles? (I'm not sure anyone actually knows what this complete list is, there are a lot of potential corner cases.) What happens if one of your objects is passed to third-party code that uses __getitem__? What happens if your array is accidentally stripped of its magic properties by passing through np.asarray() at the top of some function? Have you thought about how your special attributes are affected by, say, swapaxes? Have you applied your tweaks to item() and setitem()? 
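To make one of those failure modes concrete, here is a minimal, hypothetical sketch (the MyArray class and its info attribute are invented for illustration, not taken from the code under discussion):

import numpy as np

class MyArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj
    # no __array_finalize__, so arrays derived from this one never get .info

a = MyArray([1, 2, 3], info='units=m')
a.info                  # 'units=m'
a[:2].info              # AttributeError: the slice is a MyArray, but .info was never set on it
type(np.asarray(a))     # numpy.ndarray -- the subclass (and .info with it) is silently stripped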
I'm just guessing randomly here of course, since I have no idea what you've done. And I've subclassed ndarray myself at least three times, for reasons that seemed good enough at the time, so I'm not saying it's never doable. It's just that there are tons of these tiny little details, any one of which can trip you up, and that means that people tend to dive in and then discover the pitfalls later. - N From josef.pktd at gmail.com Tue May 22 12:46:44 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 22 May 2012 12:46:44 -0400 Subject: [Numpy-discussion] Internationalization of numpy/scipy docstrings... In-Reply-To: References: Message-ID: On Tue, May 22, 2012 at 10:51 AM, Tim Cera wrote: >>> >> Docstrings are not stored in .rst files but in the numpy sources, so there >> are some?non-trivial technical and workflow details missing here. But >> besides that, I think translating everything (even into a single language) >> is a massive amount of work, and it's not at all clear if there's enough >> people willing to help out with this. So I'd think it would be better to >> start with just the high-level docs (numpy user guide, scipy tutorial) to >> see how it goes. > > > I understand that this is non-trivial, for me anyway, because I can't figure > out how to make my way around numpydoc, and documentation editor code (not > quite true, as Pauli accepted a couple of my pull requests,?but > I?definitely?can't make it dance). ?This is why I asked for interest and > help on the mailing list. ?I think for the people that worked on the > documentation editor, or know Django, or are cleverer than I, the required > changes to the documentation editor might by mid-trivial. ?That is my hope > anyway. > > Would probably have the high-level docs separate from the docstring > processing anyway since the high-level docs are already in a sphinx source > directory. ?So I agree that the high-level docs would be the best place to > start and in-fact that is what I was working with and found the problem with > the sphinx gettext builder mentioned in the original post. > > I do want to defend and clarify the docstring processing though. > ?Docstrings, in the code, will always be English. The documentation editor > is the fulcrum. ?The documentation editor will work with the in the code > docstrings exactly as it does now. ?The documentation editor would be > changed so that when it writes the ReST formatted docstring back into the > code, it also writes a *.rst file to a separate sphinx source directory. > ?These *.rst files would not be part of the numpy source code directory, but > an interim file for the documentation editor and sphinx to extract strings > to make *.po files, pootle + hordes of translators :-) gives *.pot files, > *.pot -> *.mo -> *.rst (translated). ?The English *.rst, *.po, *.pot, *.mo > files are all interim products behind the scenes. ?The translated *.rst > files would NOT be part of the numpy source code, but packaged separately. > > I must admit that I did hope that there would be more interest. ?Maybe I > should have figured out how to put 'maskna' or '1.7' in the subject? > > In defense of there not be much interest is that the people who would > possibly benefit, aren't reading English mailing lists. One advantage of getting this done would be that other packages could follow the same approach. Just as numpy.testing and numpy's doc standard has spread to related packages, being able to generate translations might be even more interesting to downstream packages. 
There the fraction of end users that are not used to working in English anyway might be larger than for numpy itself. The numpy mailing list may be too narrow to catch the attention of developers with enough interest and expertise in the area.

Josef

>
> Kindest regards,
> Tim
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From d.s.seljebotn at astro.uio.no  Tue May 22 14:47:11 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Tue, 22 May 2012 20:47:11 +0200
Subject: [Numpy-discussion] question about in-place operations
In-Reply-To:
References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no>
Message-ID: <4FBBDF2F.6010803@astro.uio.no>

On 05/22/2012 04:54 PM, Massimo DiPierro wrote:
> For now I will be doing this:
>
> import numpy
> import time
>
> a=numpy.zeros(2000000)
> b=numpy.zeros(2000000)
> c=1.0
>
> # naive solution
> t0 = time.time()
> for i in xrange(len(a)):
>     a[i] += c*b[i]
> print time.time()-t0
>
> # possible solution
> n=100000
> t0 = time.time()
> for i in xrange(0,len(a),n):
>     a[i:i+n] += c*b[i:i+n]
> print time.time()-t0
>
> the second "possible" solution appears 1000x faster then the former in my tests and uses little extra memory. It is only 2x slower than b*=c.
>
> Any reason not to do it?

No, this is perfectly fine, you just manually did what numexpr does.

On 05/22/2012 04:47 PM, Massimo DiPierro wrote:
> Thank you. I will look into numexpr.
>
> Anyway, I do not need arbitrary expressions. If there were something like
>
> numpy.add_scaled(a,scale,b)
>
> with support for scale in int, float, complex, this would be sufficient for me.

But of course, few needs *arbitrary* expressions -- it's just that the ones they want are not already compiled. It's the last 5% functionality that's different for everybody...

(But the example you mention could make a nice ufunc; so an alternative for you would be to look at the C implementation of np.add and try to submit a pull request for numpy.add_scaled)

Dag

From francesc at continuum.io  Tue May 22 14:57:14 2012
From: francesc at continuum.io (Francesc Alted)
Date: Tue, 22 May 2012 20:57:14 +0200
Subject: [Numpy-discussion] question about in-place operations
In-Reply-To: <4FBBDF2F.6010803@astro.uio.no>
References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no> <4FBBDF2F.6010803@astro.uio.no>
Message-ID: <4FBBE18A.6060300@continuum.io>

On 5/22/12 8:47 PM, Dag Sverre Seljebotn wrote:
> On 05/22/2012 04:54 PM, Massimo DiPierro wrote:
>> For now I will be doing this:
>>
>> import numpy
>> import time
>>
>> a=numpy.zeros(2000000)
>> b=numpy.zeros(2000000)
>> c=1.0
>>
>> # naive solution
>> t0 = time.time()
>> for i in xrange(len(a)):
>>     a[i] += c*b[i]
>> print time.time()-t0
>>
>> # possible solution
>> n=100000
>> t0 = time.time()
>> for i in xrange(0,len(a),n):
>>     a[i:i+n] += c*b[i:i+n]
>> print time.time()-t0
>>
>> the second "possible" solution appears 1000x faster then the former in my tests and uses little extra memory. It is only 2x slower than b*=c.
>>
>> Any reason not to do it?
> No, this is perfectly fine, you just manually did what numexpr does.

Yeah. You basically re-discovered the blocking technique.
For a more general example on how to apply the blocking technique with NumPy see the section "CPU vs Memory Benchmark" in: https://python.g-node.org/python-autumnschool-2010/materials/starving_cpus Of course numexpr has less overhead (and can use multiple cores) than using plain NumPy. -- Francesc Alted From massimo.dipierro at gmail.com Tue May 22 15:07:10 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Tue, 22 May 2012 14:07:10 -0500 Subject: [Numpy-discussion] question about in-place operations In-Reply-To: <4FBBDF2F.6010803@astro.uio.no> References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no> <4FBBDF2F.6010803@astro.uio.no> Message-ID: <7535DF93-6D03-472E-9186-6A2A14E1A1C6@gmail.com> Thank you Dag, I will look into it. Is there any documentation about ufunc? Is this the file core/src/umath/ufunc_object.c Massimo On May 22, 2012, at 1:47 PM, Dag Sverre Seljebotn wrote: > On 05/22/2012 04:54 PM, Massimo DiPierro wrote: >> For now I will be doing this: >> >> import numpy >> import time >> >> a=numpy.zeros(2000000) >> b=numpy.zeros(2000000) >> c=1.0 >> >> # naive solution >> t0 = time.time() >> for i in xrange(len(a)): >> a[i] += c*b[i] >> print time.time()-t0 >> >> # possible solution >> n=100000 >> t0 = time.time() >> for i in xrange(0,len(a),n): >> a[i:i+n] += c*b[i:i+n] >> print time.time()-t0 >> >> the second "possible" solution appears 1000x faster then the former >> in my tests and uses little extra memory. It is only 2x slower than >> b*=c. >> >> Any reason not to do it? > > No, this is perfectly fine, you just manually did what numexpr does. > > > On 05/22/2012 04:47 PM, Massimo DiPierro wrote: >> Thank you. I will look into numexpr. >> >> Anyway, I do not need arbitrary expressions. If there were >> something like >> >> numpy.add_scaled(a,scale,b) >> >> with support for scale in int, float, complex, this would be > sufficient for me. > > But of course, few needs *arbitrary* expressions -- it's just that the > ones they want are not already compiled. > > It's the last 5% functionality that's different for everybody... > > (But the example you mention could make a nice ufunc; so an > alternative > for you would be to look at the C implementation of np.add and try to > submit a pull request for numpy.add_scaled) > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From massimo.dipierro at gmail.com Tue May 22 15:08:19 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Tue, 22 May 2012 14:08:19 -0500 Subject: [Numpy-discussion] question about in-place operations In-Reply-To: <4FBBE18A.6060300@continuum.io> References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no> <4FBBDF2F.6010803@astro.uio.no> <4FBBE18A.6060300@continuum.io> Message-ID: This problem is linear so probably Ram IO bound. I do not think I would benefit much for multiple cores. But I will give it a try. In the short term this is good enough for me. 
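(For reference, a minimal numexpr version of the same update -- an untested sketch, reusing the a, b, c from the snippet above -- would be something like:

import numexpr as ne
a[:] = ne.evaluate("a + c*b")

with numexpr doing the blocking, and optionally the multithreading, internally.)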
On May 22, 2012, at 1:57 PM, Francesc Alted wrote: > On 5/22/12 8:47 PM, Dag Sverre Seljebotn wrote: >> On 05/22/2012 04:54 PM, Massimo DiPierro wrote: >>> For now I will be doing this: >>> >>> import numpy >>> import time >>> >>> a=numpy.zeros(2000000) >>> b=numpy.zeros(2000000) >>> c=1.0 >>> >>> # naive solution >>> t0 = time.time() >>> for i in xrange(len(a)): >>> a[i] += c*b[i] >>> print time.time()-t0 >>> >>> # possible solution >>> n=100000 >>> t0 = time.time() >>> for i in xrange(0,len(a),n): >>> a[i:i+n] += c*b[i:i+n] >>> print time.time()-t0 >>> >>> the second "possible" solution appears 1000x faster then the >>> former in my tests and uses little extra memory. It is only 2x >>> slower than b*=c. >>> >>> Any reason not to do it? >> No, this is perfectly fine, you just manually did what numexpr does. > > Yeah. You basically re-discovered the blocking technique. For a more > general example on how to apply the blocking technique with NumPy see > the section "CPU vs Memory Benchmark" in: > > https://python.g-node.org/python-autumnschool-2010/materials/starving_cpus > > Of course numexpr has less overhead (and can use multiple cores) than > using plain NumPy. > > -- > Francesc Alted > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From dg.gmane at thesamovar.net Tue May 22 16:07:10 2012 From: dg.gmane at thesamovar.net (Dan Goodman) Date: Tue, 22 May 2012 22:07:10 +0200 Subject: [Numpy-discussion] subclassing ndarray subtleties?? In-Reply-To: References: Message-ID: <4FBBF1EE.1070004@thesamovar.net> On 22/05/2012 18:20, Nathaniel Smith wrote: > I don't know of anything that the docs are lacking in particular. It's > just that subclassing in general is basically a special form of > monkey-patching: you have this ecosystem of cooperating methods, and > then you're inserting some arbitrary changes in the middle of it. > Making it all work in general requires that you carefully think > through how all the different pieces of the ndarray API interact, and > the ndarray API is very large and complicated. The __getslice__ thing > is one example of this. For another: does your __getitem__ properly > handle *all* the cases that regular ndarray.__getitem__ handles? (I'm > not sure anyone actually knows what this complete list is, there are a > lot of potential corner cases.) What happens if one of your objects is > passed to third-party code that uses __getitem__? What happens if your > array is accidentally stripped of its magic properties by passing > through np.asarray() at the top of some function? Have you thought > about how your special attributes are affected by, say, swapaxes? Have > you applied your tweaks to item() and setitem()? > > I'm just guessing randomly here of course, since I have no idea what > you've done. And I've subclassed ndarray myself at least three times, > for reasons that seemed good enough at the time, so I'm not saying > it's never doable. It's just that there are tons of these tiny little > details, any one of which can trip you up, and that means that people > tend to dive in and then discover the pitfalls later. I've also used subclasses of ndarray, and have stumbled across most (but not all) of the problems you mentioned above. In my case, my code has gradually evolved over a few years as I've become aware of each of these problems. 
I think it would be useful to have an example of a completely 'correctly' subclassed ndarray that handles all of these issues that people could use as a template when they want to subclass ndarray. I appreciate, though, that there's no-one who particularly wants to do this! :) I'd offer my code as an example, but Nathaniel's comment above shows that there's many things mine doesn't handle properly. Dan From nouiz at nouiz.org Tue May 22 16:07:50 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 22 May 2012 16:07:50 -0400 Subject: [Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive In-Reply-To: References: <4FB58185.7070500@astro.uio.no> <1337342164.9661.24.camel@farnsworth> <5d3a71d1-0f95-422f-a18c-58ff574052c4@email.android.com> <1337353201.11994.8.camel@farnsworth> <4FB68E17.2010904@astro.uio.no> Message-ID: Hi, The example with numpy array for small array, the speed problem is probably because NumPy have not been speed optimized for low overhead. For example, each c function should check first if the input is a NumPy array, if not jump to a function to make one. For example, currently in the c function(PyArray_Multiply?) that got called by dot(), a c function call is made to check if the array is a NumPy array. This is an extra overhead for the expected most frequent expected behavior that the input is a NumPy array. I'm pretty sure this happen at many place. In this particular function, there is many other function call before calling blas just for the simple case of vector x vector, vector x matrix or matrix x matrix dot product. But this is probably for another thread if people want to discuss it more. Also, I didn't verify how frequently we could lower the overhead as we don't need it. So it could be just a few function that need those type of optimization. For the comparison with the multiple type of array on the GPU, I think the first reason is that people worked isolated and that the only implemented the subset of the numpy ndarray they needed. As different project/groups need different part, reusing other people work was not trivial. Otherwise, I see the problem, but I don't know what to say about it as I didn't experience it. Fred From aldcroft at head.cfa.harvard.edu Tue May 22 16:36:06 2012 From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft) Date: Tue, 22 May 2012 16:36:06 -0400 Subject: [Numpy-discussion] subclassing ndarray subtleties?? In-Reply-To: <4FBBF1EE.1070004@thesamovar.net> References: <4FBBF1EE.1070004@thesamovar.net> Message-ID: On Tue, May 22, 2012 at 4:07 PM, Dan Goodman wrote: > On 22/05/2012 18:20, Nathaniel Smith wrote: >> I don't know of anything that the docs are lacking in particular. It's >> just that subclassing in general is basically a special form of >> monkey-patching: you have this ecosystem of cooperating methods, and >> then you're inserting some arbitrary changes in the middle of it. >> Making it all work in general requires that you carefully think >> through how all the different pieces of the ndarray API interact, and >> the ndarray API is very large and complicated. The __getslice__ thing >> is one example of this. For another: does your __getitem__ properly >> handle *all* the cases that regular ndarray.__getitem__ handles? (I'm >> not sure anyone actually knows what this complete list is, there are a >> lot of potential corner cases.) What happens if one of your objects is >> passed to third-party code that uses __getitem__? 
What happens if your >> array is accidentally stripped of its magic properties by passing >> through np.asarray() at the top of some function? Have you thought >> about how your special attributes are affected by, say, swapaxes? Have >> you applied your tweaks to item() and setitem()? >> >> I'm just guessing randomly here of course, since I have no idea what >> you've done. And I've subclassed ndarray myself at least three times, >> for reasons that seemed good enough at the time, so I'm not saying >> it's never doable. It's just that there are tons of these tiny little >> details, any one of which can trip you up, and that means that people >> tend to dive in and then discover the pitfalls later. > > I've also used subclasses of ndarray, and have stumbled across most (but > not all) of the problems you mentioned above. In my case, my code has > gradually evolved over a few years as I've become aware of each of these > problems. I think it would be useful to have an example of a completely > 'correctly' subclassed ndarray that handles all of these issues that > people could use as a template when they want to subclass ndarray. I > appreciate, though, that there's no-one who particularly wants to do > this! :) I'd offer my code as an example, but Nathaniel's comment above > shows that there's many things mine doesn't handle properly. > > Dan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > The text Nathaniel wrote is already very useful (certainly to me). It seems like this text could be put almost verbatim (maybe with some list items) in a subsection at the end of [1] titled "Caution" or "Other considerations". Thanks, Tom [1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html From chaoyuejoy at gmail.com Tue May 22 16:47:21 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 22 May 2012 22:47:21 +0200 Subject: [Numpy-discussion] assign a float number to a member of integer array always return integer In-Reply-To: References: Message-ID: Thanks Chris for informative post. cheers, Chao 2012/5/22 Chris Barker > On Tue, May 22, 2012 at 6:33 AM, Chao YUE wrote: > > > Just in case some one didn't know this. Assign a float number to an > integer > > array element will always return integer. > > right -- numpy arrays are typed -- that's one of the points of them -- > you wouldn't want the entire array up-cast with a single assignment -- > particularly since there are only python literals for a subset of the > numpy types. > > > so I would always do this if I expected a transfer from integer to float? > > In [18]: b=a.astype(float) > > yes -- but that's an odd way of thinking about -- what you want to do > is think about what type you need your array to be before you create > it, then create it the way you need it: > > In [87]: np.arange(5, dtype=np.float) > Out[87]: array([ 0., 1., 2., 3., 4.]) > > or better: > > In [91]: np.linspace(0,5,6) > Out[91]: array([ 0., 1., 2., 3., 4., 5.]) > > note that most (all?) numpy array constructors take a "dtype" argument. > > -Chris > > > > > > > In [19]: b > > Out[19]: array([ 2., 4., 6., 8., 10.]) > > > > In [20]: b[1]=4.5 > > > > In [21]: b > > Out[21]: array([ 2. , 4.5, 6. , 8. , 10. 
]) > > > > thanks et cheers, > > > > Chao > > -- > > > *********************************************************************************** > > Chao YUE > > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > > UMR 1572 CEA-CNRS-UVSQ > > Batiment 712 - Pe 119 > > 91191 GIF Sur YVETTE Cedex > > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > > ************************************************************************************ > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue May 22 16:47:48 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 22 May 2012 13:47:48 -0700 Subject: [Numpy-discussion] subclassing ndarray subtleties?? In-Reply-To: <4FBBF1EE.1070004@thesamovar.net> References: <4FBBF1EE.1070004@thesamovar.net> Message-ID: On Tue, May 22, 2012 at 1:07 PM, Dan Goodman wrote: > I think it would be useful to have an example of a completely > 'correctly' subclassed ndarray that handles all of these issues that > people could use as a template when they want to subclass ndarray. I think this is by definition impossible -- if you are subclassing it, you are changing its behavior is *some* way, which way will determine how you want it it behave under all the various conditions that an array may encounter. So there is no subclass that handles all these issues, nor is there any pre-defined definition for correct. My personal use for subclassing has been to plug in a new object into code that was currently using a regular old numpy array -- in that case, all it needed to handle were the use-cases it was already being used in -- so running my test code was all I needed. But if I were startting from scratch, I'd probably use the "has a" rather than the "is a" OO model. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From travis at continuum.io Wed May 23 01:06:45 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 23 May 2012 00:06:45 -0500 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? 
(1.7 compatibility issue) In-Reply-To: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> Message-ID: <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> I just realized that the pull request doesn't do what I thought it did which is just add the flag to warn users who are writing to an array that is a view when it used to be a copy. It's more cautious and also "copies" the data for 1.7. Is this really a necessary step? I guess it depends on how many use-cases there are where people are relying on .diagonal() being a copy. Given that this is such an easy thing for people who encounter the warning to fix their code, it seems overly cautious to *also* make a copy (especially for a rare code-path like this --- although I admit that I don't have any reproducible data to support that assertion that it's a rare code-path). I think we have a mixed record of being cautious (not cautious enough in some changes), but this seems like swinging in the other direction of being overly cautious on a minor point. I wonder if I'm the only one who feels that way about this PR. This is not a major issue, so I am fine with the current strategy, but the drawback of being this cautious on this point is 1) it is not really reflective of other changes and 2) it does mean that someone who wants to fix their code for the future will end up with two copies for 1.7. -Travis On May 16, 2012, at 3:51 PM, Travis Oliphant wrote: > This Pull Request looks like a good idea to me as well. > > -Travis > > On May 16, 2012, at 3:10 PM, Ralf Gommers wrote: > >> >> >> On Wed, May 16, 2012 at 3:55 PM, Nathaniel Smith wrote: >> On Tue, May 15, 2012 at 2:49 PM, Fr?d?ric Bastien wrote: >> > Hi, >> > >> > In fact, I would arg to never change the current behavior, but add the >> > flag for people that want to use it. >> > >> > Why? >> > >> > 1) There is probably >10k script that use it that will need to be >> > checked for correctness. There won't be easy to see crash or error >> > that allow user to see it. >> >> My suggestion is that we follow the scheme, which I think gives ample >> opportunity for people to notice problems: >> >> 1.7: works like 1.6, except that a DeprecationWarning is produced if >> (and only if) someone writes to an array returned by np.diagonal (or >> friends). This gives a pleasant heads-up for those who pay attention >> to DeprecationWarnings. >> >> 1.8: return a view, but mark this view read-only. This causes crashes >> for anyone who ignored the DeprecationWarnings, guaranteeing that >> they'll notice the issue. >> >> 1.9: return a writeable view, transition complete. >> >> I've written a pull request implementing the first part of this; I >> hope everyone interested will take a look: >> https://github.com/numpy/numpy/pull/280 >> >> Thanks for doing that. Seems like a good way forward. >> >> When the PR gets merged, can you please also open a ticket for this with Milestone 1.8? Then we won't forget to make the required changes for that release. >> >> Ralf >> >> >> > 2) This is a globally not significant speed up by this change. Due to >> > 1), i think it is not work it. Why this is not a significant speed up? >> > First, the user already create and use the original tensor. Suppose a >> > matrix of size n x n. If it don't fit in the cache, creating it will >> > cost n * n. But coping it will cost cst * n. The cst is the price of >> > loading a full cache line. 
But if you return a view, you will pay this
>> > cst price later when you do the computation. But it all case, this is
>> > cheap compared to the cost of creating the matrix. Also, you will do
>> > work on the matrix and this work will be much more costly then the
>> > price of the copy.
>> >
>> > In the case the matrix fix in the cache, the price of the copy is even lower.
>> >
>> > So in conclusion, optimizing the diagonal won't give speed up in the
>> > global user script, but will break many of them.
>>
>> I agree that the speed difference is small. I'm more worried about the
>> cost to users of having to remember odd inconsistencies like this, and
>> to think about whether there actually is a speed difference or not,
>> etc. (If we do add a copy=False option, then I guarantee many people
>> will use it religiously "just in case" the speed difference is enough
>> to matter! And that would suck for them.)
>>
>> Returning a view makes the API slightly nicer, cleaner, more
>> consistent, more useful. (I believe the reason this was implemented in
>> the first place was that providing a convenient way to *write* to the
>> diagonal of an arbitrary array made it easier to implement numpy.eye
>> for masked arrays.) And the whole point of numpy is to trade off a
>> little speed in favor of having a simple, easy-to-work with high-level
>> API :-).
>>
>> -- Nathaniel
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From s.a.brown at ncl.ac.uk  Wed May 23 05:22:10 2012
From: s.a.brown at ncl.ac.uk (shbr)
Date: Wed, 23 May 2012 02:22:10 -0700 (PDT)
Subject: [Numpy-discussion] Numpy inverse giving incorrect result
Message-ID: <33894469.post@talk.nabble.com>

I am running the following script as a basic input-output model but the inverse function is bringing back an incorrect result (I have checked this by inputting the data manually). I am very new to python but I assume the error occurs in how the data is gathered from the csv but I don't know how to fix this. Any advice would be much appreciated.

import numpy
from numpy import *
from numpy.linalg import *
from numpy import genfromtxt

conMatrix = genfromtxt("iopy.csv", delimiter=',')
print "Consumption Matrix (C)"
print conMatrix
print ""

#print "Consumption Matrix (C)"
#manually entered data
#conMatrix = numpy.matrix('0.4102 0.0301 0.0257; 0.0624 0.3783 0.1050; 0.1236 0.1588 0.1919')
#print conMatrix

identityMatrix = numpy.identity(3)
print "Identity Matrix (I)"
print identityMatrix
print ""

print "I-C"
eyesee = numpy.subtract(identityMatrix, conMatrix)
numpy.matrix(eyesee)
print eyesee
print ""

print "(I-C)-1"
inv = numpy.matrix(eyesee).I
print inv
print ""

print "(I-C)-1*d"
d = numpy.matrix('39.24 60.02 130.65')
d.shape = (3,1)
print inv*d

raw_input()

Thank you

Shaun
--
View this message in context: http://old.nabble.com/Numpy-inverse-giving-incorrect-result-tp33894469p33894469.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.

From njs at pobox.com  Wed May 23 09:00:21 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 23 May 2012 14:00:21 +0100
Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view?
(1.7 compatibility issue) In-Reply-To: <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> Message-ID: On Wed, May 23, 2012 at 6:06 AM, Travis Oliphant wrote: > I just realized that the pull request doesn't do what I thought it did which > is just add the flag to warn users who are writing to an array that is a > view when it used to be a copy. ? ? It's more cautious and also "copies" the > data for 1.7. > > Is this really a necessary step? ? I guess it depends on how many use-cases > there are where people are relying on .diagonal() being a copy. ? Given that > this is such an easy thing for people who encounter the warning to fix their > code, it seems overly cautious to *also* make a copy (especially for a rare > code-path like this --- although I admit that I don't have any reproducible > data to support that assertion that it's a rare code-path). > > I think we have a mixed record of being cautious (not cautious enough in > some changes), but this seems like swinging in the other direction of being > overly cautious on a minor point. The reason this isn't a "minor point" is that if we just switched it then it's possible that existing, working code would start returning incorrect answers, and the only indication would be some console spew. I think that such changes should be absolutely verboten for a library like numpy. I'm already paranoid enough about my own code! That's why people up-thread were arguing that we just shouldn't risk the change at all, ever. I admit to some ulterior motive here: I'd like to see numpy be able to continue to evolve, but I am also, like I said, completely paranoid about fundamental libraries changing under me. So this is partly my attempt to find a way to make a potentially "dangerous" change in a responsible way. If we can't learn to do this, then honestly I think the only responsible alternative going forward would be to never change any existing API except in trivial ways (like removing deprecated functions). Basically my suggestion is that every time we alter the behaviour of existing, working code, there should be (a) a period when that existing code produces a warning, and (b) a period when that existing code produces an error. For a change like removing a function, this is easy. For something like this diagonal change, it's trickier, but still doable. - N From shish at keba.be Wed May 23 09:02:40 2012 From: shish at keba.be (Olivier Delalleau) Date: Wed, 23 May 2012 09:02:40 -0400 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> Message-ID: 2012/5/23 Nathaniel Smith > On Wed, May 23, 2012 at 6:06 AM, Travis Oliphant > wrote: > > I just realized that the pull request doesn't do what I thought it did > which > > is just add the flag to warn users who are writing to an array that is a > > view when it used to be a copy. It's more cautious and also "copies" > the > > data for 1.7. > > > > Is this really a necessary step? I guess it depends on how many > use-cases > > there are where people are relying on .diagonal() being a copy. 
Given > that > > this is such an easy thing for people who encounter the warning to fix > their > > code, it seems overly cautious to *also* make a copy (especially for a > rare > > code-path like this --- although I admit that I don't have any > reproducible > > data to support that assertion that it's a rare code-path). > > > > I think we have a mixed record of being cautious (not cautious enough in > > some changes), but this seems like swinging in the other direction of > being > > overly cautious on a minor point. > > The reason this isn't a "minor point" is that if we just switched it > then it's possible that existing, working code would start returning > incorrect answers, and the only indication would be some console spew. > I think that such changes should be absolutely verboten for a library > like numpy. I'm already paranoid enough about my own code! > > That's why people up-thread were arguing that we just shouldn't risk > the change at all, ever. > > I admit to some ulterior motive here: I'd like to see numpy be able to > continue to evolve, but I am also, like I said, completely paranoid > about fundamental libraries changing under me. So this is partly my > attempt to find a way to make a potentially "dangerous" change in a > responsible way. If we can't learn to do this, then honestly I think > the only responsible alternative going forward would be to never > change any existing API except in trivial ways (like removing > deprecated functions). > > Basically my suggestion is that every time we alter the behaviour of > existing, working code, there should be (a) a period when that > existing code produces a warning, and (b) a period when that existing > code produces an error. For a change like removing a function, this is > easy. For something like this diagonal change, it's trickier, but > still doable. > /agree with Nathaniel. Overly cautious is good! -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Wed May 23 09:11:52 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Wed, 23 May 2012 09:11:52 -0400 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> Message-ID: +1 Don't forget that many user always update to each version. So they will skip many version. This is especially true for people that rely on the distribution package that skip many version when they update. So this is not just a question of how many version we warn/err, but also how many times we wait that those warning/error get propagated to all user. Fred On Wed, May 23, 2012 at 9:02 AM, Olivier Delalleau wrote: > > /agree with Nathaniel. Overly cautious is good! > > -=- Olivier > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Wed May 23 09:48:13 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 23 May 2012 14:48:13 +0100 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> Message-ID: On Wed, May 23, 2012 at 2:11 PM, Fr?d?ric Bastien wrote: > +1 > > Don't forget that many user always update to each version. 
So they > will skip many version. This is especially true for people that rely > on the distribution package that skip many version when they update. > So this is not just a question of how many version we warn/err, but > also how many times we wait that those warning/error get propagated to > all user. Probably a question that we should revisit if/when numpy starts doing more than 1 release per year... -- Nathaniel From chaoyuejoy at gmail.com Wed May 23 10:49:25 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Wed, 23 May 2012 16:49:25 +0200 Subject: [Numpy-discussion] command for retrieving unmasked data from a mask array? Message-ID: Dear all, is there a command for retrieving unmasked data from a mask array? excepting using dt3[~dt3.mask].flatten()? thanks, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Wed May 23 10:54:52 2012 From: shish at keba.be (Olivier Delalleau) Date: Wed, 23 May 2012 10:54:52 -0400 Subject: [Numpy-discussion] command for retrieving unmasked data from a mask array? In-Reply-To: References: Message-ID: Should be dt3.compressed() -=- Olivier 2012/5/23 Chao YUE > Dear all, > > is there a command for retrieving unmasked data from a mask array? > excepting using dt3[~dt3.mask].flatten()? > > thanks, > > Chao > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thouis at gmail.com Wed May 23 11:37:54 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Wed, 23 May 2012 17:37:54 +0200 Subject: [Numpy-discussion] libnpysort.a and PyDataMem_NEW/FREE Message-ID: The "private" libnpysort.a (see https://github.com/numpy/numpy/pull/89 for its history) uses PyDataMem_NEW/FREE. I'm trying to convert these to actual functions to allow tracing numpy memory allocations (see https://github.com/numpy/numpy/pull/284). However, these new functions have to be in the API to allow third-party extensions to use them. This prevents libnpysort.a from finding them, as adding NUMPY_API to the declaration makes them static during the build process. There are several ways this could be dealt with: - convert all the NEW/FREE calls in the sort library to malloc/free (or the Python memory equivalents) - create a separate library for the memory functions (like libnpysort.a), but linked earlier. - something else? I'm not familiar enough with the internal of numpy and its API to know which is the best approach. Since the sorting functions could allocate significant amounts of data, I think it would be best for allocations within them to be tracable. 
Ray Jones From travis at continuum.io Wed May 23 13:29:29 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 23 May 2012 12:29:29 -0500 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> Message-ID: <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> On May 23, 2012, at 8:02 AM, Olivier Delalleau wrote: > 2012/5/23 Nathaniel Smith > On Wed, May 23, 2012 at 6:06 AM, Travis Oliphant wrote: > > I just realized that the pull request doesn't do what I thought it did which > > is just add the flag to warn users who are writing to an array that is a > > view when it used to be a copy. It's more cautious and also "copies" the > > data for 1.7. > > > > Is this really a necessary step? I guess it depends on how many use-cases > > there are where people are relying on .diagonal() being a copy. Given that > > this is such an easy thing for people who encounter the warning to fix their > > code, it seems overly cautious to *also* make a copy (especially for a rare > > code-path like this --- although I admit that I don't have any reproducible > > data to support that assertion that it's a rare code-path). > > > > I think we have a mixed record of being cautious (not cautious enough in > > some changes), but this seems like swinging in the other direction of being > > overly cautious on a minor point. > > The reason this isn't a "minor point" is that if we just switched it > then it's possible that existing, working code would start returning > incorrect answers, and the only indication would be some console spew. > I think that such changes should be absolutely verboten for a library > like numpy. I'm already paranoid enough about my own code! > > That's why people up-thread were arguing that we just shouldn't risk > the change at all, ever. > > I admit to some ulterior motive here: I'd like to see numpy be able to > continue to evolve, but I am also, like I said, completely paranoid > about fundamental libraries changing under me. So this is partly my > attempt to find a way to make a potentially "dangerous" change in a > responsible way. If we can't learn to do this, then honestly I think > the only responsible alternative going forward would be to never > change any existing API except in trivial ways (like removing > deprecated functions). > > Basically my suggestion is that every time we alter the behaviour of > existing, working code, there should be (a) a period when that > existing code produces a warning, and (b) a period when that existing > code produces an error. For a change like removing a function, this is > easy. For something like this diagonal change, it's trickier, but > still doable. > > /agree with Nathaniel. Overly cautious is good! > Then are you suggesting that we need to back out the changes to the casting rules as well, because this will also cause code to stop working. This is part of my point. We are not being consistently cautious. -Travis > -=- Olivier > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shish at keba.be Wed May 23 13:39:51 2012 From: shish at keba.be (Olivier Delalleau) Date: Wed, 23 May 2012 13:39:51 -0400 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> Message-ID: 2012/5/23 Travis Oliphant > > On May 23, 2012, at 8:02 AM, Olivier Delalleau wrote: > > 2012/5/23 Nathaniel Smith > >> On Wed, May 23, 2012 at 6:06 AM, Travis Oliphant >> wrote: >> > I just realized that the pull request doesn't do what I thought it did >> which >> > is just add the flag to warn users who are writing to an array that is a >> > view when it used to be a copy. It's more cautious and also >> "copies" the >> > data for 1.7. >> > >> > Is this really a necessary step? I guess it depends on how many >> use-cases >> > there are where people are relying on .diagonal() being a copy. Given >> that >> > this is such an easy thing for people who encounter the warning to fix >> their >> > code, it seems overly cautious to *also* make a copy (especially for a >> rare >> > code-path like this --- although I admit that I don't have any >> reproducible >> > data to support that assertion that it's a rare code-path). >> > >> > I think we have a mixed record of being cautious (not cautious enough in >> > some changes), but this seems like swinging in the other direction of >> being >> > overly cautious on a minor point. >> >> The reason this isn't a "minor point" is that if we just switched it >> then it's possible that existing, working code would start returning >> incorrect answers, and the only indication would be some console spew. >> I think that such changes should be absolutely verboten for a library >> like numpy. I'm already paranoid enough about my own code! >> >> That's why people up-thread were arguing that we just shouldn't risk >> the change at all, ever. >> >> I admit to some ulterior motive here: I'd like to see numpy be able to >> continue to evolve, but I am also, like I said, completely paranoid >> about fundamental libraries changing under me. So this is partly my >> attempt to find a way to make a potentially "dangerous" change in a >> responsible way. If we can't learn to do this, then honestly I think >> the only responsible alternative going forward would be to never >> change any existing API except in trivial ways (like removing >> deprecated functions). >> >> Basically my suggestion is that every time we alter the behaviour of >> existing, working code, there should be (a) a period when that >> existing code produces a warning, and (b) a period when that existing >> code produces an error. For a change like removing a function, this is >> easy. For something like this diagonal change, it's trickier, but >> still doable. >> > > /agree with Nathaniel. Overly cautious is good! > > > Then are you suggesting that we need to back out the changes to the > casting rules as well, because this will also cause code to stop working. > This is part of my point. We are not being consistently cautious. > > -Travis > Well, about casting rules... they've already been broken multiple times in previous releases (at least between 1.5 and 1.6, although I think I remember seeing some inconsistent behavior with older versions as well, but I'm less sure). 
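(To make that concrete: a small probe anyone can run on their installed version -- just a sketch, with no claim about what a particular release prints -- is to look at the result dtype of mixed scalar/array operations, which is where those differences showed up:

import numpy as np
a = np.ones(3, dtype=np.float32)
print np.__version__, (a + np.float64(1)).dtype
print np.__version__, (np.ones(3, dtype=np.int8) + 200).dtype

and comparing the printed dtypes across 1.5.x and 1.6.x installs shows the kind of change being discussed.)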
So in some sense it's already too late, and it shouldn't hurt much more to break them again :P But yes, breaking them in the first place was bad. I spent a lot of time trying to figure out what was going on. Although I just said I don't think it's a big deal to break them again, if it's easy enough to add a warning on operations whose casting behavior changed, with an option to disable this warning (would probably need to be a global numpy setting -- is there a better way?), I would actually like it even better. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Wed May 23 13:40:13 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Wed, 23 May 2012 13:40:13 -0400 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> Message-ID: On Wed, May 23, 2012 at 1:29 PM, Travis Oliphant wrote: > > Then are you suggesting that we need to back out the changes to the casting > rules as well, because this will also cause code to stop working. ? This is > part of my point. ? We are not being consistently cautious. The casting change have already been release, or is there other change that break the user interface in 1.7? I probably won't change again the interface that was released except if there is good reason. I don't remember the detail enough to suggest any direction on this. I remember the past change on dtype that broke Theano. I consider this an "error", but we all do them from time to time. I don't blame anybody for this. We just changed Theano after understanding the numpy modification. I prefer that we try to be consistently cautious then to tell we didn't always do it in the past so we won't try now. Also, I would find better to voluntarily break this "consistently cautious" guideline if it is well documented and advertised if needed then to don't try to be "consistently cautious". Fred From d.s.seljebotn at astro.uio.no Wed May 23 16:00:23 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 23 May 2012 22:00:23 +0200 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> Message-ID: <4FBD41D7.5040406@astro.uio.no> On 05/23/2012 07:29 PM, Travis Oliphant wrote: > > On May 23, 2012, at 8:02 AM, Olivier Delalleau wrote: > >> 2012/5/23 Nathaniel Smith > >> >> On Wed, May 23, 2012 at 6:06 AM, Travis Oliphant >> > wrote: >> > I just realized that the pull request doesn't do what I thought >> it did which >> > is just add the flag to warn users who are writing to an array >> that is a >> > view when it used to be a copy. It's more cautious and also >> "copies" the >> > data for 1.7. >> > >> > Is this really a necessary step? I guess it depends on how many >> use-cases >> > there are where people are relying on .diagonal() being a copy. 
>> Given that >> > this is such an easy thing for people who encounter the warning >> to fix their >> > code, it seems overly cautious to *also* make a copy (especially >> for a rare >> > code-path like this --- although I admit that I don't have any >> reproducible >> > data to support that assertion that it's a rare code-path). >> > >> > I think we have a mixed record of being cautious (not cautious >> enough in >> > some changes), but this seems like swinging in the other >> direction of being >> > overly cautious on a minor point. >> >> The reason this isn't a "minor point" is that if we just switched it >> then it's possible that existing, working code would start returning >> incorrect answers, and the only indication would be some console spew. >> I think that such changes should be absolutely verboten for a library >> like numpy. I'm already paranoid enough about my own code! >> >> That's why people up-thread were arguing that we just shouldn't risk >> the change at all, ever. >> >> I admit to some ulterior motive here: I'd like to see numpy be able to >> continue to evolve, but I am also, like I said, completely paranoid >> about fundamental libraries changing under me. So this is partly my >> attempt to find a way to make a potentially "dangerous" change in a >> responsible way. If we can't learn to do this, then honestly I think >> the only responsible alternative going forward would be to never >> change any existing API except in trivial ways (like removing >> deprecated functions). >> >> Basically my suggestion is that every time we alter the behaviour of >> existing, working code, there should be (a) a period when that >> existing code produces a warning, and (b) a period when that existing >> code produces an error. For a change like removing a function, this is >> easy. For something like this diagonal change, it's trickier, but >> still doable. >> >> >> /agree with Nathaniel. Overly cautious is good! >> > > Then are you suggesting that we need to back out the changes to the > casting rules as well, because this will also cause code to stop > working. This is part of my point. We are not being consistently cautious. Two wrongs doesn't make one right? I'd think the inconvenience to users is mostly "per unwarned breakage", so that even one unwarned breakage less translates into fewer minutes wasted for users scratching their heads. In the end it's a tradeoff between inconvenience to NumPy developers and inconvenience to NumPy users -- not inconveniencing the developers further is an argument for not being consistent; but for diagonal() the work is already done. Dag From d.s.seljebotn at astro.uio.no Wed May 23 16:01:36 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 23 May 2012 22:01:36 +0200 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? 
(1.7 compatibility issue) In-Reply-To: <4FBD41D7.5040406@astro.uio.no> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> Message-ID: <4FBD4220.6060801@astro.uio.no> On 05/23/2012 10:00 PM, Dag Sverre Seljebotn wrote: > On 05/23/2012 07:29 PM, Travis Oliphant wrote: >> >> On May 23, 2012, at 8:02 AM, Olivier Delalleau wrote: >> >>> 2012/5/23 Nathaniel Smith> >>> >>> On Wed, May 23, 2012 at 6:06 AM, Travis Oliphant >>> > wrote: >>> > I just realized that the pull request doesn't do what I thought >>> it did which >>> > is just add the flag to warn users who are writing to an array >>> that is a >>> > view when it used to be a copy. It's more cautious and also >>> "copies" the >>> > data for 1.7. >>> > >>> > Is this really a necessary step? I guess it depends on how many >>> use-cases >>> > there are where people are relying on .diagonal() being a copy. >>> Given that >>> > this is such an easy thing for people who encounter the warning >>> to fix their >>> > code, it seems overly cautious to *also* make a copy (especially >>> for a rare >>> > code-path like this --- although I admit that I don't have any >>> reproducible >>> > data to support that assertion that it's a rare code-path). >>> > >>> > I think we have a mixed record of being cautious (not cautious >>> enough in >>> > some changes), but this seems like swinging in the other >>> direction of being >>> > overly cautious on a minor point. >>> >>> The reason this isn't a "minor point" is that if we just switched it >>> then it's possible that existing, working code would start returning >>> incorrect answers, and the only indication would be some console spew. >>> I think that such changes should be absolutely verboten for a library >>> like numpy. I'm already paranoid enough about my own code! >>> >>> That's why people up-thread were arguing that we just shouldn't risk >>> the change at all, ever. >>> >>> I admit to some ulterior motive here: I'd like to see numpy be able to >>> continue to evolve, but I am also, like I said, completely paranoid >>> about fundamental libraries changing under me. So this is partly my >>> attempt to find a way to make a potentially "dangerous" change in a >>> responsible way. If we can't learn to do this, then honestly I think >>> the only responsible alternative going forward would be to never >>> change any existing API except in trivial ways (like removing >>> deprecated functions). >>> >>> Basically my suggestion is that every time we alter the behaviour of >>> existing, working code, there should be (a) a period when that >>> existing code produces a warning, and (b) a period when that existing >>> code produces an error. For a change like removing a function, this is >>> easy. For something like this diagonal change, it's trickier, but >>> still doable. >>> >>> >>> /agree with Nathaniel. Overly cautious is good! >>> >> >> Then are you suggesting that we need to back out the changes to the >> casting rules as well, because this will also cause code to stop >> working. This is part of my point. We are not being consistently cautious. > > Two wrongs doesn't make one right? > > I'd think the inconvenience to users is mostly "per unwarned breakage", > so that even one unwarned breakage less translates into fewer minutes > wasted for users scratching their heads. 
> > In the end it's a tradeoff between inconvenience to NumPy developers and > inconvenience to NumPy users -- not inconveniencing the developers > further is an argument for not being consistent; but for diagonal() the > work is already done. ...and, I missed the point about a future-compatible fix implying double-copy. Dag From pav at iki.fi Wed May 23 16:03:59 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 23 May 2012 22:03:59 +0200 Subject: [Numpy-discussion] Numpy inverse giving incorrect result In-Reply-To: <33894469.post@talk.nabble.com> References: <33894469.post@talk.nabble.com> Message-ID: 23.05.2012 11:22, shbr kirjoitti: > I am running the following script as a basic input-output model but the > inverse function is bringing back an incorrect result (I have checked this > by inputting the data manually).I am very new to python but I assume the > error occurs in how the data is gathered from the csv but I don?t know how > to fix this. Any advice would be much appreciated. >From your explanation it is unfortunately not clear what actually does not work, and what you would expect as results. Can you clarify? The inverse certainly gives the correct result: >>> conMatrix = numpy.matrix('0.4102 0.0301 0.0257; 0.0624 0.3783 0.1050; 0.1236 0.1588 0.1919') >>> eyesee = numpy.eye(3) - conMatrix >>> eyesee.I matrix([[ 1.72033888, 0.10060528, 0.06778402], [ 0.22456355, 1.67684212, 0.22502129], [ 0.30725724, 0.34490452, 1.29205728]]) >>> eyesee.I * eyesee matrix([[ 1.00000000e+00, -3.46944695e-18, 0.00000000e+00], [ 2.42861287e-17, 1.00000000e+00, 0.00000000e+00], [ 2.77555756e-17, 0.00000000e+00, 1.00000000e+00]]) From njs at pobox.com Wed May 23 16:13:42 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 23 May 2012 21:13:42 +0100 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> Message-ID: On Wed, May 23, 2012 at 6:29 PM, Travis Oliphant wrote: > Then are you suggesting that we need to back out the changes to the casting > rules as well, because this will also cause code to stop working. ? This is > part of my point. ? We are not being consistently cautious. I never understood exactly what changed with the casting rules, but yeah, maybe. Still, the question of what our deprecation rules *should* be is somewhat separate from the question of what we've actually done (or even will do). You have to have ideals before you can ask whether you're living up to them :-). Didn't the casting rules become strictly stricter, i.e. some questionable operations that used to succeed now throw an error? If so then that's not a *major* violation of my suggested rules, but yeah, I guess it'd probably be better if they did warn. I imagine it wouldn't be terribly difficult to implement (add a new NPY_WARN_UNSAFE_CASTING_INTERNAL value, use it everywhere that used to be UNSAFE but now will be SAFE?), but someone who understands better what actually changed (Mark?) would have do it. -- Nathaniel From chaoyuejoy at gmail.com Wed May 23 16:34:39 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Wed, 23 May 2012 22:34:39 +0200 Subject: [Numpy-discussion] command for retrieving unmasked data from a mask array? In-Reply-To: References: Message-ID: Thanks Olivier. it works. 
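To make "it works" concrete, a small sketch -- the array dt3 below is
invented for illustration, only the two spellings come from this thread:

>>> import numpy as np
>>> import numpy.ma as ma
>>> dt3 = ma.masked_less(np.arange(6.0), 2.0)   # example data; 0. and 1. are masked
>>> dt3.compressed()                            # Olivier's suggestion
array([ 2.,  3.,  4.,  5.])
>>> dt3[~dt3.mask].flatten().data               # the original workaround, same values
array([ 2.,  3.,  4.,  5.])
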
chao 2012/5/23 Olivier Delalleau > Should be dt3.compressed() > > -=- Olivier > > 2012/5/23 Chao YUE > >> Dear all, >> >> is there a command for retrieving unmasked data from a mask array? >> excepting using dt3[~dt3.mask].flatten()? >> >> thanks, >> >> Chao >> >> -- >> >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> >> ************************************************************************************ >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Wed May 23 17:04:11 2012 From: shish at keba.be (Olivier Delalleau) Date: Wed, 23 May 2012 17:04:11 -0400 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> Message-ID: 2012/5/23 Nathaniel Smith > On Wed, May 23, 2012 at 6:29 PM, Travis Oliphant > wrote: > > Then are you suggesting that we need to back out the changes to the > casting > > rules as well, because this will also cause code to stop working. This > is > > part of my point. We are not being consistently cautious. > > I never understood exactly what changed with the casting rules, but > yeah, maybe. Still, the question of what our deprecation rules > *should* be is somewhat separate from the question of what we've > actually done (or even will do). You have to have ideals before you > can ask whether you're living up to them :-). > > Didn't the casting rules become strictly stricter, i.e. some > questionable operations that used to succeed now throw an error? If so > then that's not a *major* violation of my suggested rules, but yeah, I > guess it'd probably be better if they did warn. I imagine it wouldn't > be terribly difficult to implement (add a new > NPY_WARN_UNSAFE_CASTING_INTERNAL value, use it everywhere that used to > be UNSAFE but now will be SAFE?), but someone who understands better > what actually changed (Mark?) would have do it. > It wasn't just stricter rules. Some operations involving in particular mixed scalar / array computations resulted in different outputs (with no warning). -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed May 23 17:53:00 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 23 May 2012 16:53:00 -0500 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? 
(1.7 compatibility issue) In-Reply-To: <4FBD4220.6060801@astro.uio.no> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> Message-ID: <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> To be clear, I'm not opposed to the change, and it looks like we should go forward. In my mind it's not about developers vs. users as satisfying users is the whole point. The purpose of NumPy is not to make its developers happy :-). But, users also want there to *be* developers on NumPy so developer happiness is not irrelevant. In this case, though, there are consequences for users because of the double copy if a user wants to make their code future proof. We are always trading off predicted user-experiences. I hope that we all don't have the same perspective on every issue or more than likely their aren't enough voices being heard from real users. -Travis On May 23, 2012, at 3:01 PM, Dag Sverre Seljebotn wrote: > On 05/23/2012 10:00 PM, Dag Sverre Seljebotn wrote: >> On 05/23/2012 07:29 PM, Travis Oliphant wrote: >>> >>> On May 23, 2012, at 8:02 AM, Olivier Delalleau wrote: >>> >>>> 2012/5/23 Nathaniel Smith> >>>> >>>> On Wed, May 23, 2012 at 6:06 AM, Travis Oliphant >>>> > wrote: >>>>> I just realized that the pull request doesn't do what I thought >>>> it did which >>>>> is just add the flag to warn users who are writing to an array >>>> that is a >>>>> view when it used to be a copy. It's more cautious and also >>>> "copies" the >>>>> data for 1.7. >>>>> >>>>> Is this really a necessary step? I guess it depends on how many >>>> use-cases >>>>> there are where people are relying on .diagonal() being a copy. >>>> Given that >>>>> this is such an easy thing for people who encounter the warning >>>> to fix their >>>>> code, it seems overly cautious to *also* make a copy (especially >>>> for a rare >>>>> code-path like this --- although I admit that I don't have any >>>> reproducible >>>>> data to support that assertion that it's a rare code-path). >>>>> >>>>> I think we have a mixed record of being cautious (not cautious >>>> enough in >>>>> some changes), but this seems like swinging in the other >>>> direction of being >>>>> overly cautious on a minor point. >>>> >>>> The reason this isn't a "minor point" is that if we just switched it >>>> then it's possible that existing, working code would start returning >>>> incorrect answers, and the only indication would be some console spew. >>>> I think that such changes should be absolutely verboten for a library >>>> like numpy. I'm already paranoid enough about my own code! >>>> >>>> That's why people up-thread were arguing that we just shouldn't risk >>>> the change at all, ever. >>>> >>>> I admit to some ulterior motive here: I'd like to see numpy be able to >>>> continue to evolve, but I am also, like I said, completely paranoid >>>> about fundamental libraries changing under me. So this is partly my >>>> attempt to find a way to make a potentially "dangerous" change in a >>>> responsible way. If we can't learn to do this, then honestly I think >>>> the only responsible alternative going forward would be to never >>>> change any existing API except in trivial ways (like removing >>>> deprecated functions). 
>>>> >>>> Basically my suggestion is that every time we alter the behaviour of >>>> existing, working code, there should be (a) a period when that >>>> existing code produces a warning, and (b) a period when that existing >>>> code produces an error. For a change like removing a function, this is >>>> easy. For something like this diagonal change, it's trickier, but >>>> still doable. >>>> >>>> >>>> /agree with Nathaniel. Overly cautious is good! >>>> >>> >>> Then are you suggesting that we need to back out the changes to the >>> casting rules as well, because this will also cause code to stop >>> working. This is part of my point. We are not being consistently cautious. >> >> Two wrongs doesn't make one right? >> >> I'd think the inconvenience to users is mostly "per unwarned breakage", >> so that even one unwarned breakage less translates into fewer minutes >> wasted for users scratching their heads. >> >> In the end it's a tradeoff between inconvenience to NumPy developers and >> inconvenience to NumPy users -- not inconveniencing the developers >> further is an argument for not being consistent; but for diagonal() the >> work is already done. > > ...and, I missed the point about a future-compatible fix implying > double-copy. > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed May 23 18:31:48 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 23 May 2012 23:31:48 +0100 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> Message-ID: On Wed, May 23, 2012 at 10:53 PM, Travis Oliphant wrote: > To be clear, I'm not opposed to the change, and it looks like we should go forward. > > In my mind it's not about developers vs. users as satisfying users is the whole point. ? The purpose of NumPy is not to make its developers happy :-). ? But, users also want there to *be* developers on NumPy so developer happiness is not irrelevant. > > In this case, though, there are consequences for users because of the double copy if a user wants to make their code future proof. ? We are always trading off predicted user-experiences. ? ?I hope that we all don't have the same perspective on every issue or more than likely their aren't enough voices being heard from real users. I'm not really worried about users who have a problem with the double-copy. It's a totally legitimate concern, but anyone who has that concern has already understood the issues well enough to be able to take care of themselves, and decided that it's worth the effort to special-case this. They can check whether the returned array has .base set to tell whether it's an array or a view, use a temporary hack to check for the secret warning flag in arr.flags.num, check the numpy version, all sorts of things to get them through the one version where this matters. The suggestion in the docs to make a copy is not exactly binding :-). 
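For concreteness, here is a minimal sketch of that kind of special-casing.
The helper name is invented, and using np.may_share_memory as the test for
"diagonal() gave me a view" is an assumption on my part, not an official
recipe:

import numpy as np

def writable_diagonal(a):
    # Copy only when diagonal() handed back something sharing memory
    # with `a`, so old releases (copy) and future releases (view) both
    # end up with exactly one independent array.
    d = a.diagonal()
    if np.may_share_memory(d, a):
        d = d.copy()
    return d

a = np.arange(9).reshape(3, 3)
d = writable_diagonal(a)
d[0] = 99   # writes never reach `a`, whichever numpy version is installed
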
-- Nathaniel From ischnell at enthought.com Wed May 23 19:13:28 2012 From: ischnell at enthought.com (Ilan Schnell) Date: Wed, 23 May 2012 18:13:28 -0500 Subject: [Numpy-discussion] ANN: EPD 7.3 (and 8 preview beta) released Message-ID: Hello, I am pleased to announce the release of Enthought Python Distribution, EPD version 7.3, along with its "EPD Free" counterpart. The highlights of this release are: the addition of enaml, Shapely and several other packages, as well as updates to over 30 packages, including SciPy and IPython. To see which libraries are included in the free vs. full version, please see: http://www.enthought.com/products/epdlibraries.php The complete list of additions, updates and fixes is in the change log: http://www.enthought.com/products/changelog.php EPD 8 preview beta ------------------ EPD 8.0 beta takes all that we know and love in EPD 7.x and adds an all-new graphical development and analysis environment. The new GUI is focused on providing a fast, lightweight interface designed for scientists and engineers. Some of the key features are: * A Python-centric text editor including tab-completion plus on-the-fly code analysis. * An interactive Python (IPython) prompt integrated with the code editor to enable rapid prototyping and exploration. * A Python package manager to make is easier to discover, install, and update packages in the Enthought Python Distribution. * Integrated documentation, both on the GUI itself and standard online documentation. EPD 8 beta can be downloaded from: https://beta.enthought.com/EPD_8/download/ About EPD --------- The Enthought Python Distribution (EPD) is a "kitchen-sink-included" distribution of the Python programming language, including over 90 additional tools and libraries. The EPD bundle includes NumPy, SciPy, IPython, 2D and 3D visualization tools, and many other tools. EPD is currently available as a single-click installer for Windows XP, Vista and 7, MacOS (10.5 and 10.6), RedHat 3, 4, 5 and 6, as well as Solaris 10 (x86 and x86_64/amd64 on all platforms). All versions of EPD (32 and 64-bit) are free for academic use. An annual subscription including installation support is available for individual and commercial use. Additional support options, including customization, bug fixes and training classes are also available: http://www.enthought.com/products/epd_sublevels.php - The EPD Team From Kathleen.M.Tacina at nasa.gov Wed May 23 19:16:06 2012 From: Kathleen.M.Tacina at nasa.gov (Kathleen M Tacina) Date: Wed, 23 May 2012 23:16:06 +0000 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or aview? (1.7 compatibility issue) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> Message-ID: <1337814966.6811.45.camel@MOSES.grc.nasa.gov> On Wed, 2012-05-23 at 17:31 -0500, Nathaniel Smith wrote: > On Wed, May 23, 2012 at 10:53 PM, Travis Oliphant wrote: > > To be clear, I'm not opposed to the change, and it looks like we should go forward. > > > > In my mind it's not about developers vs. users as satisfying users is the whole point. The purpose of NumPy is not to make its developers happy :-). But, users also want there to *be* developers on NumPy so developer happiness is not irrelevant. 
> > > > In this case, though, there are consequences for users because of the double copy if a user wants to make their code future proof. We are always trading off predicted user-experiences. I hope that we all don't have the same perspective on every issue or more than likely their aren't enough voices being heard from real users. > > I'm not really worried about users who have a problem with the > double-copy. It's a totally legitimate concern, but anyone who has > that concern has already understood the issues well enough to be able > to take care of themselves, and decided that it's worth the effort to > special-case this. They can check whether the returned array has .base > set to tell whether it's an array or a view, use a temporary hack to > check for the secret warning flag in arr.flags.num, check the numpy > version, all sorts of things to get them through the one version where > this matters. The suggestion in the docs to make a copy is not exactly > binding :-). > > -- Nathaniel As a "real user", if I care about whether an array arr2 is a copy or a view, I usually either check arr2.flags.owndata or append copy() to the statement that created arr2, e.g., arr2 = arr.diagonal().copy(). Numpy rules on views vs. copies sometimes require a bit of thought, and so I'll frequently just check the flags or make a copy instead of thinking. (More foolproof :).) Kathy - -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjhnson at gmail.com Wed May 23 19:31:52 2012 From: tjhnson at gmail.com (T J) Date: Wed, 23 May 2012 16:31:52 -0700 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or aview? (1.7 compatibility issue) In-Reply-To: <1337814966.6811.45.camel@MOSES.grc.nasa.gov> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> Message-ID: On Wed, May 23, 2012 at 4:16 PM, Kathleen M Tacina < Kathleen.M.Tacina at nasa.gov> wrote: > ** > On Wed, 2012-05-23 at 17:31 -0500, Nathaniel Smith wrote: > > On Wed, May 23, 2012 at 10:53 PM, Travis Oliphant wrote:> To be clear, I'm not opposed to the change, and it looks like we should go forward.>> In my mind it's not about developers vs. users as satisfying users is the whole point. The purpose of NumPy is not to make its developers happy :-). But, users also want there to *be* developers on NumPy so developer happiness is not irrelevant.>> In this case, though, there are consequences for users because of the double copy if a user wants to make their code future proof. We are always trading off predicted user-experiences. I hope that we all don't have the same perspective on every issue or more than likely their aren't enough voices being heard from real users. > I'm not really worried about users who have a problem with thedouble-copy. It's a totally legitimate concern, but anyone who hasthat concern has already understood the issues well enough to be ableto take care of themselves, and decided that it's worth the effort tospecial-case this. They can check whether the returned array has .baseset to tell whether it's an array or a view, use a temporary hack tocheck for the secret warning flag in arr.flags.num, check the numpyversion, all sorts of things to get them through the one version wherethis matters. 
The suggestion in the docs to make a copy is not exactlybinding :-). > -- Nathaniel > > > As a "real user", if I care about whether an array arr2 is a copy or a > view, I usually either check arr2.flags.owndata or append copy() to the > statement that created arr2, e.g., arr2 = arr.diagonal().copy(). > > Numpy rules on views vs. copies sometimes require a bit of thought, and so > I'll frequently just check the flags or make a copy instead of thinking. > (More foolproof :).) > > > It seems that there are a number of ways to check if an array is a view. Do we have a preferred way in the API that is guaranteed to stay available? Or are all of the various methods "here to stay"? -------------- next part -------------- An HTML attachment was scrubbed... URL: From lpc at cmu.edu Thu May 24 02:25:56 2012 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Thu, 24 May 2012 07:25:56 +0100 Subject: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue) In-Reply-To: <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> Message-ID: <1835863.1KnnIdHpxV@rabbit> On one of my papers, we put up the code online. Years afterwards, I still get emails every six months or so because the version of the code which was used for the paper now returns the wrong result! The problem is that it was written for the old histogram and, although I have a new version of the code, sometimes people still download the old version. just my two cent, Luis On Wednesday, May 23, 2012 12:06:45 AM Travis Oliphant wrote: > I just realized that the pull request doesn't do what I thought it did which > is just add the flag to warn users who are writing to an array that is a > view when it used to be a copy. It's more cautious and also "copies" > the data for 1.7. > > Is this really a necessary step? I guess it depends on how many use-cases > there are where people are relying on .diagonal() being a copy. Given > that this is such an easy thing for people who encounter the warning to fix > their code, it seems overly cautious to *also* make a copy (especially for > a rare code-path like this --- although I admit that I don't have any > reproducible data to support that assertion that it's a rare code-path). > > I think we have a mixed record of being cautious (not cautious enough in > some changes), but this seems like swinging in the other direction of being > overly cautious on a minor point. > > I wonder if I'm the only one who feels that way about this PR. This is not > a major issue, so I am fine with the current strategy, but the drawback of > being this cautious on this point is 1) it is not really reflective of > other changes and 2) it does mean that someone who wants to fix their code > for the future will end up with two copies for 1.7. > > -Travis > > On May 16, 2012, at 3:51 PM, Travis Oliphant wrote: > > This Pull Request looks like a good idea to me as well. > > > > -Travis > > > > On May 16, 2012, at 3:10 PM, Ralf Gommers wrote: > >> On Wed, May 16, 2012 at 3:55 PM, Nathaniel Smith wrote: > >> > >> On Tue, May 15, 2012 at 2:49 PM, Fr?d?ric Bastien wrote: > >> > Hi, > >> > > >> > In fact, I would arg to never change the current behavior, but add the > >> > flag for people that want to use it. > >> > > >> > Why? > >> > > >> > 1) There is probably >10k script that use it that will need to be > >> > checked for correctness. 
There won't be easy to see crash or error > >> > that allow user to see it. > >> > >> My suggestion is that we follow the scheme, which I think gives ample > >> opportunity for people to notice problems: > >> > >> 1.7: works like 1.6, except that a DeprecationWarning is produced if > >> (and only if) someone writes to an array returned by np.diagonal (or > >> friends). This gives a pleasant heads-up for those who pay attention > >> to DeprecationWarnings. > >> > >> 1.8: return a view, but mark this view read-only. This causes crashes > >> for anyone who ignored the DeprecationWarnings, guaranteeing that > >> they'll notice the issue. > >> > >> 1.9: return a writeable view, transition complete. > >> > >> I've written a pull request implementing the first part of this; I > >> > >> hope everyone interested will take a look: > >> https://github.com/numpy/numpy/pull/280 > >> > >> Thanks for doing that. Seems like a good way forward. > >> > >> When the PR gets merged, can you please also open a ticket for this with > >> Milestone 1.8? Then we won't forget to make the required changes for > >> that release. > >> > >> Ralf > >> > >> > 2) This is a globally not significant speed up by this change. Due to > >> > 1), i think it is not work it. Why this is not a significant speed up? > >> > First, the user already create and use the original tensor. Suppose a > >> > matrix of size n x n. If it don't fit in the cache, creating it will > >> > cost n * n. But coping it will cost cst * n. The cst is the price of > >> > loading a full cache line. But if you return a view, you will pay this > >> > cst price later when you do the computation. But it all case, this is > >> > cheap compared to the cost of creating the matrix. Also, you will do > >> > work on the matrix and this work will be much more costly then the > >> > price of the copy. > >> > > >> > In the case the matrix fix in the cache, the price of the copy is even > >> > lower. > >> > > >> > So in conclusion, optimizing the diagonal won't give speed up in the > >> > global user script, but will break many of them. > >> > >> I agree that the speed difference is small. I'm more worried about the > >> cost to users of having to remember odd inconsistencies like this, and > >> to think about whether there actually is a speed difference or not, > >> etc. (If we do add a copy=False option, then I guarantee many people > >> will use it religiously "just in case" the speed difference is enough > >> to matter! And that would suck for them.) > >> > >> Returning a view makes the API slightly nicer, cleaner, more > >> consistent, more useful. (I believe the reason this was implemented in > >> the first place was that providing a convenient way to *write* to the > >> diagonal of an arbitrary array made it easier to implement numpy.eye > >> for masked arrays.) And the whole point of numpy is to trade off a > >> little speed in favor of having a simple, easy-to-work with high-level > >> API :-). > >> > >> -- Nathaniel > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Luis Pedro Coelho | Institute for Molecular Medicine | http://luispedro.org -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From francesc at continuum.io Thu May 24 04:32:43 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 24 May 2012 10:32:43 +0200 Subject: [Numpy-discussion] question about in-place operations In-Reply-To: References: <642A16D6-99E3-4298-8A3E-9028F6BDE638@gmail.com> <4FBBA367.8060406@astro.uio.no> <4FBBDF2F.6010803@astro.uio.no> <4FBBE18A.6060300@continuum.io> Message-ID: <4FBDF22B.6020203@continuum.io> On 5/22/12 9:08 PM, Massimo Di Pierro wrote: > This problem is linear so probably Ram IO bound. I do not think I > would benefit much for multiple cores. But I will give it a try. In > the short term this is good enough for me. Yeah, this what common sense seems to indicate, that RAM IO bound problems do not benefit from using multiple cores. But reality is different: >>> import numpy as np >>> a = np.arange(1e8) >>> c = 1.0 >>> time a*c CPU times: user 0.22 s, sys: 0.20 s, total: 0.43 s Wall time: 0.43 s array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., 9.99999970e+07, 9.99999980e+07, 9.99999990e+07]) Using numexpr with 1 thread: >>> import numexpr as ne >>> ne.set_num_threads(1) 8 >>> time ne.evaluate("a*c") CPU times: user 0.20 s, sys: 0.25 s, total: 0.45 s Wall time: 0.45 s array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., 9.99999970e+07, 9.99999980e+07, 9.99999990e+07]) while using 8 threads (the machine has 8 physical cores): >>> ne.set_num_threads(8) 1 >>> time ne.evaluate("a*c") CPU times: user 0.39 s, sys: 0.68 s, total: 1.07 s Wall time: 0.14 s array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., 9.99999970e+07, 9.99999980e+07, 9.99999990e+07]) which is 3x faster than using 1 single thread (look at wall time figures). It has to be clear that this is purely due to the fact that several cores can transmit data to/from memory from/to CPU faster than one single core. I have seen this behavior lots of times; for example, in slide 21 of this presentation: http://pydata.org/pycon2012/numexpr-cython/slides.pdf one can see how using several cores can speed up not only a polynomial computation, but also the simple expression "y = x", which is essentially a memory copy. Another example where this effect can be seen is the Blosc compressor. For example, in: http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks the first points on each of the plots means that Blosc is in compression level 0, that is, it does not compress at all, and it basically copies data from origin to destination buffers. Still, one can see that using several threads can accelerate this copy well beyond memcpy speed. So, definitely, several cores can make your memory I/O bounded computations go faster. -- Francesc Alted From tmp50 at ukr.net Thu May 24 07:32:29 2012 From: tmp50 at ukr.net (Dmitrey) Date: Thu, 24 May 2012 14:32:29 +0300 Subject: [Numpy-discussion] Some numpy funcs for PyPy Message-ID: <6355.1337859149.6100681629467803648@ffe6.ukr.net> hi all, maybe you're aware of numpypy - numpy port for pypy (pypy.org) - Python language implementation with dynamic compilation. 
Unfortunately, numpypy developmnent is very slow due to strict quality standards and some other issues, so for my purposes I have provided some missing numpypy funcs, in particular * atleast_1d, atleast_2d, hstack, vstack, cumsum, isscalar, asscalar, asfarray, flatnonzero, tile, zeros_like, ones_like, empty_like, where, searchsorted * with "axis" parameter: nan(arg)min, nan(arg)max, all, any and have got some OpenOpt / FuncDesigner functionality working faster than in CPython. File with this functions you can get here Also you may be interested in some info at http://openopt.org/PyPy Regards, Dmitrey. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fccoelho at gmail.com Thu May 24 07:43:45 2012 From: fccoelho at gmail.com (Flavio Coelho) Date: Thu, 24 May 2012 08:43:45 -0300 Subject: [Numpy-discussion] [SciPy-User] Some numpy funcs for PyPy In-Reply-To: <6355.1337859149.6100681629467803648@ffe6.ukr.net> References: <6355.1337859149.6100681629467803648@ffe6.ukr.net> Message-ID: That's very usefull! I hope these features get included upstream in the next release of numpypy. thanks, Fl?vio On Thu, May 24, 2012 at 8:32 AM, Dmitrey wrote: > hi all, > maybe you're aware of numpypy - numpy port for pypy (pypy.org) - Python > language implementation with dynamic compilation. > > Unfortunately, numpypy developmnent is very slow due to strict quality > standards and some other issues, so for my purposes I have provided some > missing numpypy funcs, in particular > > > - atleast_1d, atleast_2d, hstack, vstack, cumsum, isscalar, asscalar, > asfarray, flatnonzero, tile, zeros_like, ones_like, empty_like, where, > searchsorted > - with "axis" parameter: nan(arg)min, nan(arg)max, all, any > > and have got some OpenOpt / FuncDesigner functionality working faster than > in CPython. > > File with this functions you can get here > > Also you may be interested in some info at http://openopt.org/PyPy > Regards, Dmitrey. > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Fl?vio Code?o Coelho ================ +55(21) 3799-5567 Professor Escola de Matem?tica Aplicada Funda??o Get?lio Vargas Rio de Janeiro - RJ Brasil -------------- next part -------------- An HTML attachment was scrubbed... URL: From numpy-discussion at maubp.freeserve.co.uk Thu May 24 08:19:23 2012 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Thu, 24 May 2012 13:19:23 +0100 Subject: [Numpy-discussion] Some numpy funcs for PyPy In-Reply-To: <6355.1337859149.6100681629467803648@ffe6.ukr.net> References: <6355.1337859149.6100681629467803648@ffe6.ukr.net> Message-ID: On Thu, May 24, 2012 at 12:32 PM, Dmitrey wrote: > hi all, > maybe you're aware of numpypy - numpy port for pypy (pypy.org) - Python > language implementation with dynamic compilation. > > Unfortunately, numpypy developmnent is very slow due to strict quality > standards and some other issues, so for my purposes I have provided some > missing numpypy funcs, in particular > > atleast_1d, atleast_2d, hstack, vstack, cumsum, isscalar, asscalar, > asfarray, flatnonzero, tile, zeros_like, ones_like, empty_like, where, > searchsorted > with "axis" parameter: nan(arg)min, nan(arg)max, all, any > > and have got some OpenOpt / FuncDesigner functionality working > faster than in CPython. > > File with this functions you can get here > > Also you may be interested in some info at http://openopt.org/PyPy > Regards, Dmitrey. 
As a NumPy user interested in PyPy it is great to know more people are trying to contribute in this area. I myself have only filed PyPy bugs about missing NumPy features rendering the initial numpypy support useless to me. On your website you wrote: >> From my (Dmitrey) point of view numpypy development is >> very unfriendly for newcomers - PyPy developers say "provide >> code, preferably in interpreter level instead of AppLevel, >> provide whole test coverage for all possible corner cases, >> provide hg diff for code, and then, maybe, it will be committed". >> Probably this is the reason why so insufficient number of >> developers work on numpypy. I assume that is paraphrased with a little hyperbole, but it isn't so different from numpy (other than using git), or many other open source projects. Unit tests are important, and taking patches without them is risky. I've been subscribed to the pypy-dev list for a while, but I don't recall seeing you posting there. Have you tried to submit any of your work to PyPy yet? Perhaps you should have sent this message to pypy-dev instead? (I am trying to be constructive, not critical.) Regards, Peter From tmp50 at ukr.net Thu May 24 09:07:25 2012 From: tmp50 at ukr.net (Dmitrey) Date: Thu, 24 May 2012 16:07:25 +0300 Subject: [Numpy-discussion] Some numpy funcs for PyPy In-Reply-To: References: <6355.1337859149.6100681629467803648@ffe6.ukr.net> Message-ID: <69589.1337864845.13913483336144781312@ffe16.ukr.net> > On your website you wrote: >> From my (Dmitrey) point of view numpypy development is >> very unfriendly for newcomers - PyPy developers say "provide >> code, preferably in interpreter level instead of AppLevel, >> provide whole test coverage for all possible corner cases, >> provide hg diff for code, and then, maybe, it will be committed". >> Probably this is the reason why so insufficient number of >> developers work on numpypy. I assume that is paraphrased with a little hyperbole, but it isn't so different from numpy (other than using git), or many other open source projects. > Of course, many opensource projects do like that, but in the case of numpypy IMHO the things are especially bad. > Unit tests are important, and taking patches without them is risky. > Yes, but at first, things required from numpypy newcomers are TOO complicated - and no guarrantee is provided, that elapsed efforts will not be just a waste of time; at 2nd, the high-quality standards are especially cynic when compared with their own code quality, e.g. numpypy.all(True) doesn't work yet, despite it hangs in bug tracker for a long time; a[a<0] = b[b<0] works incorrectly etc. These are reasons that forced me to write some required for my purposes missing funcs and some bug walkarounds (like for that one with numpypy.all and any). > ?I've been subscribed to the pypy-dev list for a while, > I had been subsribed IIRC for a couple of months > but I don't recall seeing you posting there. > I had made some, see my pypy activity here > ?Have you tried to submit any of your work to PyPy yet? > yes: I had spent lots of time for concatenate() (pypy developers said noone works on it) - and finally they have committed code for this func from other trunc. Things like this were with some other my proposed code for PyPy and all those days spent for it. > ?Perhaps you should have sent this message to pypy-dev instead? > I had explained them my point of view in mail list and irc channel, their answer was like "don't borther horses, why do you in a hurry? 
All will be done during several months", but I see it (porting whole numpy) definitely won't be done during the term. IIRC during ~ 2 months only ~10 new items were added to numpypy; also, lots of numpypy items, when calling, e.g. searchsorted, just raise NotImplementedError: wainting for interplevel routine, or don't work with high-dimensional arrays and/or some other corner cases. numpypy developers go (rather slowly) their own way, while I just propose temporary alternative, till proper PyPy-numpy implementation regards, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From numpy-discussion at maubp.freeserve.co.uk Thu May 24 09:37:39 2012 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Thu, 24 May 2012 14:37:39 +0100 Subject: [Numpy-discussion] Some numpy funcs for PyPy In-Reply-To: <69589.1337864845.13913483336144781312@ffe16.ukr.net> References: <6355.1337859149.6100681629467803648@ffe6.ukr.net> <69589.1337864845.13913483336144781312@ffe16.ukr.net> Message-ID: On Thu, May 24, 2012 at 2:07 PM, Dmitrey wrote: > > I had been subsribed IIRC for a couple of months I don't follow the PyPy IRC so that would explain it. I don't know how much they use that rather than their mailing list, but both seem a better place to discuss their handling or external contributions than on the numpy-discussion and scipy-user lists. Still, I hope you are able to make some contributions to numpypy, because so far I've also found PyPy's numpy implementation too limited for my usage. Regards, Peter From jniehof at lanl.gov Thu May 24 10:56:33 2012 From: jniehof at lanl.gov (Jonathan T. Niehof) Date: Thu, 24 May 2012 08:56:33 -0600 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> Message-ID: <4FBE4C21.8080904@lanl.gov> On 05/23/2012 05:31 PM, T J wrote: > It seems that there are a number of ways to check if an array is a view. > Do we have a preferred way in the API that is guaranteed to stay > available? Or are all of the various methods "here to stay"? We've settled on checking array.base, which I think was the outcome of a stackoverflow thread that I can't dig up. (I'll check with the guy who wrote the code.) -- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof at lanl.gov Correspondence / Technical data or Software Publicly Available From ben.root at ou.edu Thu May 24 11:04:55 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 24 May 2012 11:04:55 -0400 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) In-Reply-To: <4FBE4C21.8080904@lanl.gov> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: On Thu, May 24, 2012 at 10:56 AM, Jonathan T. 
Niehof wrote: > On 05/23/2012 05:31 PM, T J wrote: > > > It seems that there are a number of ways to check if an array is a view. > > Do we have a preferred way in the API that is guaranteed to stay > > available? Or are all of the various methods "here to stay"? > > We've settled on checking array.base, which I think was the outcome of a > stackoverflow thread that I can't dig up. (I'll check with the guy who > wrote the code.) > > Just as a quick word to the wise. I think I can recall a situation where this could be misleading. In particular, I think it had to do with boolean/fancy indexing of an array. In some cases, what you get is a view of the copy of the original data. So, if you simply check to see if it is a view, and then assume that because it is a view, it must be a view of the original data, then that assumption can come back and bite you in strange ways. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu May 24 11:10:18 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 24 May 2012 16:10:18 +0100 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) In-Reply-To: <4FBE4C21.8080904@lanl.gov> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: On Thu, May 24, 2012 at 3:56 PM, Jonathan T. Niehof wrote: > On 05/23/2012 05:31 PM, T J wrote: > >> It seems that there are a number of ways to check if an array is a view. >> Do we have a preferred way in the API that is guaranteed to stay >> available? Or are all of the various methods "here to stay"? > > We've settled on checking array.base, which I think was the outcome of a > stackoverflow thread that I can't dig up. (I'll check with the guy who > wrote the code.) The problem is that "is a view" isn't a very meaningful concept... checking .base will tell you whether writes to an array are likely to affect some object that existed before that array was created. But it doesn't tell you whether writes to that array can affect any *particular* other object (at least without a fair amount of groveling around the innards of both objects), and it can happen that an object has base == None yet writes to it will affect another object, and it can happen that an object has base != None and yet writes to it won't affect any object that was ever accessible to your code. AFAICT it's really these other questions that one would like to answer, and checking .base won't answer them. -- Nathaniel From robert.kern at gmail.com Thu May 24 12:52:09 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 24 May 2012 17:52:09 +0100 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: On Thu, May 24, 2012 at 4:10 PM, Nathaniel Smith wrote: > On Thu, May 24, 2012 at 3:56 PM, Jonathan T. 
Niehof wrote: >> On 05/23/2012 05:31 PM, T J wrote: >> >>> It seems that there are a number of ways to check if an array is a view. >>> Do we have a preferred way in the API that is guaranteed to stay >>> available? Or are all of the various methods "here to stay"? >> >> We've settled on checking array.base, which I think was the outcome of a >> stackoverflow thread that I can't dig up. (I'll check with the guy who >> wrote the code.) > > The problem is that "is a view" isn't a very meaningful concept... > checking .base will tell you whether writes to an array are likely to > affect some object that existed before that array was created. But it > doesn't tell you whether writes to that array can affect any > *particular* other object (at least without a fair amount of groveling > around the innards of both objects), and it can happen that an object > has base == None yet writes to it will affect another object, and it > can happen that an object has base != None and yet writes to it won't > affect any object that was ever accessible to your code. AFAICT it's > really these other questions that one would like to answer, and > checking .base won't answer them. numpy.may_share_memory() gets closer, but it can be defeated by certain striding patterns. At least, it is conservative and reports false positives but not false negatives. Implementing numpy.does_share_memory() correctly involves some number theory and hairy edge cases. (Hmm, now that I think about it, the edge cases are when the strides are 0 or negative. 0-stride axes can simply be removed, and I think we should be able to work back to a first item and flip the sign on the negative strides. The typical positive-stride solution can be found in an open source C++ global array code, IIRC. Double-hmmm...) -- Robert Kern From balarsen at lanl.gov Thu May 24 13:07:37 2012 From: balarsen at lanl.gov (Larsen, Brian A) Date: Thu, 24 May 2012 17:07:37 +0000 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) In-Reply-To: <4FBE4C21.8080904@lanl.gov> References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: This is the stack overflow discussion mentioned. http://stackoverflow.com/questions/9164269/can-you-tell-if-an-array-is-a-view-of-another I basically implemented the answer from SO. I feel like the "is" gives you a good handle on things since to be true they are actually the same location in memory. Brian On May 24, 2012, at 8:56 AM, Jonathan T. Niehof wrote: On 05/23/2012 05:31 PM, T J wrote: It seems that there are a number of ways to check if an array is a view. Do we have a preferred way in the API that is guaranteed to stay available? Or are all of the various methods "here to stay"? We've settled on checking array.base, which I think was the outcome of a stackoverflow thread that I can't dig up. (I'll check with the guy who wrote the code.) 
-- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof at lanl.gov Correspondence / Technical data or Software Publicly Available _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Brian A. Larsen ISR-1 Space Science and Applications Los Alamos National Laboratory PO Box 1663, MS-D466 Los Alamos, NM 87545 USA (For overnight add: SM-30, Bikini Atoll Road) Phone: 505-665-7691 Fax: 505-665-7395 email: balarsen at lanl.gov Correspondence / Technical data or Software Publicly Available -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu May 24 13:59:38 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 24 May 2012 18:59:38 +0100 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: On Thu, May 24, 2012 at 6:07 PM, Larsen, Brian A wrote: > This is the stack overflow discussion mentioned. > > http://stackoverflow.com/questions/9164269/can-you-tell-if-an-array-is-a-view-of-another > > I basically implemented the answer from SO. ?I feel like the "is" gives you > a good handle on things since to be true they are actually the same location > in memory. If using the current development version of numpy, that answer is actually wrong... if you do a = np.arange(10) b = a.view() c = b.view() then in the development version, c.base is a, not b. This is the source of some contention and confusion right now...: https://github.com/numpy/numpy/pull/280#issuecomment-5888154 In any case, if "b.base is a" is True, then you can be pretty certain that b and a share memory, but if it is False, it doesn't tell you much at all. AFAICT np.may_share_memory would be strictly more useful. -- Nathaniel From josef.pktd at gmail.com Thu May 24 15:20:12 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 24 May 2012 15:20:12 -0400 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: On Thu, May 24, 2012 at 1:59 PM, Nathaniel Smith wrote: > On Thu, May 24, 2012 at 6:07 PM, Larsen, Brian A wrote: >> This is the stack overflow discussion mentioned. >> >> http://stackoverflow.com/questions/9164269/can-you-tell-if-an-array-is-a-view-of-another >> >> I basically implemented the answer from SO. ?I feel like the "is" gives you >> a good handle on things since to be true they are actually the same location >> in memory. > > If using the current development version of numpy, that answer is > actually wrong... 
if you do > ?a = np.arange(10) > ?b = a.view() > ?c = b.view() > then in the development version, c.base is a, not b. This is the > source of some contention and confusion right now...: > ?https://github.com/numpy/numpy/pull/280#issuecomment-5888154 > > In any case, if "b.base is a" is True, then you can be pretty certain > that b and a share memory, but if it is False, it doesn't tell you > much at all. AFAICT np.may_share_memory would be strictly more useful. as example: I checked pandas recently and IIRC, I needed three .base to get a True >>> x = np.random.randn(4,5) >>> xdf = pa.DataFrame(data=x) >>> type(xdf[1]) >>> xdf[1].base is x False >>> xdf[1].base.base is x False >>> xdf[1].base.base.base is x True >>> np.may_share_memory(xdf[1], x) True >>> Josef > > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From diehose at freenet.de Thu May 24 17:07:50 2012 From: diehose at freenet.de (diehose) Date: Thu, 24 May 2012 23:07:50 +0200 Subject: [Numpy-discussion] Fwd: Named dtype array: Difference between a[0]['name'] and a['name'][0]? In-Reply-To: <249AA706-CE3F-49A5-924C-6B101637EAA1@continuum.io> References: <5946941.oVP5cPdKPZ@wohnzimmer> <249AA706-CE3F-49A5-924C-6B101637EAA1@continuum.io> Message-ID: <3718221.nEJGS8FtgG@laptop> thanks a lot. I updated the question on stackoverflow and opened a ticket http://projects.scipy.org/numpy/ticket/2139 bj?rn Am Montag, 21. Mai 2012, 15:37:36 schrieb Travis Oliphant: This is the right place to ask, it's just that it can take time to get an answer because people who might know the answer may not have the time to respond immediately. The short answer is that this is not really a "normal" bug, but it could be considered a "design" bug (although the issues may not be straightforward to resolve). What that means is that it may not be changed in the short term --- and you should just use the first spelling. Structured arrays can be a confusing area of NumPy for several of reasons. You've constructed an example that touches on several of them. You have a data-type that is a "structure" array with one member ("tuple"). That member contains a 2-vector of integers. First of all, it is important to remember that with Python, doing a['tuple'] [0] = (1,2) is equivalent to b = a['tuple']; b[0] = (1,2). In like manner, a[0]['tuple'] = (1,2) is equivalent to b = a[0]; b['tuple'] = (1,2). To understand the behavior, we need to dissect both code paths and what happens. You built a (3,) array of those elements in 'a'. When you write b = a['tuple'] you should probably be getting a (3,) array of (2,)-integers, but as there is currently no formal dtype support for (n,)-integers as a general dtype in NumPy, you get back a (3,2) array of integers which is the closest thing that NumPy can give you. Setting the [0] row of this object via a['tuple'][0] = (1,2) works just fine and does what you would expect. On the other hand, when you type: b = a[0] you are getting back an array-scalar which is a particularly interesting kind of array scalar that can hold records. This new object is formally of type numpy.void and it holds a "scalar representation" of anything that fits under the "VOID" basic dtype. For some reason: b['tuple'] = [1,2] is not working. 
On my system I'm getting a different error: TypeError: object of type 'int' has no len() I think this should be filed as a bug on the issue tracker which is for the time being here: http://projects.scipy.org/numpy The problem is ultimately the void->copyswap function being called in voidtype_setfields if someone wants to investigate. I think this behavior should work. -Travis On May 21, 2012, at 1:50 PM, bmu wrote: dear all, can anybody tell me, why nobody is answering this question? is this the wrong place to ask? or does nobody know an answer? bj?rn From: bmu Subject: Named dtype array: Difference between a[0]['name'] and a['name'][0]? Date: May 20, 2012 6:45:03 AM CDT To: numpy-discussion at scipy.org I came acroos a question on stackoverflow (http://stackoverflow.com/q/9470604) and I am wondering if this is a bug import numpy as np dt = np.dtype([('tuple', (int, 2))]) a = np.zeros(3, dt) type(a['tuple'][0]) # ndarray type(a[0]['tuple']) # ndarray a['tuple'][0] = (1,2) # ok a[0]['tuple'] = (1,2) # ValueError: shape-mismatch on array construction Could somebody explain this behaviour (either in this mailing list or on stackoverflow)? bmu _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From thouis at gmail.com Fri May 25 07:46:31 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Fri, 25 May 2012 13:46:31 +0200 Subject: [Numpy-discussion] .max(0) on reshaped array returns inconsistent results. Message-ID: I'm seeing some strange behavior from .max() on a reshaped array in the current master, and wanted to raise it here to make sure it's not something uniquely broken in my setup. This code fails for me, though changing the context (adding a counter to the loop, or running under "python -i") sometimes prevents it from failing. This code doesn't fail under 1.6.2. --------------- import numpy as np b = np.array([0, 1, 2, 3, 4, 5], np.int64) a = b.reshape(3, 2) while True: np.testing.assert_array_equal(np.atleast_1d(np.array(a.max(0), np.float)), np.atleast_1d(np.array(a.max(0), np.float))) --------------- I spent several hours with valgrind trying to track down what was causing this, but had no luck. Perhaps someone with more knowledge of the numpy ufunc internals can track it down faster than me. I went ahead and filed a bug for it: http://projects.scipy.org/numpy/ticket/2144 From njs at pobox.com Fri May 25 07:52:07 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 25 May 2012 12:52:07 +0100 Subject: [Numpy-discussion] .max(0) on reshaped array returns inconsistent results. In-Reply-To: References: Message-ID: On Fri, May 25, 2012 at 12:46 PM, Thouis (Ray) Jones wrote: > I'm seeing some strange behavior from .max() on a reshaped array in > the current master, and wanted to raise it here to make sure it's not > something uniquely broken in my setup. > > This code fails for me, though changing the context (adding a counter > to the loop, or running under "python -i") sometimes prevents it from > failing. ?This code doesn't fail under 1.6.2. > > --------------- > import numpy as np > > b = np.array([0, 1, 2, 3, 4, 5], np.int64) > a = b.reshape(3, 2) > > while True: > ? ?np.testing.assert_array_equal(np.atleast_1d(np.array(a.max(0), np.float)), > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
?np.atleast_1d(np.array(a.max(0), np.float))) > --------------- > > I spent several hours with valgrind trying to track down what was > causing this, but had no luck. ?Perhaps someone with more knowledge of > the numpy ufunc internals can track it down faster than me. What do you get, if not the expected value? And are the calls to atleast_1d, np.array, etc., necessary to trigger the problem, or will just plain a.max(0) do it? -- Nathaniel From thouis.jones at curie.fr Fri May 25 08:07:46 2012 From: thouis.jones at curie.fr (Thouis Jones) Date: Fri, 25 May 2012 14:07:46 +0200 Subject: [Numpy-discussion] .max(0) on reshaped array returns inconsistent results. In-Reply-To: References: Message-ID: On Fri, May 25, 2012 at 1:52 PM, Nathaniel Smith wrote: > On Fri, May 25, 2012 at 12:46 PM, Thouis (Ray) Jones wrote: >> I'm seeing some strange behavior from .max() on a reshaped array in >> the current master, and wanted to raise it here to make sure it's not >> something uniquely broken in my setup. >> >> This code fails for me, though changing the context (adding a counter >> to the loop, or running under "python -i") sometimes prevents it from >> failing. ?This code doesn't fail under 1.6.2. >> >> --------------- >> import numpy as np >> >> b = np.array([0, 1, 2, 3, 4, 5], np.int64) >> a = b.reshape(3, 2) >> >> while True: >> ? ?np.testing.assert_array_equal(np.atleast_1d(np.array(a.max(0), np.float)), >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?np.atleast_1d(np.array(a.max(0), np.float))) >> --------------- >> >> I spent several hours with valgrind trying to track down what was >> causing this, but had no luck. ?Perhaps someone with more knowledge of >> the numpy ufunc internals can track it down faster than me. > > What do you get, if not the expected value? And are the calls to > atleast_1d, np.array, etc., necessary to trigger the problem, or will > just plain a.max(0) do it? AssertionError: Arrays are not equal (mismatch 100.0%) x: array([ 4., 5.]) y: array([ 4.31441533e+09, 4.31441402e+09]) I don't seem to be able to reproduce with just a.max(0) or np.array(a.max(0), np.float), but since it seems to be very unstable to other changes in the code, I'll keep trying to find out if I can make those simpler versions crash. Ray Jones From thouis.jones at curie.fr Fri May 25 08:17:42 2012 From: thouis.jones at curie.fr (Thouis Jones) Date: Fri, 25 May 2012 14:17:42 +0200 Subject: [Numpy-discussion] .max(0) on reshaped array returns inconsistent results. In-Reply-To: References: Message-ID: On Fri, May 25, 2012 at 2:07 PM, Thouis Jones wrote: > I don't seem to be able to reproduce with just a.max(0) or > np.array(a.max(0), np.float), but since it seems to be very unstable > to other changes in the code, I'll keep trying to find out if I can > make those simpler versions crash. By the way, the strange phrasing comes from these lines in histogramdd(): smin = atleast_1d(array(sample.min(0), float)) smax = atleast_1d(array(sample.max(0), float)) Which is where I encountered the bug in the numpy tests. Ray Jones From robert.kern at gmail.com Fri May 25 09:17:13 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 25 May 2012 14:17:13 +0100 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) 
In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: On Thu, May 24, 2012 at 5:52 PM, Robert Kern wrote: > (Hmm, now that I think about it, the edge cases are when the strides > are 0 or negative. 0-stride axes can simply be removed, and I think we > should be able to work back to a first item and flip the sign on the > negative strides. The typical positive-stride solution can be found in > an open source C++ global array code, IIRC. Double-hmmm...) Except that it's still NP-complete. -- Robert Kern From d.s.seljebotn at astro.uio.no Fri May 25 10:07:48 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 25 May 2012 16:07:48 +0200 Subject: [Numpy-discussion] Checking for views In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: <4FBF9234.3000602@astro.uio.no> On 05/25/2012 03:17 PM, Robert Kern wrote: > On Thu, May 24, 2012 at 5:52 PM, Robert Kern wrote: > >> (Hmm, now that I think about it, the edge cases are when the strides >> are 0 or negative. 0-stride axes can simply be removed, and I think we >> should be able to work back to a first item and flip the sign on the >> negative strides. The typical positive-stride solution can be found in >> an open source C++ global array code, IIRC. Double-hmmm...) > > Except that it's still NP-complete. > Well, I guess N would be the number of dimensions, so that by itself doesn't tell us all that much. Question is if the worst case is no better than the trivial O(number of elements in the matrices), which would be bad. Dag From damienlmoore at gmail.com Fri May 25 10:19:34 2012 From: damienlmoore at gmail.com (Damien Moore) Date: Fri, 25 May 2012 10:19:34 -0400 Subject: [Numpy-discussion] Retrieving and flattening lower triangular components of inner axes of 3d array Message-ID: Hi List, I have an array, x, with dimensions (K,M,M). I would like to compute a new array, y ~ (K,M*(M+1)/2), such that y[k] = numpy.tril(x[k]).flatten() for each k = 0,..,K-1 Is there a way to do this without looping? I was trying with tril_indices but couldn't quite get the logic right. Thanks, dm From njs at pobox.com Fri May 25 10:55:01 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 25 May 2012 15:55:01 +0100 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: On May 25, 2012 2:21 PM, "Robert Kern" wrote: > > On Thu, May 24, 2012 at 5:52 PM, Robert Kern wrote: > > > (Hmm, now that I think about it, the edge cases are when the strides > > are 0 or negative. 
0-stride axes can simply be removed, and I think we > > should be able to work back to a first item and flip the sign on the > > negative strides. The typical positive-stride solution can be found in > > an open source C++ global array code, IIRC. Double-hmmm...) > > Except that it's still NP-complete. Huh, is it really? I'm pretty sure checking the existence of a solution to a linear Diophantine equation is cheap, but I guess figuring out whether it falls within the "shape" bounds is less obvious... -- Nathaniel From chris at simplistix.co.uk Fri May 25 11:17:27 2012 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 25 May 2012 16:17:27 +0100 Subject: [Numpy-discussion] indexes in an array where value is greater than 1? Message-ID: <4FBFA287.30903@simplistix.co.uk> Hi All, I have an array: arrrgh = numpy.zeros(100000000) A sparse collection of elements will have values greater than zero: arrrgh[9999] = 2 arrrgh[3453453] =42 The *wrong* way to do this is: for i in xrange(len(arrrgh)): if arrrgh[i] > 1: print i What's the right way? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From ben.root at ou.edu Fri May 25 11:21:08 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 25 May 2012 11:21:08 -0400 Subject: [Numpy-discussion] indexes in an array where value is greater than 1? In-Reply-To: <4FBFA287.30903@simplistix.co.uk> References: <4FBFA287.30903@simplistix.co.uk> Message-ID: On Fri, May 25, 2012 at 11:17 AM, Chris Withers wrote: > Hi All, > > I have an array: > > arrrgh = numpy.zeros(100000000) > > A sparse collection of elements will have values greater than zero: > > arrrgh[9999] = 2 > arrrgh[3453453] =42 > > The *wrong* way to do this is: > > for i in xrange(len(arrrgh)): > if arrrgh[i] > 1: > print i > > What's the right way? > > Chris > > np.nonzero(arrrgh > 1) Note, it returns a list of lists, one for each dimension of the input array. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From thouis at gmail.com Fri May 25 11:30:22 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Fri, 25 May 2012 17:30:22 +0200 Subject: [Numpy-discussion] .max(0) on reshaped array returns inconsistent results. In-Reply-To: References: Message-ID: I've bisected it down to this commit: https://github.com/numpy/numpy/commit/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1 This exercises it consistently for me: while True; do python -m nose.core ../numpy.bisect/numpy/lib/tests/test_function_base.py:TestHistogramdd --pdb --pdb-failures; done It happens at HEAD in Nathan's separate-maskna branch, as well. Ray Jones From thouis at gmail.com Fri May 25 11:39:09 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Fri, 25 May 2012 17:39:09 +0200 Subject: [Numpy-discussion] .max(0) on reshaped array returns inconsistent results. In-Reply-To: References: Message-ID: On May 25, 2012 5:30 PM, "Thouis (Ray) Jones" wrote: > It happens at HEAD in Nathan's separate-maskna branch, as well. Sorry, Nathaniel's branch. My fingers went into autopilot. Ray -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri May 25 11:55:12 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 25 May 2012 16:55:12 +0100 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) 
In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: On Fri, May 25, 2012 at 3:55 PM, Nathaniel Smith wrote: > On May 25, 2012 2:21 PM, "Robert Kern" wrote: >> >> On Thu, May 24, 2012 at 5:52 PM, Robert Kern wrote: >> >> > (Hmm, now that I think about it, the edge cases are when the strides >> > are 0 or negative. 0-stride axes can simply be removed, and I think we >> > should be able to work back to a first item and flip the sign on the >> > negative strides. The typical positive-stride solution can be found in >> > an open source C++ global array code, IIRC. Double-hmmm...) >> >> Except that it's still NP-complete. > > Huh, is it really? I'm pretty sure checking the existence of a > solution to a linear Diophantine equation is cheap, but I guess > figuring out whether it falls within the "shape" bounds is less > obvious... I believe that's what this is telling me: http://permalink.gmane.org/gmane.comp.gcc.fortran/11797 -- Robert Kern From robert.kern at gmail.com Fri May 25 11:59:52 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 25 May 2012 16:59:52 +0100 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: On Fri, May 25, 2012 at 3:55 PM, Nathaniel Smith wrote: > On May 25, 2012 2:21 PM, "Robert Kern" wrote: >> >> On Thu, May 24, 2012 at 5:52 PM, Robert Kern wrote: >> >> > (Hmm, now that I think about it, the edge cases are when the strides >> > are 0 or negative. 0-stride axes can simply be removed, and I think we >> > should be able to work back to a first item and flip the sign on the >> > negative strides. The typical positive-stride solution can be found in >> > an open source C++ global array code, IIRC. Double-hmmm...) >> >> Except that it's still NP-complete. > > Huh, is it really? I'm pretty sure checking the existence of a > solution to a linear Diophantine equation is cheap, but I guess > figuring out whether it falls within the "shape" bounds is less > obvious... If both positive and negative values are allowed, then there is a polynomial-time algorithm to solve the linear Diophantine equation, but bounding the possible values renders it NP-complete. When you go down to {0,1} as the only allowable values, it becomes the SUBSET-SUM problem. -- Robert Kern From njs at pobox.com Fri May 25 12:59:39 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 25 May 2012 17:59:39 +0100 Subject: [Numpy-discussion] Checking for views (was: Should arr.diagonal() return a copy or aview?) 
In-Reply-To: References: <71175623-F277-4FCE-BEC9-FFC29988F5FE@continuum.io> <0DE410A3-D86D-4A49-9D29-9F066F6154FA@continuum.io> <9612341A-EE88-4C14-8A26-D1FCD075F60B@continuum.io> <4FBD41D7.5040406@astro.uio.no> <4FBD4220.6060801@astro.uio.no> <1D0533E8-FB25-46BA-8549-DFAF7F669CCC@continuum.io> <1337814966.6811.45.camel@MOSES.grc.nasa.gov> <4FBE4C21.8080904@lanl.gov> Message-ID: On Fri, May 25, 2012 at 4:59 PM, Robert Kern wrote: > On Fri, May 25, 2012 at 3:55 PM, Nathaniel Smith wrote: >> On May 25, 2012 2:21 PM, "Robert Kern" wrote: >>> >>> On Thu, May 24, 2012 at 5:52 PM, Robert Kern wrote: >>> >>> > (Hmm, now that I think about it, the edge cases are when the strides >>> > are 0 or negative. 0-stride axes can simply be removed, and I think we >>> > should be able to work back to a first item and flip the sign on the >>> > negative strides. The typical positive-stride solution can be found in >>> > an open source C++ global array code, IIRC. Double-hmmm...) >>> >>> Except that it's still NP-complete. >> >> Huh, is it really? I'm pretty sure checking the existence of a >> solution to a linear Diophantine equation is cheap, but I guess >> figuring out whether it falls within the "shape" bounds is less >> obvious... > > If both positive and negative values are allowed, then there is a > polynomial-time algorithm to solve the linear Diophantine equation, > but bounding the possible values renders it NP-complete. When you go > down to {0,1} as the only allowable values, it becomes the SUBSET-SUM > problem. Right. I suspect it's still pretty practical to solve for many of the arrays we care about (strides that are multiples of each other, etc.); many NP-hard problems are easy in the typical case, and for a lot of the cases we care about (e.g. disjoint slices of a contiguous array) solving the Diophantine equation will show that the bounds are irrelevant and collisions can never occur. Oh well, fortunately nothing depends on this :-). - N From pav at iki.fi Fri May 25 13:30:46 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 25 May 2012 19:30:46 +0200 Subject: [Numpy-discussion] .max(0) on reshaped array returns inconsistent results. In-Reply-To: References: Message-ID: 25.05.2012 13:46, Thouis (Ray) Jones kirjoitti: > I'm seeing some strange behavior from .max() on a reshaped array in > the current master, and wanted to raise it here to make sure it's not > something uniquely broken in my setup. > > This code fails for me, though changing the context (adding a counter > to the loop, or running under "python -i") sometimes prevents it from > failing. This code doesn't fail under 1.6.2. Try using "git bisect" to find the the first failing commit: 1. git bisect start master v1.6.2 2. Rebuild, and run test 3. Pick one of the following according to the result: git bisect good # test OK git bisect bad # test fails with this error git bisect skip # doesn't build, or some other error 4. Goto 2, until Git tells you what's the first bad commit. Abort with "git bisect reset". If you are on unix, the following rig can help automate this: https://github.com/pv/scipy-build-makefile -- Pauli Virtanen From chaoyuejoy at gmail.com Sat May 26 07:51:24 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sat, 26 May 2012 13:51:24 +0200 Subject: [Numpy-discussion] why Segmentation fault (core dumped)? Message-ID: Dear all, Previously I am able to run a script on our server but now it gave me a Segmentation fault (core dumped) error. 
try I tried the script with same type of netcdf file but with much smaller file size and it works. So I think the error is related with memory stuff. I guess it's because our system administrator make some change somewhere and that cause my problem? the file size that cause the error to appear is 2.6G (in the script I read this file with NetCDF4 to numpy array and make some manipulation), the small one without error is only 48M. I tried to use "limit" command to list the following, then I change stacksize to unlimited and the problem still occurs. ychao at obelix2 - ...CRU_NEW - 12 >limit cputime unlimited filesize unlimited datasize unlimited stacksize 10240 kbytes coredumpsize 0 kbytes memoryuse unlimited vmemoryuse unlimited descriptors 1024 memorylocked 64 kbytes maxproc 1024 would anybody be able to give me a short explanation or direct me to some webpage which can help to understand the problem? thanks et cheers, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jswhit at fastmail.fm Sat May 26 11:04:23 2012 From: jswhit at fastmail.fm (Jeff Whitaker) Date: Sat, 26 May 2012 09:04:23 -0600 Subject: [Numpy-discussion] why Segmentation fault (core dumped)? In-Reply-To: References: Message-ID: <4FC0F0F7.1000707@fastmail.fm> On 5/26/12 5:51 AM, Chao YUE wrote: > Dear all, > > Previously I am able to run a script on our server but now it gave me > a Segmentation fault (core dumped) error. > try I tried the script with same type of netcdf file but with much > smaller file size and it works. So I think the error is related with > memory stuff. > I guess it's because our system administrator make some change > somewhere and that cause my problem? > the file size that cause the error to appear is 2.6G (in the script I > read this file with NetCDF4 to numpy array and make some manipulation), > the small one without error is only 48M. Chao: Without seeing your script, there's not much I can say. I suggest opening an issue at netcdf4-python.google.com, including your script as an attachment. You'll probably have to post the data file somewhere (dropbox perhaps?) so I can run the script that triggers the segfault. -Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.raybaut at gmail.com Sat May 26 15:42:01 2012 From: pierre.raybaut at gmail.com (Pierre Raybaut) Date: Sat, 26 May 2012 21:42:01 +0200 Subject: [Numpy-discussion] ANN: Spyder v2.1.10 Message-ID: Hi all, On the behalf of Spyder's development team (http://code.google.com/p/spyderlib/people/list), I'm pleased to announce that Spyder v2.1.10 has been released and is available for Windows XP/Vista/7, GNU/Linux and MacOS X: http://code.google.com/p/spyderlib/ This is a pure maintenance release -- a lot of bugs were fixed since v2.1.9: http://code.google.com/p/spyderlib/wiki/ChangeLog Spyder is a free, open-source (MIT license) interactive development environment for the Python language with advanced editing, interactive testing, debugging and introspection features. 
Originally designed to provide MATLAB-like features (integrated help, interactive console, variable explorer with GUI-based editors for dictionaries, NumPy arrays, ...), it is strongly oriented towards scientific computing and software development. Thanks to the `spyderlib` library, Spyder also provides powerful ready-to-use widgets: embedded Python console (example: http://packages.python.org/guiqwt/_images/sift3.png), NumPy array editor (example: http://packages.python.org/guiqwt/_images/sift2.png), dictionary editor, source code editor, etc. Description of key features with tasty screenshots can be found at: http://code.google.com/p/spyderlib/wiki/Features On Windows platforms, Spyder is also available as a stand-alone executable (don't forget to disable UAC on Vista/7). This all-in-one portable version is still experimental (for example, it does not embed sphinx -- meaning no rich text mode for the object inspector) but it should provide a working version of Spyder for Windows platforms without having to install anything else (except Python 2.x itself, of course). Don't forget to follow Spyder updates/news: * on the project website: http://code.google.com/p/spyderlib/ * and on our official blog: http://spyder-ide.blogspot.com/ Last, but not least, we welcome any contribution that helps making Spyder an efficient scientific development/computing environment. Join us to help creating your favourite environment! (http://code.google.com/p/spyderlib/wiki/NoteForContributors) Enjoy! -Pierre From fperez.net at gmail.com Sun May 27 03:08:35 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 27 May 2012 00:08:35 -0700 Subject: [Numpy-discussion] [ANN] Cell magics in IPython master Message-ID: Hi folks, [ Sorry for the slightly off-topic post, but I know that a number of people on this list have long wanted more seamless ways to integrate with tools like Cython and R, and you may not necessarily follow the ipython lists...] I'm excited to report that we now have cell magics in IPython... PR 1732 [1] has just been merged [2], which implements the design discussed in IPEP 1 [3]. This is probably one of the largest PRs we've had so far, with over 100 commits, over 100 comments and a diff that's almost 11000 lines long (a lot of it moving code around, obviously it's not all new code). But it brings two very important thigns: 1) a refactor of the magic system to finally remove the old mixin class we'd had since the very first days of IPython in 2001. This is a cleanup I've been wanting to do for over 10 years! The new setup makes the magic system have a very clean api, that is easy to use both for the implementation of core features and for users to create their own magics. 2) the new concept of cell magics: these are magics that get not only the line they're on, but the entire cell body as well. And while these are most naturally used in the notebook, as you would expect we've built them at the core of IPython, so you can use them with all the clients (terminal, qt console, notebook). For example, this is a Cython magic that Brian just prototyped out (we'll have a production version of it soon included). 
Note that this was copied *from a regular text terminal*, not from the notebook: In [3]: from IPython.core.magic import register_line_cell_magic In [4]: @register_line_cell_magic ...: def cython(line, cell): ...: """Compile and import a cell as a .pyx file.""" ...: import sys ...: from importlib import import_module ...: module = line.strip() ...: fname = module + '.pyx' ...: with open(fname,'w') as f: ...: f.write(cell) ...: if 'pyximport' not in sys.modules: ...: import pyximport ...: pyximport.install(reload_support=True) ...: globals()[module] = import_module(module) ...: In [5]: %%cython bam ...: def f(x): ...: return 2.0*x ...: In [6]: bam.f(10) Out[6]: 20.0 In a similar spirit, Jonathan Taylor recently created one to call R transparently in the notebook: https://github.com/jonathan-taylor/Rmagic This one hasn't been fully updated to the final API, but the core code is there and now it should be a trivial matter to update it. I want to thank everyone who pitched in with ideas during the discussion and review period, and I hope you'll all enjoy this and come up with great ways to use the system. For now, you can see how the system works by playing with %%timeit and %%prun, the only two builtins that I extended to work also as cell magics. For more details, see the documentation where we've added also a long new section with details and examples of how to create your own [4]. Cheers, f [1] https://github.com/ipython/ipython/pull/1732 [2] https://github.com/ipython/ipython/commit/61eb2ffeebb91a94fe9befe2c30e7839781ddc52 [2] https://github.com/ipython/ipython/issues/1611 [3] http://ipython.org/ipython-doc/dev/interactive/reference.html#magic-command-system From chris at simplistix.co.uk Sun May 27 04:19:19 2012 From: chris at simplistix.co.uk (Chris Withers) Date: Sun, 27 May 2012 09:19:19 +0100 Subject: [Numpy-discussion] indexes in an array where value is greater than 1? In-Reply-To: References: <4FBFA287.30903@simplistix.co.uk> Message-ID: <4FC1E387.2060904@simplistix.co.uk> On 25/05/2012 16:21, Benjamin Root wrote: > > np.nonzero(arrrgh > 1) Did you mean np.where(arrrgh > 1)? I didn't know you could use np.nonzero in the way your describe? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chaoyuejoy at gmail.com Sun May 27 05:14:33 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sun, 27 May 2012 11:14:33 +0200 Subject: [Numpy-discussion] indexes in an array where value is greater than 1? In-Reply-To: <4FC1E387.2060904@simplistix.co.uk> References: <4FBFA287.30903@simplistix.co.uk> <4FC1E387.2060904@simplistix.co.uk> Message-ID: for me, np.nonzero() and np.where() both work. It seems they have same function. chao 2012/5/27 Chris Withers > On 25/05/2012 16:21, Benjamin Root wrote: > > > > np.nonzero(arrrgh > 1) > > Did you mean np.where(arrrgh > 1)?for > I didn't know you could use np.nonzero in the way your describe? 
> > Chris > > -- > Simplistix - Content Management, Batch Processing & Python Consulting > - http://www.simplistix.co.uk > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun May 27 07:11:55 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 27 May 2012 12:11:55 +0100 Subject: [Numpy-discussion] [ANN] Cell magics in IPython master In-Reply-To: References: Message-ID: Hi Fernando, Excellent work! On Sun, May 27, 2012 at 8:08 AM, Fernando Perez wrote: > In a similar spirit, Jonathan Taylor recently created one to call R > transparently in the notebook: > > https://github.com/jonathan-taylor/Rmagic > > This one hasn't been fully updated to the final API, but the core code > is there and now it should be a trivial matter to update it. You guys are probably aware, but just in case, note that there's an earlier (quite featureful) version of this idea included in the old rnumpy code: https://bitbucket.org/njs/rnumpy/wiki/IPython_integration https://bitbucket.org/njs/rnumpy/src/b0f03e06aa0f/ipy_rnumpy.py Unfortunately it has bit-rotten some (written against rpy2 2.0.x and IPython 0.10ish, uses prefilters instead of %magic because %magic's couldn't do what I wanted), but it might be useful for code or inspiration. -- Nathaniel From chaoyuejoy at gmail.com Sun May 27 12:27:47 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sun, 27 May 2012 18:27:47 +0200 Subject: [Numpy-discussion] why Segmentation fault (core dumped)? In-Reply-To: <4FC0F0F7.1000707@fastmail.fm> References: <4FC0F0F7.1000707@fastmail.fm> Message-ID: Dear Jeff, Thanks a lot for your reply. I think it might be related with the memory management on our sever. But anyway, as you suggested, I open an issue on netcdf4-python.google.co m. you can find the data and script on ftp://ftp.cea.fr/incoming/y2k01/chaoyue/. The cal_cmi_big.py gave the core dumped error, in which I tried to operate with big files (~2G). The cal_cmi_small.py works fine, which handles data only ~25M (a subset of big files). I used a small function (which use NetCDF4) in the script to read and write nc data with NetCDF4. You can also find this function in ncfunc.py. I tested all the script and data before I upload on our ftp. thanks again for your help, cheers, Chao 2012/5/26 Jeff Whitaker > On 5/26/12 5:51 AM, Chao YUE wrote: > > Dear all, > > Previously I am able to run a script on our server but now it gave me a > Segmentation fault (core dumped) error. > try I tried the script with same type of netcdf file but with much smaller > file size and it works. So I think the error is related with memory stuff. > I guess it's because our system administrator make some change somewhere > and that cause my problem? > the file size that cause the error to appear is 2.6G (in the script I read > this file with NetCDF4 to numpy array and make some manipulation), > the small one without error is only 48M. > > > Chao: Without seeing your script, there's not much I can say. 
I suggest > opening an issue at netcdf4-python.google.com, including your script as > an attachment. You'll probably have to post the data file somewhere > (dropbox perhaps?) so I can run the script that triggers the segfault. > > -Jeff > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Sun May 27 12:31:13 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sun, 27 May 2012 18:31:13 +0200 Subject: [Numpy-discussion] why Segmentation fault (core dumped)? In-Reply-To: References: <4FC0F0F7.1000707@fastmail.fm> Message-ID: Just one more sentence. We are using version 0.9.7 on our server. when I tried to use: f=nc.Dataset('file.nc') to open the file (either big or small) within ipython, it works fine. Then when I tried to do %run cal_cmi_big.py, core dumped error. but %run cal_cmi_small.py works fine. The cal_cmi_big.py did work several days ago, I still have the file generated by this script in my directory. cheers, Chao 2012/5/27 Chao YUE > Dear Jeff, > > Thanks a lot for your reply. I think it might be related with the memory > management on our sever. But anyway, as you suggested, I open an issue on > netcdf4-python.google.co m. you can > find the data and script on ftp://ftp.cea.fr/incoming/y2k01/chaoyue/. The > cal_cmi_big.py gave the core dumped error, in which I tried to operate with > big files (~2G). The cal_cmi_small.py works fine, which handles data only > ~25M (a subset of big files). I used a small function (which use NetCDF4) > in the script to read and write nc data with NetCDF4. You can also find > this function in ncfunc.py. I tested all the script and data before I > upload on our ftp. > > thanks again for your help, > > cheers, > > Chao > > > 2012/5/26 Jeff Whitaker > >> On 5/26/12 5:51 AM, Chao YUE wrote: >> >> Dear all, >> >> Previously I am able to run a script on our server but now it gave me a >> Segmentation fault (core dumped) error. >> try I tried the script with same type of netcdf file but with much >> smaller file size and it works. So I think the error is related with memory >> stuff. >> I guess it's because our system administrator make some change somewhere >> and that cause my problem? >> the file size that cause the error to appear is 2.6G (in the script I >> read this file with NetCDF4 to numpy array and make some manipulation), >> the small one without error is only 48M. >> >> >> Chao: Without seeing your script, there's not much I can say. I suggest >> opening an issue at netcdf4-python.google.com, including your script as >> an attachment. You'll probably have to post the data file somewhere >> (dropbox perhaps?) so I can run the script that triggers the segfault. 
>> >> -Jeff >> > > > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sun May 27 17:06:10 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 27 May 2012 14:06:10 -0700 Subject: [Numpy-discussion] [ANN] Cell magics in IPython master In-Reply-To: References: Message-ID: Hi Nathaniel, On Sun, May 27, 2012 at 4:11 AM, Nathaniel Smith wrote: > You guys are probably aware, but just in case, note that there's an > earlier (quite featureful) version of this idea included in the old > rnumpy code: > ?https://bitbucket.org/njs/rnumpy/wiki/IPython_integration > ?https://bitbucket.org/njs/rnumpy/src/b0f03e06aa0f/ipy_rnumpy.py > > Unfortunately it has bit-rotten some (written against rpy2 2.0.x and > IPython 0.10ish, uses prefilters instead of %magic because %magic's > couldn't do what I wanted), but it might be useful for code or > inspiration. Actually I wasn't, sorry about that! I just passed along the info to Jonathan, as he said he'd try to finish up the R magic work and submit it for inclusion in IPython soon. Thanks for pointing this out. Cheers, f From d.s.seljebotn at astro.uio.no Sun May 27 17:27:19 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 27 May 2012 23:27:19 +0200 Subject: [Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive In-Reply-To: References: <4FB58185.7070500@astro.uio.no> Message-ID: <4FC29C37.6090803@astro.uio.no> On 05/18/2012 01:48 PM, mark florisson wrote: > On 17 May 2012 23:53, Dag Sverre Seljebotn wrote: >> I'm repeating myself a bit, but my previous thread of this ended up >> being about something else, and also since then I've been on an >> expedition to the hostile waters of python-dev. >> >> I'm crazy enough to believe that I'm proposing a technical solution to >> alleviate the problems we've faced as a community the past year. No, >> this will NOT be about NA, and certainly not governance, but do please >> allow me one paragraph of musings before the meaty stuff. >> >> I believe the Achilles heel of NumPy is the C API and the PyArrayObject. >> The reliance we all have on the NumPy C API means there can in practice >> only be one "array" type per Python process. This makes people *very* >> afraid of creative forking or new competing array libraries (since they >> just can't live in parallel -- like Cython and Pyrex can!), and every >> new feature has to go into ndarray to fully realise itself. This in turn >> means that experimentation with new features has to happen within one or >> a few release cycles, it cannot happen in the wild and by competition >> and by seeing what works over the course of years before finally making >> it into upstream. 
Finally, if any new great idea can really only be >> implemented decently if it also impacts thousands of users...that's bad >> both for morale and developer recruitment. >> >> The meat: >> >> There's already of course been work on making the NumPy C API work >> through an indirection layer to make a more stable ABI. This is about >> changing the ideas of how that indirection should happen, so that you >> could in theory implement the C API independently of NumPy. >> >> You could for instance make a "mini-NumPy" that only contains the bare >> essentials, and load that in the same process as the real NumPy, and use >> the C API against objects from both libraries. >> >> I'll assume that we can get a PEP through by waving a magic wand, since >> that makes it easier to focus on essentials. There's many ugly or less >> ugly hacks to make it work on any existing CPython [1], and they >> wouldn't be so ugly if there's PEP blessing for the general idea. >> >> Imagine if PyTypeObject grew an extra pointer "tp_customslots", which >> pointed to an array of these: >> >> typedef struct { >> unsigned long tpe_id; >> void *tpe_data; >> } PyTypeObjectCustomSlot; >> >> The ID space is partitioned to anyone who asks, and NumPy is given a >> large chunk. To insert a "custom slot", you stick it in this list. And >> you search it linearly for, say, PYTYPE_CUSTOM_NUMPY_SLOT (each type >> will typically have 0-3 entries so the search is very fast). >> >> I've benchmarked something very similar recently, and the overhead in a >> "hot" situation is on the order of 4-6 cycles. (As for cache, you can at >> least stick the slot array right next to the type object in memory.) >> >> Now, a NumPy array would populate this list with 1-2 entries pointing to >> tables of function pointers for the NumPy C API. This lookup through the >> PyTypeObject would in part replace the current import_array() mechanism. >> >> I'd actually propose two such custom slots for ndarray for starters: >> >> a) One PEP 3118-like binary description that exposes raw data pointers >> (without the PEP 3118 red tape) >> >> b) A function pointer table for a suitable subset of the NumPy C API >> (obviously not array construction and so on) >> >> The all-important PyArray_DATA/DIMS/... would be macros that try for a) >> first, but fall back to b). Things like PyArray_Check would actually >> check for support of these slots, "duck typing", rather than the Python >> type (of course, this could only be done at a major revision like NumPy >> 2.0 or 3.0). >> >> The overhead should be on the order of 5 cycles per C API call. That >> should be fine for anything but the use of PyArray_DATA inside a tight >> loop (which is a bad idea anyway). >> >> For now I just want to establish if there's support for this general >> idea, and see if I can get some weight behind a PEP (and ideally a >> co-author), which would make this a general approach and something more >> than an ugly NumPy specific hack. We'd also have good use for such a PEP >> in Cython (and, I believe, Numba/SciPy in CEP 1000). > > Well, you have my vote, but you already knew that. I'd also be willing > to co-author any PEP etc, but I'm sensing it may be more useful to > have support from people from different projects. Personally, I think > if this is to succeed, we first need to fix the design to work for > subclasses (I think one may just want to memcpy the interface > information over to the subclass, e.g. through a convenience function > that allows one to add more as well). 
If we have a solid idea of the > technical implementation, we should actually implement it and present > the benchmarks, comparing the results to capsules as attributes (and > to the _PyType_Lookup approach). Unless there's any holes in my fresh metaclass implementation, I think that is good enough that we can wait a year and get actual adoption before pushing for a PEP. That would also make the PEP a lot stronger. I do believe it should happen eventually though. Here's my post on the Cython list reposted to this list: """ So I finally got around to implementing this: https://github.com/dagss/pyextensibletype Documentation now in a draft in the NumFOCUS SEP repo, which I believe is a better place to store cross-project standards like this. (The NumPy docstring standard will be SEP 100). https://github.com/numfocus/sep/blob/master/sep200.rst Summary: - No common runtime dependency - 1 ns overhead per lookup (that's for the custom slot *alone*, no fast-callable signature matching or similar) - Slight annoyance: Types that want to use the metaclass must be a PyHeapExtensibleType, to make the binary layout work with how CPython makes subclasses from Python scripts My conclusion: I think the metaclass approach should work really well. """ Dag From ben.root at ou.edu Sun May 27 20:23:39 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 27 May 2012 20:23:39 -0400 Subject: [Numpy-discussion] indexes in an array where value is greater than 1? In-Reply-To: References: <4FBFA287.30903@simplistix.co.uk> <4FC1E387.2060904@simplistix.co.uk> Message-ID: On Sunday, May 27, 2012, Chao YUE wrote: > for me, np.nonzero() and np.where() both work. It seems they have same > function. > > chao They are not identical. Nonzeros is for indices. The where function is really meant for a different purpose, but special-cases for this call signature. Ben Root > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhansen at gmail.com Mon May 28 06:15:48 2012 From: mhansen at gmail.com (Mike Hansen) Date: Mon, 28 May 2012 03:15:48 -0700 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x Message-ID: Hello, In trying to upgrade NumPy within Sage, we notices some differences in behavior between 1.5 and 1.6. In particular, in 1.5, we have sage: f = 0.5 sage: f.__array_interface__ {'typestr': '=f8'} sage: numpy.array(f) array(0.5) sage: numpy.array(float(f)) array(0.5) In 1.6, we get the following, sage: f = 0.5 sage: f.__array_interface__ {'typestr': '=f8'} sage: numpy.array(f) array(0.500000000000000, dtype=object) This seems to be do to the changes in PyArray_FromAny introduced in https://github.com/mwhansen/numpy/commit/2635398db3f26529ce2aaea4028a8118844f3c48 . In particular, _array_find_type used to be used to query our __array_interface__ attribute, and it no longer seems to work. Is there a way to get the old behavior with the current code? --Mike From travis at continuum.io Mon May 28 13:58:26 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 28 May 2012 12:58:26 -0500 Subject: [Numpy-discussion] [enhancement] sum_angle() and sum_polar() In-Reply-To: References: Message-ID: <443DC382-4C57-4EB1-A4C2-01BC5778A8CE@continuum.io> I didn't see anyone respond to this, but looking over his simple and elegant solution it seems like a useful addition to the 2-d functions available in NumPy as it works with any 2-d array (image or matrix) and does a transformation on the indices in order to organize the sum. 
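To make that index transformation concrete, here is a rough sketch of the idea -- nearest-integer binning of the transformed indices followed by a bincount. This is only an illustration of the approach, not the code in the pull request:

---------------
import numpy as np

def sum_angle_sketch(a, angle):
    # Sum a 2-d array along parallel lines at the given angle (radians).
    # Each element is binned by the rounded projection of its index pair
    # onto the direction perpendicular to the summation lines, and the
    # bins are then summed with bincount.
    i, j = np.indices(a.shape)
    k = np.round(j * np.cos(angle) - i * np.sin(angle)).astype(int)
    k -= k.min()
    return np.bincount(k.ravel(), weights=a.ravel())
---------------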
It is not a general-purpose interpolating approach where the 2-d array is viewed as samples of an underlying continuous function. Are their other thoughts? -Travis On Mar 7, 2012, at 12:39 PM, Robert J?rdens wrote: > Hi everyone, > I am proposing to add the the two following functions to > numpy/lib/twodim_base.py: > > sum_angle() computes the sum of a 2-d array along an angled axis > sum_polar() computes the sum of a 2-d array along radial lines or > along azimuthal circles > > https://github.com/numpy/numpy/pull/230 > > Comments? > > When I was looking for a solution to these problems of calculating > special sums of 2-d arrays I could not find anything and it took me a > while to figure out a (hopefully) useful and consistent algorithm. > I can see how one would extend these to higher dimensions but that > would preclude using bincount() to do the heavy lifting. > Looking at some other functions, the doctests might need to be split > into real examples and unittests. > > Best, > > -- > Robert Jordens. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at googlemail.com Mon May 28 14:02:54 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 28 May 2012 20:02:54 +0200 Subject: [Numpy-discussion] [enhancement] sum_angle() and sum_polar() In-Reply-To: <443DC382-4C57-4EB1-A4C2-01BC5778A8CE@continuum.io> References: <443DC382-4C57-4EB1-A4C2-01BC5778A8CE@continuum.io> Message-ID: On Mon, May 28, 2012 at 7:58 PM, Travis Oliphant wrote: > I didn't see anyone respond to this, but looking over his simple and > elegant solution it seems like a useful addition to the 2-d functions > available in NumPy as it works with any 2-d array (image or matrix) and > does a transformation on the indices in order to organize the sum. > > It is not a general-purpose interpolating approach where the 2-d array is > viewed as samples of an underlying continuous function. > > Are their other thoughts? > > This was discussed (not finished yet) on scipy-dev: http://thread.gmane.org/gmane.comp.python.scientific.devel/16538/focus=16541. Ralf > -Travis > > > > On Mar 7, 2012, at 12:39 PM, Robert J?rdens wrote: > > > Hi everyone, > > I am proposing to add the the two following functions to > > numpy/lib/twodim_base.py: > > > > sum_angle() computes the sum of a 2-d array along an angled axis > > sum_polar() computes the sum of a 2-d array along radial lines or > > along azimuthal circles > > > > https://github.com/numpy/numpy/pull/230 > > > > Comments? > > > > When I was looking for a solution to these problems of calculating > > special sums of 2-d arrays I could not find anything and it took me a > > while to figure out a (hopefully) useful and consistent algorithm. > > I can see how one would extend these to higher dimensions but that > > would preclude using bincount() to do the heavy lifting. > > Looking at some other functions, the doctests might need to be split > > into real examples and unittests. > > > > Best, > > > > -- > > Robert Jordens. 
> > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon May 28 14:53:34 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 28 May 2012 13:53:34 -0500 Subject: [Numpy-discussion] [enhancement] sum_angle() and sum_polar() In-Reply-To: References: <443DC382-4C57-4EB1-A4C2-01BC5778A8CE@continuum.io> Message-ID: On May 28, 2012, at 1:02 PM, Ralf Gommers wrote: > > > On Mon, May 28, 2012 at 7:58 PM, Travis Oliphant wrote: > I didn't see anyone respond to this, but looking over his simple and elegant solution it seems like a useful addition to the 2-d functions available in NumPy as it works with any 2-d array (image or matrix) and does a transformation on the indices in order to organize the sum. > > It is not a general-purpose interpolating approach where the 2-d array is viewed as samples of an underlying continuous function. > > Are their other thoughts? > > This was discussed (not finished yet) on scipy-dev: http://thread.gmane.org/gmane.comp.python.scientific.devel/16538/focus=16541. > That is a useful discussion, but the question about whether this function should just go into NumPy is also of interest. There are arguments that it could go into NumPy, SciPy, or sckitis-image. I think going into scikits-image does not make sense because of their general applicability for more than just images and the fact that in the context of image-processing these functions *just* do nearest neighbor interpolation. I could see these functions going into scipy.ndimage but again because they are not necessarily just image processing functions, and the fact that they are so simple, perhaps they are best put into NumPy itself. -Travis > Ralf > > > -Travis > > > > On Mar 7, 2012, at 12:39 PM, Robert J?rdens wrote: > > > Hi everyone, > > I am proposing to add the the two following functions to > > numpy/lib/twodim_base.py: > > > > sum_angle() computes the sum of a 2-d array along an angled axis > > sum_polar() computes the sum of a 2-d array along radial lines or > > along azimuthal circles > > > > https://github.com/numpy/numpy/pull/230 > > > > Comments? > > > > When I was looking for a solution to these problems of calculating > > special sums of 2-d arrays I could not find anything and it took me a > > while to figure out a (hopefully) useful and consistent algorithm. > > I can see how one would extend these to higher dimensions but that > > would preclude using bincount() to do the heavy lifting. > > Looking at some other functions, the doctests might need to be split > > into real examples and unittests. > > > > Best, > > > > -- > > Robert Jordens. 
> > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon May 28 23:18:44 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 28 May 2012 23:18:44 -0400 Subject: [Numpy-discussion] 1.6.2 no more unique for rows Message-ID: https://github.com/numpy/numpy/commit/74b9f5eef8fac643bf9012dbb2ac6b4b19f46892 broke return_inverse for structured arrays, because of the use of mergesort I'm using structured dtypes to get uniques and return_inverse by rows >>> groups = np.random.randint(0,4,size=(10,2)) >>> groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten() >>> groups array([[0, 2], [1, 2], [1, 1], [3, 1], [3, 1], [2, 1], [1, 0], [3, 3], [3, 2], [0, 0]]) >>> groups_ array([(0, 2), (1, 2), (1, 1), (3, 1), (3, 1), (2, 1), (1, 0), (3, 3), (3, 2), (0, 0)], dtype=[('f0', '>> np.argsort(groups_) array([9, 0, 6, 2, 1, 5, 4, 3, 8, 7]) >>> np.argsort(groups_, kind='mergesort') Traceback (most recent call last): File "", line 1, in File "C:\Python26\lib\site-packages\numpy\core\fromnumeric.py", line 679, in argsort return argsort(axis, kind, order) TypeError: requested sort not available for type >>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True, return_inverse=True) >>> uni_inv array([1, 4, 3, 6, 6, 5, 2, 8, 7, 0]) exception in numpy 1.6.2rc2 (as reported by Debian for statsmodels) Josef From uri.laserson at gmail.com Mon May 28 23:33:42 2012 From: uri.laserson at gmail.com (Uri Laserson) Date: Mon, 28 May 2012 20:33:42 -0700 Subject: [Numpy-discussion] numpy.random.gamma returns 0 for small shape parameters Message-ID: I am trying to sample from a Dirichlet distribution, where some of the shape parameters are very small. To do so, the algorithm samples each component individually from a Gamma(k,1) distribution where k is the shape parameter for that component of the Dirichlet. In principle, this should always return a positive number (as the Dirichlet is defined). However, if k is very small, it will return zero: In [157]: np.random.gamma(1e-1) Out[157]: 4.863866491339177e-06 In [158]: np.random.gamma(1e-2) Out[158]: 2.424451829710714e-57 In [159]: np.random.gamma(1e-3) Out[159]: 5.1909861689757784e-197 In [160]: np.random.gamma(1e-4) Out[160]: 0.0 In [161]: np.random.gamma(1e-5) Out[161]: 0.0 What is the best way to deal with this? Thanks! Uri ................................................................................... Uri Laserson Graduate Student, Biomedical Engineering Harvard-MIT Division of Health Sciences and Technology M +1 917 742 8019 laserson at mit.edu -------------- next part -------------- An HTML attachment was scrubbed... 
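(As an aside on the underflow above: one standard reformulation is to return log-gamma samples directly, using the fact that Gamma(k) has the same distribution as Gamma(k+1) * U**(1/k) with U uniform on (0, 1); the Dirichlet components can then be normalized with a log-sum-exp. A rough sketch, not taken from this thread:

---------------
import numpy as np

def log_gamma_small_shape(k, size=None):
    # Return log(G) for G ~ Gamma(k, 1) without ever forming G itself,
    # so very small shape parameters k do not underflow to 0.
    # Identity used: Gamma(k) =d Gamma(k + 1) * U**(1/k), U ~ Uniform(0, 1).
    x = np.random.gamma(k + 1.0, size=size)
    u = np.random.uniform(size=size)
    return np.log(x) + np.log(u) / k
---------------
)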
URL: From kalatsky at gmail.com Tue May 29 00:11:53 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Mon, 28 May 2012 23:11:53 -0500 Subject: [Numpy-discussion] numpy.random.gamma returns 0 for small shape parameters In-Reply-To: References: Message-ID: You'll need some patience to get non-zeros, especially for k=1e-5 In [84]: np.sum(np.random.gamma(1e-5,size=1000000)!=0.0) Out[84]: 7259 that's less than 1%. For k=1e-4 it's ~7% Val On Mon, May 28, 2012 at 10:33 PM, Uri Laserson wrote: > I am trying to sample from a Dirichlet distribution, where some of the > shape parameters are very small. To do so, the algorithm samples each > component individually from a Gamma(k,1) distribution where k is the shape > parameter for that component of the Dirichlet. In principle, this should > always return a positive number (as the Dirichlet is defined). However, if > k is very small, it will return zero: > > In [157]: np.random.gamma(1e-1) > Out[157]: 4.863866491339177e-06 > > In [158]: np.random.gamma(1e-2) > Out[158]: 2.424451829710714e-57 > > In [159]: np.random.gamma(1e-3) > Out[159]: 5.1909861689757784e-197 > > In [160]: np.random.gamma(1e-4) > Out[160]: 0.0 > > In [161]: np.random.gamma(1e-5) > Out[161]: 0.0 > > What is the best way to deal with this? > > Thanks! > Uri > > > > > ................................................................................... > Uri Laserson > Graduate Student, Biomedical Engineering > Harvard-MIT Division of Health Sciences and Technology > M +1 917 742 8019 > laserson at mit.edu > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From markbak at gmail.com Tue May 29 09:00:01 2012 From: markbak at gmail.com (Mark Bakker) Date: Tue, 29 May 2012 15:00:01 +0200 Subject: [Numpy-discussion] silly isscalar question Message-ID: Why does isscalar('hello') return True? I thought it would check for a number? Numpy 1.6.1 Silly question? -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue May 29 09:27:32 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 29 May 2012 14:27:32 +0100 Subject: [Numpy-discussion] silly isscalar question In-Reply-To: References: Message-ID: On Tue, May 29, 2012 at 2:00 PM, Mark Bakker wrote: > Why does isscalar('hello') return True? > > I thought it would check for a number? > > Numpy 1.6.1 > > Silly question? Nope, but you're thinking of a different sense of "scalar" :-). In numpy, "scalar" means something like "anything that you can have an array of". So numbers are scalars, but so are booleans and strings. If you want to know whether something is a number, I suggest np.issubsctype(type(x), np.number). Though, looking more closely, isscalar returns True for buffer objects, which can only go into arrays that have dtype=object. But isscalar returns False for other user-defined types that would go into a dtype=object array. So perhaps it is a bit confused... -- Nathaniel From derek at astro.physik.uni-goettingen.de Tue May 29 09:31:23 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 29 May 2012 15:31:23 +0200 Subject: [Numpy-discussion] silly isscalar question In-Reply-To: References: Message-ID: On 29 May 2012, at 15:00, Mark Bakker wrote: > Why does isscalar('hello') return True? > > I thought it would check for a number? 
No, it checks for something that is of 'scalar type', which probably can be translated as 'not equivalent to an array'. Since strings can form numpy arrays, I guess the logic behind this is that the string is the atomic block of an array of dtype 'S' - for comparison, np.isscalar(['hello']) = False. I note the fine distinction between np.isscalar( ('hello') ) and np.isscalar( ('hello'), )... Cheers, Derek From njs at pobox.com Tue May 29 09:42:41 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 29 May 2012 14:42:41 +0100 Subject: [Numpy-discussion] silly isscalar question In-Reply-To: References: Message-ID: On Tue, May 29, 2012 at 2:31 PM, Derek Homeier wrote: > On 29 May 2012, at 15:00, Mark Bakker wrote: > >> Why does isscalar('hello') return True? >> >> I thought it would check for a number? > > No, it checks for something that is of 'scalar type', which probably can be > translated as 'not equivalent to an array'. Since strings can form numpy arrays, > I guess the logic behind this is that the string is the atomic block of an array > of dtype 'S' - for comparison, np.isscalar(['hello']) = False. > I note the fine distinction between np.isscalar( ('hello') ) and np.isscalar( ('hello'), )... NB you mean np.isscalar( ('hello',) ), which creates a single-element tuple. A trailing comma attached to a value in Python normally creates a tuple, but in a function argument list it is treated as separating arguments instead, and a trailing empty argument is ignored. The parentheses need to be around the comma to hide it from from the argument list parsing rule so that the tuple rule can see it. (Probably you know this, but for anyone reading the archives later...) - N From derek at astro.physik.uni-goettingen.de Tue May 29 10:00:44 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 29 May 2012 16:00:44 +0200 Subject: [Numpy-discussion] silly isscalar question In-Reply-To: References: Message-ID: <4F70D2D7-7CF6-4877-A340-F3D6D2534CEB@astro.physik.uni-goettingen.de> On 29 May 2012, at 15:42, Nathaniel Smith wrote: >> I note the fine distinction between np.isscalar( ('hello') ) and np.isscalar( ('hello'), )... > > NB you mean np.isscalar( ('hello',) ), which creates a single-element > tuple. A trailing comma attached to a value in Python normally creates > a tuple, but in a function argument list it is treated as separating > arguments instead, and a trailing empty argument is ignored. The > parentheses need to be around the comma to hide it from from the > argument list parsing rule so that the tuple rule can see it. > (Probably you know this, but for anyone reading the archives later...) Correct, sorry for the typo! I was actually puzzled by the habit of what seemed to me automatic unpacking of the simple case ('hello') as compared to ('hello', ); I only now looked up that by the Python syntax indeed the comma makes the tuple, not the parentheses, the latter only becoming necessary to protect the comma as you describe above. Just stumbled on this as in several cases, numpy's rules for creating arrays from tuples are slightly different from those for creating arrays from lists. 
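For the archives, a few quick checks summarizing the distinction (behaviour as of NumPy 1.6; np.issubdtype on the dtype is one way to ask specifically for a number, alongside the issubsctype suggestion above):

---------------
import numpy as np

np.isscalar('hello')      # True  -- strings count as (array) scalars
np.isscalar(('hello'))    # True  -- plain parentheses do not make a tuple
np.isscalar(('hello',))   # False -- the trailing comma does
np.isscalar(['hello'])    # False -- lists (and arrays) are not scalars

# To ask "is this a number?" rather than "is this a scalar?":
np.issubdtype(np.asarray('hello').dtype, np.number)   # False
np.issubdtype(np.asarray(1.5).dtype, np.number)       # True
---------------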
Cheers, Derek From pav at iki.fi Tue May 29 10:45:49 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 29 May 2012 14:45:49 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?numpy=2Erandom=2Egamma_returns_0_for?= =?utf-8?q?_small_shape=09parameters?= References: Message-ID: Val Kalatsky gmail.com> writes: > You'll need some patience to get non-zeros, especially for k=1e-5 > > In [84]: np.sum(np.random.gamma(1e-5,size=1000000)!=0.0) > Out[84]: 7259 > that's less than 1%. For k=1e-4 it's ~7% To clarify: the distribution is peaked at numbers that are too small to be represented as floating-point numbers in the computer. The returned zeros indicate underflow, i.e., some positive numbers between zero and the floating point number closest to zero (~ 1e-308). To work around this, you need to do some math to redefine the problem so that the numbers involved fall into a region where the floating point numbers are dense. -- Pauli Virtanen From massimo.dipierro at gmail.com Tue May 29 10:57:28 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Tue, 29 May 2012 09:57:28 -0500 Subject: [Numpy-discussion] numpy.random.gamma returns 0 for small shape parameters In-Reply-To: References: Message-ID: Another possible solution is to sort the numbers and add them in a binary tree. It reduces the truncation error but makes the problem n- log-n and therefore not worth the trouble. Massimo On May 29, 2012, at 9:45 AM, Pauli Virtanen wrote: > Val Kalatsky gmail.com> writes: >> You'll need some patience to get non-zeros, especially for k=1e-5 >> >> In [84]: np.sum(np.random.gamma(1e-5,size=1000000)!=0.0) >> Out[84]: 7259 >> that's less than 1%. For k=1e-4 it's ~7% > > To clarify: the distribution is peaked at numbers > that are too small to be represented as floating-point > numbers in the computer. The returned zeros indicate > underflow, i.e., some positive numbers between zero > and the floating point number closest to zero (~ 1e-308). > > To work around this, you need to do some math to redefine > the problem so that the numbers involved fall into > a region where the floating point numbers are dense. > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From stefan at sun.ac.za Tue May 29 13:03:04 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 29 May 2012 10:03:04 -0700 Subject: [Numpy-discussion] [enhancement] sum_angle() and sum_polar() In-Reply-To: References: <443DC382-4C57-4EB1-A4C2-01BC5778A8CE@continuum.io> Message-ID: On Mon, May 28, 2012 at 11:53 AM, Travis Oliphant wrote: > I could see these functions going into scipy.ndimage but again because they > are not necessarily just image processing functions, and the fact that they > are so simple, perhaps they are best put into NumPy itself. I'm wondering about the general applicability of these functions. Can anyone suggest some use cases? 
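Returning to the numpy.random.gamma thread above: one concrete way to redefine the problem, as Pauli suggests, is to draw the logarithm of the variate instead of the variate itself, using the identity that if Y ~ Gamma(k+1) and U ~ Uniform(0,1) then Y*U**(1/k) ~ Gamma(k). A minimal sketch (the helper name log_gamma_sample is made up for illustration):

    import numpy as np

    def log_gamma_sample(k, size=None):
        # log(X) with X ~ Gamma(k, 1), computed in log space so that very small
        # shape parameters do not underflow to zero.
        y = np.random.gamma(k + 1.0, size=size)   # shape >= 1, so no underflow here
        u = np.random.uniform(size=size)
        return np.log(y) + np.log(u) / k

    # For a Dirichlet draw with tiny shape parameters, normalize in log space too:
    alpha = np.array([1e-4, 1e-3, 1e-5])
    logs = np.array([log_gamma_sample(a) for a in alpha])
    x = np.exp(logs - logs.max())
    x /= x.sum()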
St?fan From jordens at gmail.com Tue May 29 13:40:40 2012 From: jordens at gmail.com (=?UTF-8?Q?Robert_J=C3=B6rdens?=) Date: Tue, 29 May 2012 11:40:40 -0600 Subject: [Numpy-discussion] [enhancement] sum_angle() and sum_polar() In-Reply-To: References: <443DC382-4C57-4EB1-A4C2-01BC5778A8CE@continuum.io> Message-ID: On Tue, May 29, 2012 at 11:03 AM, St?fan van der Walt wrote: > On Mon, May 28, 2012 at 11:53 AM, Travis Oliphant wrote: >> I could see these functions going into scipy.ndimage but again because they >> are not necessarily just image processing functions, and the fact that they >> are so simple, perhaps they are best put into NumPy itself. > > I'm wondering about the general applicability of these functions. ?Can > anyone suggest some use cases? An example from solid state physics: If you have a spin chain with some long-range interaction and you have the known, dense, coupling matrix J, sum_angle(J, pi/4) gives you a view at the distance dependence of the interaction. -- Robert Jordens. From charlesr.harris at gmail.com Tue May 29 14:06:11 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 29 May 2012 12:06:11 -0600 Subject: [Numpy-discussion] [enhancement] sum_angle() and sum_polar() In-Reply-To: References: <443DC382-4C57-4EB1-A4C2-01BC5778A8CE@continuum.io> Message-ID: On Tue, May 29, 2012 at 11:40 AM, Robert J?rdens wrote: > On Tue, May 29, 2012 at 11:03 AM, St?fan van der Walt > wrote: > > On Mon, May 28, 2012 at 11:53 AM, Travis Oliphant > wrote: > >> I could see these functions going into scipy.ndimage but again because > they > >> are not necessarily just image processing functions, and the fact that > they > >> are so simple, perhaps they are best put into NumPy itself. > > > > I'm wondering about the general applicability of these functions. Can > > anyone suggest some use cases? > > An example from solid state physics: > If you have a spin chain with some long-range interaction and you have > the known, dense, coupling matrix J, sum_angle(J, pi/4) gives you a > view at the distance dependence of the interaction. > > I'd like to see these functions is scipy somewhere. The function names aren't very descriptive and the one line summaries don't give a very good idea of what they do, so I think those bits could use improvement. Mention of the Hough/Radon transform would help, I had to pull out that connection by reading the code... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
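For readers unfamiliar with the proposal, the following is only a rough sketch of the binned-projection idea behind a single-angle Radon/Hough-style sum, not the implementation under discussion; bin width and normalization are glossed over:

    import numpy as np

    def angle_sum(a, theta):
        # Sum a 2-D array along a family of parallel lines whose orientation is
        # set by theta, by binning each element on its rounded coordinate along
        # the axis perpendicular to those lines.
        rows, cols = np.indices(a.shape)
        proj = cols * np.cos(theta) - rows * np.sin(theta)
        bins = np.round(proj - proj.min()).astype(int)
        return np.bincount(bins.ravel(), weights=a.ravel())

    # theta = 0 gives column sums, theta = pi/2 gives row sums (in reverse order),
    # and theta = pi/4 sums along diagonals of constant col - row.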
URL: From charlesr.harris at gmail.com Tue May 29 14:42:06 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 29 May 2012 12:42:06 -0600 Subject: [Numpy-discussion] 1.6.2 no more unique for rows In-Reply-To: References: Message-ID: On Mon, May 28, 2012 at 9:18 PM, wrote: > > https://github.com/numpy/numpy/commit/74b9f5eef8fac643bf9012dbb2ac6b4b19f46892 > broke return_inverse for structured arrays, because of the use of mergesort > > I'm using structured dtypes to get uniques and return_inverse by rows > > >>> groups = np.random.randint(0,4,size=(10,2)) > >>> groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten() > >>> groups > array([[0, 2], > [1, 2], > [1, 1], > [3, 1], > [3, 1], > [2, 1], > [1, 0], > [3, 3], > [3, 2], > [0, 0]]) > >>> groups_ > array([(0, 2), (1, 2), (1, 1), (3, 1), (3, 1), (2, 1), (1, 0), (3, 3), > (3, 2), (0, 0)], > dtype=[('f0', ' > >>> np.argsort(groups_) > array([9, 0, 6, 2, 1, 5, 4, 3, 8, 7]) > > >>> np.argsort(groups_, kind='mergesort') > Traceback (most recent call last): > File "", line 1, in > File "C:\Python26\lib\site-packages\numpy\core\fromnumeric.py", line > 679, in argsort > return argsort(axis, kind, order) > TypeError: requested sort not available for type > > >>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True, > return_inverse=True) > >>> uni_inv > array([1, 4, 3, 6, 6, 5, 2, 8, 7, 0]) > > exception in numpy 1.6.2rc2 (as reported by Debian for statsmodels) > > I've been putting of, um, planning to implement the different sort kinds for object/structured arrays for a while, sounds like it needs to get done. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jerome.Kieffer at esrf.fr Tue May 29 16:36:28 2012 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Tue, 29 May 2012 22:36:28 +0200 Subject: [Numpy-discussion] [enhancement] sum_angle() and sum_polar() In-Reply-To: References: <443DC382-4C57-4EB1-A4C2-01BC5778A8CE@continuum.io> Message-ID: <20120529223628.4b4462af.Jerome.Kieffer@esrf.fr> On Tue, 29 May 2012 10:03:04 -0700 St?fan van der Walt wrote: > On Mon, May 28, 2012 at 11:53 AM, Travis Oliphant wrote: > > I could see these functions going into scipy.ndimage but again because they > > are not necessarily just image processing functions, and the fact that they > > are so simple, perhaps they are best put into NumPy itself. > > I'm wondering about the general applicability of these functions. Can > anyone suggest some use cases? I wrote a whole library about that ... pyFAI (available in debian) https://forge.epn-campus.eu/attachments/1459/20111010-PyFAI-Poster-A0.pdf Unfortunately real detector are never completely orthogonal to the incident beam, pixels are never square, ... what makes things more complicated. Cheers, -- J?r?me Kieffer Data analysis unit - ESRF From jordens at gmail.com Tue May 29 19:16:23 2012 From: jordens at gmail.com (=?UTF-8?Q?Robert_J=C3=B6rdens?=) Date: Tue, 29 May 2012 17:16:23 -0600 Subject: [Numpy-discussion] [enhancement] sum_angle() and sum_polar() In-Reply-To: References: <443DC382-4C57-4EB1-A4C2-01BC5778A8CE@continuum.io> Message-ID: On Tue, May 29, 2012 at 12:06 PM, Charles R Harris wrote: > I'd like to see these functions is scipy somewhere. The function names > aren't very descriptive and the one line summaries don't give a very good > idea of what they do, so I think those bits could use improvement. Mention > of the Hough/Radon transform would help, I had to pull out that connection > by reading the code... 
I'll fix the descriptions. What more descriptive names did you have in mind? -- Robert Jordens. From njs at pobox.com Wed May 30 06:59:35 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 30 May 2012 11:59:35 +0100 Subject: [Numpy-discussion] 1.6.2 no more unique for rows In-Reply-To: References: Message-ID: On Tue, May 29, 2012 at 7:42 PM, Charles R Harris wrote: > > > On Mon, May 28, 2012 at 9:18 PM, wrote: >> >> >> https://github.com/numpy/numpy/commit/74b9f5eef8fac643bf9012dbb2ac6b4b19f46892 >> broke return_inverse for structured arrays, because of the use of >> mergesort >> >> I'm using structured dtypes to get uniques and return_inverse by rows >> >> >>> groups = np.random.randint(0,4,size=(10,2)) >> >>> groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten() >> >>> groups >> array([[0, 2], >> ? ? ? [1, 2], >> ? ? ? [1, 1], >> ? ? ? [3, 1], >> ? ? ? [3, 1], >> ? ? ? [2, 1], >> ? ? ? [1, 0], >> ? ? ? [3, 3], >> ? ? ? [3, 2], >> ? ? ? [0, 0]]) >> >>> groups_ >> array([(0, 2), (1, 2), (1, 1), (3, 1), (3, 1), (2, 1), (1, 0), (3, 3), >> ? ? ? (3, 2), (0, 0)], >> ? ? ?dtype=[('f0', '> >> >>> np.argsort(groups_) >> array([9, 0, 6, 2, 1, 5, 4, 3, 8, 7]) >> >> >>> np.argsort(groups_, kind='mergesort') >> Traceback (most recent call last): >> ?File "", line 1, in >> ?File "C:\Python26\lib\site-packages\numpy\core\fromnumeric.py", line >> 679, in argsort >> ? ?return argsort(axis, kind, order) >> TypeError: requested sort not available for type >> >> >>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True, >> >>> return_inverse=True) >> >>> uni_inv >> array([1, 4, 3, 6, 6, 5, 2, 8, 7, 0]) >> >> exception in numpy 1.6.2rc2 (as reported by Debian for statsmodels) >> > > I've been putting of, um, planning to implement the different sort kinds for > object/structured arrays for a while, sounds like it needs to get done. So I guess this is a 1.6.1 -> 1.6.2 regression, and presumably we won't be landing any new sort implementations in the 1.6 branch. Should we be thinking about reverting this and releasing a 1.6.3? (I don't know if it's worth it, but it seems like something we should think about either way.) Same question applies to 1.7 too -- obviously the change to unique() is a good one, but maybe it has to wait until mergesort can handle structured dtypes? -N From Marc.Poinot at onera.fr Wed May 30 08:44:09 2012 From: Marc.Poinot at onera.fr (Marc Poinot) Date: Wed, 30 May 2012 14:44:09 +0200 Subject: [Numpy-discussion] f2py hiding callback function Message-ID: <4FC61619.1010406@onera.fr> Hi all, I'm trying to hide the actual python callback function on the fortran side as well as on the python side. See the example: I want f1 to be the wrapper of my callback, f2 is a second level where the user has no arg to pass. But if I call f2, for example in f3, the callback is 'propagated' and I have to declare it the same way as in f2. Is there a trick to stop this propagation? 
subroutine f1(p,t) cf2py intent(callback) push_back_path cf2py optional push_back_path cf2py external push_back_path character(len=*) p character(len=*) t call push_back_path(p,t) return end subroutine f2() cf2py intent(callback) push_back_path cf2py optional push_back_path cf2py external push_back_path cf2py use f1__user__routines character(len=96) p character(len=32) t p='/usr' t='local' call f1(p,t) t='lib' call f1(p,t) return end subroutine f3() call f2() end The python test: # ----------------------------------------------- # f2py -c pathutils.for -m pathutils # python pathtest.py import pathutils import string def push_back_path(p,t): p=p.strip() t=t.strip() p=string.ljust(p+'/'+t,96) print 'PUSH_BACK_PATH [%s]'%p pathutils.push_back_path=push_back_path print 'F1' pathutils.f1('/usr','bin') print 'F2' # Hiding on python side is ok pathutils.f2() # Hiding on fortran side fails print 'F3' pathutils.f3() Which leads to: $ python pathtest.py F1 PUSH_BACK_PATH [/usr/bin ] F2 PUSH_BACK_PATH [/usr/local ] PUSH_BACK_PATH [/usr/lib ] F3 capi_return is NULL Call-back cb_push_back_path_in_f1__user__routines failed. capi_return is NULL Call-back cb_push_back_path_in_f1__user__routines failed. Traceback (most recent call last): File "pathtest.py", line 20, in pathutils.f3() TypeError: push_back_path() takes exactly 2 arguments (0 given) TIA -MP- ----------------------------------------------------------------------- Marc POINOT [ONERA/DSNA] Tel:+33.1.46.73.42.84 Fax:+33.1.46.73.41.66 Avertissement/disclaimer http://www.onera.fr/onera-en/emails-terms From heng at cantab.net Wed May 30 11:13:36 2012 From: heng at cantab.net (Henry Gomersall) Date: Wed, 30 May 2012 16:13:36 +0100 Subject: [Numpy-discussion] Numpy code in GPL package Message-ID: <1338390816.2371.16.camel@farnsworth> I'd like to include the _cook_nd_args() function from fftpack in my GPL code. Is this possible? How should I modify my license file to satisfy the Numpy license requirements, but so it's clear which function it applies to? 
Thanks, Henry From charlesr.harris at gmail.com Wed May 30 11:39:22 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 30 May 2012 09:39:22 -0600 Subject: [Numpy-discussion] 1.6.2 no more unique for rows In-Reply-To: References: Message-ID: On Wed, May 30, 2012 at 4:59 AM, Nathaniel Smith wrote: > On Tue, May 29, 2012 at 7:42 PM, Charles R Harris > wrote: > > > > > > On Mon, May 28, 2012 at 9:18 PM, wrote: > >> > >> > >> > https://github.com/numpy/numpy/commit/74b9f5eef8fac643bf9012dbb2ac6b4b19f46892 > >> broke return_inverse for structured arrays, because of the use of > >> mergesort > >> > >> I'm using structured dtypes to get uniques and return_inverse by rows > >> > >> >>> groups = np.random.randint(0,4,size=(10,2)) > >> >>> groups_ = groups.view([('',groups.dtype)]*groups.shape[1]).flatten() > >> >>> groups > >> array([[0, 2], > >> [1, 2], > >> [1, 1], > >> [3, 1], > >> [3, 1], > >> [2, 1], > >> [1, 0], > >> [3, 3], > >> [3, 2], > >> [0, 0]]) > >> >>> groups_ > >> array([(0, 2), (1, 2), (1, 1), (3, 1), (3, 1), (2, 1), (1, 0), (3, 3), > >> (3, 2), (0, 0)], > >> dtype=[('f0', ' >> > >> >>> np.argsort(groups_) > >> array([9, 0, 6, 2, 1, 5, 4, 3, 8, 7]) > >> > >> >>> np.argsort(groups_, kind='mergesort') > >> Traceback (most recent call last): > >> File "", line 1, in > >> File "C:\Python26\lib\site-packages\numpy\core\fromnumeric.py", line > >> 679, in argsort > >> return argsort(axis, kind, order) > >> TypeError: requested sort not available for type > >> > >> >>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True, > >> >>> return_inverse=True) > >> >>> uni_inv > >> array([1, 4, 3, 6, 6, 5, 2, 8, 7, 0]) > >> > >> exception in numpy 1.6.2rc2 (as reported by Debian for statsmodels) > >> > > > > I've been putting of, um, planning to implement the different sort kinds > for > > object/structured arrays for a while, sounds like it needs to get done. > > So I guess this is a 1.6.1 -> 1.6.2 regression, and presumably we > won't be landing any new sort implementations in the 1.6 branch. > Should we be thinking about reverting this and releasing a 1.6.3? (I > don't know if it's worth it, but it seems like something we should > think about either way.) > > Same question applies to 1.7 too -- obviously the change to unique() > is a good one, but maybe it has to wait until mergesort can handle > structured dtypes? > > Should definitely be reverted if a 1.6.3 goes out. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed May 30 12:11:05 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 30 May 2012 17:11:05 +0100 Subject: [Numpy-discussion] Numpy code in GPL package In-Reply-To: <1338390816.2371.16.camel@farnsworth> References: <1338390816.2371.16.camel@farnsworth> Message-ID: On Wed, May 30, 2012 at 4:13 PM, Henry Gomersall wrote: > I'd like to include the _cook_nd_args() function from fftpack in my GPL > code. Is this possible? Yes. The numpy license is compatible with the GPL license, so code from numpy may be incorporated into GPL programs. > How should I modify my license file to satisfy the Numpy license > requirements, but so it's clear which function it applies to? The clearest thing to do is to keep the _cook_nd_args() function implemented in its own file which has the numpy license text placed in a comment at the top. 
You should have a LICENSE.txt file that says that your program is GPLed (and point to the COPYING file that contains the GPL text) and also mentions that the _cook_nd_args file (which you should mention by name) has the numpy license. Since the numpy license is relatively small, you can just copy the full numpy license into your LICENSE.txt file. Something like this: Copyright (C) 2012 Henry Gomersall This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 3rd Party Addendum ================== The file cook_nd_args.c is derived from NumPy, and is available under the Modified BSD License: Copyright (c) 2005-2011, NumPy Developers. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the NumPy Developers nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -- Robert Kern From heng at cantab.net Wed May 30 12:15:52 2012 From: heng at cantab.net (Henry Gomersall) Date: Wed, 30 May 2012 17:15:52 +0100 Subject: [Numpy-discussion] Numpy code in GPL package In-Reply-To: References: <1338390816.2371.16.camel@farnsworth> Message-ID: <1338394552.2371.21.camel@farnsworth> On Wed, 2012-05-30 at 17:11 +0100, Robert Kern wrote: > On Wed, May 30, 2012 at 4:13 PM, Henry Gomersall > wrote: > > I'd like to include the _cook_nd_args() function from fftpack in my > GPL > > code. Is this possible? > > Yes. The numpy license is compatible with the GPL license, so code > from numpy may be incorporated into GPL programs. > > > How should I modify my license file to satisfy the Numpy license > > requirements, but so it's clear which function it applies to? 
> > The clearest thing to do is to keep the _cook_nd_args() function > implemented in its own file which has the numpy license text placed in > a comment at the top. > That's very helpful, thanks! Henry From ralf.gommers at googlemail.com Wed May 30 17:55:12 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 30 May 2012 23:55:12 +0200 Subject: [Numpy-discussion] 1.6.2 no more unique for rows In-Reply-To: References: Message-ID: On Wed, May 30, 2012 at 5:39 PM, Charles R Harris wrote: > > > On Wed, May 30, 2012 at 4:59 AM, Nathaniel Smith wrote: > >> On Tue, May 29, 2012 at 7:42 PM, Charles R Harris >> wrote: >> > >> > >> > On Mon, May 28, 2012 at 9:18 PM, wrote: >> >> >> >> >> >> >> https://github.com/numpy/numpy/commit/74b9f5eef8fac643bf9012dbb2ac6b4b19f46892 >> >> broke return_inverse for structured arrays, because of the use of >> >> mergesort >> >> >> >> I'm using structured dtypes to get uniques and return_inverse by rows >> >> >> >> >>> groups = np.random.randint(0,4,size=(10,2)) >> >> >>> groups_ = >> groups.view([('',groups.dtype)]*groups.shape[1]).flatten() >> >> >>> groups >> >> array([[0, 2], >> >> [1, 2], >> >> [1, 1], >> >> [3, 1], >> >> [3, 1], >> >> [2, 1], >> >> [1, 0], >> >> [3, 3], >> >> [3, 2], >> >> [0, 0]]) >> >> >>> groups_ >> >> array([(0, 2), (1, 2), (1, 1), (3, 1), (3, 1), (2, 1), (1, 0), (3, 3), >> >> (3, 2), (0, 0)], >> >> dtype=[('f0', '> >> >> >> >>> np.argsort(groups_) >> >> array([9, 0, 6, 2, 1, 5, 4, 3, 8, 7]) >> >> >> >> >>> np.argsort(groups_, kind='mergesort') >> >> Traceback (most recent call last): >> >> File "", line 1, in >> >> File "C:\Python26\lib\site-packages\numpy\core\fromnumeric.py", line >> >> 679, in argsort >> >> return argsort(axis, kind, order) >> >> TypeError: requested sort not available for type >> >> >> >> >>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True, >> >> >>> return_inverse=True) >> >> >>> uni_inv >> >> array([1, 4, 3, 6, 6, 5, 2, 8, 7, 0]) >> >> >> >> exception in numpy 1.6.2rc2 (as reported by Debian for statsmodels) >> >> >> > >> > I've been putting of, um, planning to implement the different sort >> kinds for >> > object/structured arrays for a while, sounds like it needs to get done. >> >> So I guess this is a 1.6.1 -> 1.6.2 regression, and presumably we >> won't be landing any new sort implementations in the 1.6 branch. >> Should we be thinking about reverting this and releasing a 1.6.3? (I >> don't know if it's worth it, but it seems like something we should >> think about either way.) >> >> Same question applies to 1.7 too -- obviously the change to unique() >> is a good one, but maybe it has to wait until mergesort can handle >> structured dtypes? >> >> > Should definitely be reverted if a 1.6.3 goes out. > But is a 1.6.3 required for this issue alone? It's a regression, but it looks like a corner case and is already fixed in statsmodels. If there are more users who are running into this problem though, I'm OK with doing a 1.6.3 release just for this. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Wed May 30 19:08:16 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 30 May 2012 19:08:16 -0400 Subject: [Numpy-discussion] 1.6.2 no more unique for rows In-Reply-To: References: Message-ID: On Wed, May 30, 2012 at 5:55 PM, Ralf Gommers wrote: > > > On Wed, May 30, 2012 at 5:39 PM, Charles R Harris > wrote: >> >> >> >> On Wed, May 30, 2012 at 4:59 AM, Nathaniel Smith wrote: >>> >>> On Tue, May 29, 2012 at 7:42 PM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Mon, May 28, 2012 at 9:18 PM, wrote: >>> >> >>> >> >>> >> >>> >> https://github.com/numpy/numpy/commit/74b9f5eef8fac643bf9012dbb2ac6b4b19f46892 >>> >> broke return_inverse for structured arrays, because of the use of >>> >> mergesort >>> >> >>> >> I'm using structured dtypes to get uniques and return_inverse by rows >>> >> >>> >> >>> groups = np.random.randint(0,4,size=(10,2)) >>> >> >>> groups_ = >>> >> >>> groups.view([('',groups.dtype)]*groups.shape[1]).flatten() >>> >> >>> groups >>> >> array([[0, 2], >>> >> ? ? ? [1, 2], >>> >> ? ? ? [1, 1], >>> >> ? ? ? [3, 1], >>> >> ? ? ? [3, 1], >>> >> ? ? ? [2, 1], >>> >> ? ? ? [1, 0], >>> >> ? ? ? [3, 3], >>> >> ? ? ? [3, 2], >>> >> ? ? ? [0, 0]]) >>> >> >>> groups_ >>> >> array([(0, 2), (1, 2), (1, 1), (3, 1), (3, 1), (2, 1), (1, 0), (3, 3), >>> >> ? ? ? (3, 2), (0, 0)], >>> >> ? ? ?dtype=[('f0', '>> >> >>> >> >>> np.argsort(groups_) >>> >> array([9, 0, 6, 2, 1, 5, 4, 3, 8, 7]) >>> >> >>> >> >>> np.argsort(groups_, kind='mergesort') >>> >> Traceback (most recent call last): >>> >> ?File "", line 1, in >>> >> ?File "C:\Python26\lib\site-packages\numpy\core\fromnumeric.py", line >>> >> 679, in argsort >>> >> ? ?return argsort(axis, kind, order) >>> >> TypeError: requested sort not available for type >>> >> >>> >> >>> uni, uni_idx, uni_inv = np.unique(groups_, return_index=True, >>> >> >>> return_inverse=True) >>> >> >>> uni_inv >>> >> array([1, 4, 3, 6, 6, 5, 2, 8, 7, 0]) >>> >> >>> >> exception in numpy 1.6.2rc2 (as reported by Debian for statsmodels) >>> >> >>> > >>> > I've been putting of, um, planning to implement the different sort >>> > kinds for >>> > object/structured arrays for a while, sounds like it needs to get done. >>> >>> So I guess this is a 1.6.1 -> 1.6.2 regression, and presumably we >>> won't be landing any new sort implementations in the 1.6 branch. >>> Should we be thinking about reverting this and releasing a 1.6.3? (I >>> don't know if it's worth it, but it seems like something we should >>> think about either way.) >>> >>> Same question applies to 1.7 too -- obviously the change to unique() >>> is a good one, but maybe it has to wait until mergesort can handle >>> structured dtypes? >>> >> >> Should definitely be reverted if a 1.6.3 goes out. > > > But is a 1.6.3 required for this issue alone? It's a regression, but it > looks like a corner case and is already fixed in statsmodels. If there are > more users who are running into this problem though, I'm OK with doing a > 1.6.3 release just for this. For statsmodels it doesn't make much difference anymore when this gets changed. Once it is released in a numpy version that we support, we are pretty much stuck with numpy compatibility files. Fortunately the function was easy to copy. Unfortunately we didn't have test coverage or didn't test this before numpy 1.6.2. 
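For anyone hitting the same regression, one pure-numpy fallback for row-wise unique with return_inverse avoids sorting a structured dtype altogether by lexsorting the plain 2-D array column by column. A sketch (the helper name unique_rows_inverse is made up for illustration):

    import numpy as np

    def unique_rows_inverse(a):
        # a is a plain 2-D array; returns (uniq, inv) with uniq[inv] == a row for row.
        order = np.lexsort(a.T[::-1])           # sort rows, first column as primary key
        s = a[order]
        first = np.ones(len(a), dtype=bool)     # first occurrence of each distinct row
        first[1:] = (s[1:] != s[:-1]).any(axis=1)
        group = np.cumsum(first) - 1            # group id of each row in sorted order
        inv = np.empty(len(a), dtype=int)
        inv[order] = group
        return s[first], inv

    groups = np.random.randint(0, 4, size=(10, 2))
    uniq, inv = unique_rows_inverse(groups)
    assert (uniq[inv] == groups).all()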
Josef > > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From chris.barker at noaa.gov Wed May 30 19:44:28 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 30 May 2012 16:44:28 -0700 Subject: [Numpy-discussion] Scientific Software and Web Developer Needed in Seattle, WA Message-ID: Scientific Software and Web Developer Needed NOAA Emergency Response Division Help us develop our next-generation oil spill transport model. Background: The Emergency Response Division (ERD) of NOAA's Office of Response and Restoration (OR&R) provides scientific expertise to support the response to oil and chemical spills in the coastal environment. We played a major role in the recent Deepwater Horizon oil spill in the Gulf of Mexico. In order to fulfill our mission, we develop many of the software tools and models required to support a response to hazardous material spills. We are currently in the middle of a program to develop our next-generation oil spill transport model, taking into account lessons learned from years of response and recent major incidents. There are currently two positions available: one will focus on on the computational code in C++, Pyhton and Cython, and the other on a new web front end (Python, HTML, CSS, Javascript). General Characteristics: The incumbents in this position will provide software development services to support the mission of the Emergency Response Division of NOAA's Office of Response and Restoration. As part of his/her efforts, independent evaluation and application of development techniques, algorithms, software architecture, and programming patterns will be required. The incumbent will work with the staff of ERD to provide analysis on user needs and software, GUI, and library design. He/she will be expect to work primarily on site at NOAA's facility in Seattle. Knowledge: The incumbent must be able to apply modern concepts of software engineering and design to the development of computational code, web applications, and libraries. The incumbent will need to be able to design, write, refactor, and implement code for a complex web application and/or computational library. The incumbent will work with a multi-disciplinary team including scientists, users, and other developers, utilizing software development practices such as usability design, version control, bug and issue tracking, and unit testing. Good communication skills and the knowledge of working as part of a team are required. Direction received: The incumbent will participate on various research and development teams. While endpoints will be identified through Division management and some direct supervision will be provided, the incumbent will be responsible for progressively being able to take input from team meetings and design objectives and propose strategies for reaching endpoints. Typical duties and responsibilities: The incumbent will work with the oil and chemical spill modeling team to improve and develop new tools and models used in fate and transport forecasting. Different components of the project will be written in C++, Python, and Javascript. Education requirement, minimum: Bachelor's degree in a technical discipline. Experience requirement, minimum: One to five years experience in development of complex software systems in one or more full-featured programming languages (C, C++, Java, Python, Javaascript, Ruby, Fortran, etc.) 
The team requires experience in the following languages/disciplines. Each incumbent will need experience in some subset: * Computational/Scientific programming * Numerical Analysis/Methods * Parallel processing * Desktop GUI * Web services * Web clients: HTML/CSS/Javascript * Python * C/C++ * Python--C/C++ integration * Software development team leadership While the incumbent will work on-site at NOAA, directly with the NOAA team, this is a contract position with General Dynamics Information Technology: http://www.gdit.com/default.aspx For more information and to apply, use the GDIT web site: http://www.resumeware.net/gdns_rw/gdns_web/job_detail.cfm?recnum=1&totalrecs=1&start=1&pagestart=1 If that long url doesn't work, try: http://www.resumeware.net/gdns_rw/gdns_web/job_search.cfm and search for job ID: 199765 You can also send questions about employment issues to: Susan Bowley: susan.bowley at gdit.com And questions about the nature of the work to: Chris Barker: Chris.Barker at noaa.gov -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From farrowch at gmail.com Wed May 30 22:05:30 2012 From: farrowch at gmail.com (chris farrow) Date: Wed, 30 May 2012 21:05:30 -0500 Subject: [Numpy-discussion] Bus error when using flat on sliced, memmap'd array Message-ID: Hi all, I encountered an odd bug today that I wanted to bring to everyone's attention. First the code: >>> import numpy as np >>> shape = (8, 8) >>> dtype = np.dtype(np.uint8) >>> image = np.random.randint(0, 256, shape).astype(dtype) >>> image.tofile("test_image.bin") >>> image = np.memmap("test_image.bin", dtype=dtype, shape=shape, mode='r') >>> arr = image[::2,::2] >>> np.sum(arr.flat) On my system (numpy 1.6.1, git revision 68538b74483009c2c2d1644ef00397014f95a696, on OSX, python 2.7.3 (32-bit)), this causes a bus error when run. Here's what I've discovered so far about this: - the bus error only occurs with mode 'r' - the dimensionality of the array appears to be irrelevant - if the array slice does not change the strides, the bus error does not occur - no 'arr.flat', no bus error - Other aggregating functions (e.g. fmin.reduce) will induce the error - Iterating over arr.flat will *not* cause a bus error Based on this, I suspect the issue is with the C-facing side of the flat iterator. Enjoy! Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From kalatsky at gmail.com Wed May 30 22:49:53 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Wed, 30 May 2012 21:49:53 -0500 Subject: [Numpy-discussion] Bus error when using flat on sliced, memmap'd array In-Reply-To: References: Message-ID: Confirmed on Ubuntu, np.__version__ 1.5.1 and 1.6.1 (backtraces are bellow). Something seems to be broken before it comes to memcpy and/or _aligned_contig_to_strided_size1. Val ------------------------------------------------------------- np.__version__ 1.6.1 Program received signal SIGSEGV, Segmentation fault. 
0x00007ffff5b63cdf in _aligned_contig_to_strided_size1 () from /home/vkalatsky/src/epd8/lib/python2.7/site-packages/numpy/core/multiarray.so (gdb) bt #0 0x00007ffff5b63cdf in _aligned_contig_to_strided_size1 () from /home/vkalatsky/src/epd8/lib/python2.7/site-packages/numpy/core/multiarray.so #1 0x00007ffff5bc2c31 in PyArray_CopyAnyIntoOrdered () from /home/vkalatsky/src/epd8/lib/python2.7/site-packages/numpy/core/multiarray.so #2 0x00007ffff5bc3329 in array_dealloc () from /home/vkalatsky/src/epd8/lib/python2.7/site-packages/numpy/core/multiarray.so #3 0x00007ffff71b7bbb in meth_dealloc (m=0xb6fdd0) at Objects/methodobject.c:134 #4 0x00007ffff720df99 in PyEval_EvalFrameEx (f=0xc3f310, throwflag=) at Python/ceval.c:2712 #5 0x00007ffff7214722 in PyEval_EvalCodeEx (co=0x791d30, globals=, locals=, args=0xab7fd0, argcount=5, kws=0xab7ff8, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253 #6 0x00007ffff72126b6 in call_function (f=0xab7e20, throwflag=) at Python/ceval.c:4109 #7 PyEval_EvalFrameEx (f=0xab7e20, throwflag=) at Python/ceval.c:2666 #8 0x00007ffff7214722 in PyEval_EvalCodeEx (co=0x7988b0, globals=, locals=, args=0x3, argcount=1, kws=0x3, kwcount=0, defs=0x79c518, defcount=3, closure=0x0) at Python/ceval.c:3253 #9 0x00007ffff72126b6 in call_function (f=0x6d0820, throwflag=) at Python/ceval.c:4109 #10 PyEval_EvalFrameEx (f=0x6d0820, throwflag=) at Python/ceval.c:2666 #11 0x00007ffff7214722 in PyEval_EvalCodeEx (co=0x7ffff7eb41b0, globals=, locals=, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253 #12 0x00007ffff7214772 in PyEval_EvalCode (co=0x7ffff7ff5000, globals=0x2, locals=0x897cb1) at Python/ceval.c:667 #13 0x00007ffff722e432 in run_mod (mod=, filename=, globals=0x640140, locals=0x640140, flags=, arena=) at Python/pythonrun.c:1346 #14 0x00007ffff722e506 in PyRun_FileExFlags (fp=0x6cb390, filename=0x7fffffffe599 "cf_test.py", start=257, globals=0x640140, locals=0x640140, closeit=1, flags=0x7fffffffe1a0) at Python/pythonrun.c:1332 #15 0x00007ffff722fa67 in PyRun_SimpleFileExFlags (fp=, filename=0x7fffffffe599 "cf_test.py", closeit=1, flags=0x7fffffffe1a0) at Python/pythonrun.c:936 #16 0x00007ffff72401e2 in Py_Main (argc=1, argv=0x7fffffffe2c8) at Modules/main.c:689 #17 0x00007ffff652cd8e in __libc_start_main (main=, argc=, ubp_av=, init=, fini=, rtld_fini=, stack_end=0x7fffffffe2b8) at libc-start.c:226 #18 0x00000000004006f9 in _start () ----------------------------------------------------------- np.__version__ 1.5.1 Program received signal SIGSEGV, Segmentation fault. memcpy () at ../sysdeps/x86_64/memcpy.S:67 67 ../sysdeps/x86_64/memcpy.S: No such file or directory. 
in ../sysdeps/x86_64/memcpy.S (gdb) bt #0 memcpy () at ../sysdeps/x86_64/memcpy.S:67 #1 0x00007ffff5bf2460 in PyArray_CopyAnyInto () from /home/vkalatsky/epd70/lib/python2.7/site-packages/numpy/core/multiarray.so #2 0x00007ffff5bf2a69 in array_dealloc () from /home/vkalatsky/epd70/lib/python2.7/site-packages/numpy/core/multiarray.so #3 0x00007ffff71b889b in meth_dealloc (m=0xbdd170) at Objects/methodobject.c:134 #4 0x00007ffff720e8b9 in PyEval_EvalFrameEx (f=0xc12450, throwflag=) at Python/ceval.c:2711 #5 0x00007ffff7215042 in PyEval_EvalCodeEx (co=0x81e630, globals=, locals=, args=0x7bd820, argcount=5, kws=0x7bd848, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252 #6 0x00007ffff7212fd6 in call_function (f=0x7bd670, throwflag=) at Python/ceval.c:4108 #7 PyEval_EvalFrameEx (f=0x7bd670, throwflag=) at Python/ceval.c:2665 #8 0x00007ffff7215042 in PyEval_EvalCodeEx (co=0x85f1b0, globals=, locals=, args=0x3, argcount=1, kws=0x3, kwcount=0, defs=0x85e928, defcount=3, closure=0x0) at Python/ceval.c:3252 #9 0x00007ffff7212fd6 in call_function (f=0x747ea0, throwflag=) at Python/ceval.c:4108 #10 PyEval_EvalFrameEx (f=0x747ea0, throwflag=) at Python/ceval.c:2665 #11 0x00007ffff7215042 in PyEval_EvalCodeEx (co=0x691b30, globals=, locals=, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252 #12 0x00007ffff7215092 in PyEval_EvalCode (co=0x7ffff7ff3000, globals=0xb55550, locals=0x1) at Python/ceval.c:666 #13 0x00007ffff722eb82 in run_mod (mod=, filename=, globals=0x640140, locals=0x640140, flags=, arena=) at Python/pythonrun.c:1346 #14 0x00007ffff722ec56 in PyRun_FileExFlags (fp=0x743250, filename=0x7fffffffe5bd "cf_test.py", start=257, globals=0x640140, locals=0x640140, closeit=1, flags=0x7fffffffe1c0) at Python/pythonrun.c:1332 #15 0x00007ffff72301b7 in PyRun_SimpleFileExFlags (fp=, filename=0x7fffffffe5bd "cf_test.py", closeit=1, flags=0x7fffffffe1c0) at Python/pythonrun.c:936 #16 0x00007ffff724083a in Py_Main (argc=1, argv=0x7fffffffe2e8) at Modules/main.c:676 #17 0x00007ffff652dd8e in __libc_start_main (main=, argc=, ubp_av=, init=, fini=, rtld_fini=, stack_end=0x7fffffffe2d8) at libc-start.c:226 #18 0x00000000004006f9 in _start () On Wed, May 30, 2012 at 9:05 PM, chris farrow wrote: > Hi all, > > I encountered an odd bug today that I wanted to bring to everyone's > attention. First the code: > > >>> import numpy as np > >>> shape = (8, 8) > >>> dtype = np.dtype(np.uint8) > >>> image = np.random.randint(0, 256, shape).astype(dtype) > >>> image.tofile("test_image.bin") > >>> image = np.memmap("test_image.bin", dtype=dtype, shape=shape, mode='r') > >>> arr = image[::2,::2] > >>> np.sum(arr.flat) > > On my system (numpy 1.6.1, git > revision 68538b74483009c2c2d1644ef00397014f95a696, on OSX, python 2.7.3 > (32-bit)), this causes a bus error when run. > > Here's what I've discovered so far about this: > - the bus error only occurs with mode 'r' > - the dimensionality of the array appears to be irrelevant > - if the array slice does not change the strides, the bus error does not > occur > - no 'arr.flat', no bus error > - Other aggregating functions (e.g. fmin.reduce) will induce the error > - Iterating over arr.flat will *not* cause a bus error > > Based on this, I suspect the issue is with the C-facing side of the flat > iterator. > > Enjoy! 
> > Chris > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wkerzendorf at gmail.com Thu May 31 01:36:07 2012 From: wkerzendorf at gmail.com (Wolfgang Kerzendorf) Date: Thu, 31 May 2012 01:36:07 -0400 Subject: [Numpy-discussion] fast access and normalizing of ndarray slices Message-ID: <5B532960-20F1-41FA-A17D-10CC5BEB898F@gmail.com> Dear all, I have an ndarray which consists of many arrays stacked behind each other (only conceptually, in truth it's a normal 1d float64 array). I have a second array which tells me the start of the individual data sets in the 1d float64 array and another one which tells me the length. Example: data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality [1,2,1,2,3,4,1,2,3, dtype=float64] start_pointer = [0, 2, 6] length_data = [2, 4, 3] I now want to normalize each of the individual data sets. I wrote a simple for loop over the start_pointer and length data grabbed the data and normalized it and wrote it back to the big array. That's slow. Is there an elegant numpy way to do that? Do I have to go the cython way? Cheers Wolfgang From kalatsky at gmail.com Thu May 31 01:43:18 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Thu, 31 May 2012 00:43:18 -0500 Subject: [Numpy-discussion] fast access and normalizing of ndarray slices In-Reply-To: <5B532960-20F1-41FA-A17D-10CC5BEB898F@gmail.com> References: <5B532960-20F1-41FA-A17D-10CC5BEB898F@gmail.com> Message-ID: What do you mean by "normalized it"? Could you give the output of your procedure for the sample input data. Val On Thu, May 31, 2012 at 12:36 AM, Wolfgang Kerzendorf wrote: > Dear all, > > I have an ndarray which consists of many arrays stacked behind each other > (only conceptually, in truth it's a normal 1d float64 array). > I have a second array which tells me the start of the individual data sets > in the 1d float64 array and another one which tells me the length. > Example: > > data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality > [1,2,1,2,3,4,1,2,3, dtype=float64] > start_pointer = [0, 2, 6] > length_data = [2, 4, 3] > > I now want to normalize each of the individual data sets. I wrote a simple > for loop over the start_pointer and length data grabbed the data and > normalized it and wrote it back to the big array. That's slow. Is there an > elegant numpy way to do that? Do I have to go the cython way? > > Cheers > Wolfgang > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu May 31 07:59:53 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 31 May 2012 06:59:53 -0500 Subject: [Numpy-discussion] Bus error when using flat on sliced, memmap'd array In-Reply-To: References: Message-ID: <93C602D4-6775-4DEC-AA73-7424818AE733@continuum.io> Be sure to file a ticket... -Travis On May 30, 2012, at 9:05 PM, chris farrow wrote: > Hi all, > > I encountered an odd bug today that I wanted to bring to everyone's attention. 
First the code: > > >>> import numpy as np > >>> shape = (8, 8) > >>> dtype = np.dtype(np.uint8) > >>> image = np.random.randint(0, 256, shape).astype(dtype) > >>> image.tofile("test_image.bin") > >>> image = np.memmap("test_image.bin", dtype=dtype, shape=shape, mode='r') > >>> arr = image[::2,::2] > >>> np.sum(arr.flat) > > On my system (numpy 1.6.1, git revision 68538b74483009c2c2d1644ef00397014f95a696, on OSX, python 2.7.3 (32-bit)), this causes a bus error when run. > > Here's what I've discovered so far about this: > - the bus error only occurs with mode 'r' > - the dimensionality of the array appears to be irrelevant > - if the array slice does not change the strides, the bus error does not occur > - no 'arr.flat', no bus error > - Other aggregating functions (e.g. fmin.reduce) will induce the error > - Iterating over arr.flat will *not* cause a bus error > > Based on this, I suspect the issue is with the C-facing side of the flat iterator. > > Enjoy! > > Chris > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From wkerzendorf at gmail.com Thu May 31 09:27:16 2012 From: wkerzendorf at gmail.com (Wolfgang Kerzendorf) Date: Thu, 31 May 2012 09:27:16 -0400 Subject: [Numpy-discussion] fast access and normalizing of ndarray slices In-Reply-To: References: <5B532960-20F1-41FA-A17D-10CC5BEB898F@gmail.com> Message-ID: <63E36CAE-E232-4198-BA49-C70837E5FEB1@gmail.com> Hey Val, Well it doesn't matter what I do, but specifically I do factor = sum(data_array[start_point:start_point+length_data]) and then data[array[start_point:start_point+length_data]) /= factor. and that for every star_point and length data. How to do this fast? Cheers Wolfgang On 2012-05-31, at 1:43 AM, Val Kalatsky wrote: > What do you mean by "normalized it"? > Could you give the output of your procedure for the sample input data. > Val > > On Thu, May 31, 2012 at 12:36 AM, Wolfgang Kerzendorf wrote: > Dear all, > > I have an ndarray which consists of many arrays stacked behind each other (only conceptually, in truth it's a normal 1d float64 array). > I have a second array which tells me the start of the individual data sets in the 1d float64 array and another one which tells me the length. > Example: > > data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality [1,2,1,2,3,4,1,2,3, dtype=float64] > start_pointer = [0, 2, 6] > length_data = [2, 4, 3] > > I now want to normalize each of the individual data sets. I wrote a simple for loop over the start_pointer and length data grabbed the data and normalized it and wrote it back to the big array. That's slow. Is there an elegant numpy way to do that? Do I have to go the cython way? > > Cheers > Wolfgang > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
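Since the segments are stored contiguously and back to back, one vectorized option for the normalization Wolfgang describes is np.add.reduceat for the per-segment sums followed by np.repeat to broadcast them back over the elements. A sketch using the sample arrays from the original post; it assumes the segments cover the whole array in order, because reduceat sums from each start index to the next:

    import numpy as np

    data_array = np.array([1., 2., 1., 2., 3., 4., 1., 2., 3.])
    start_pointer = np.array([0, 2, 6])
    length_data = np.array([2, 4, 3])

    seg_sums = np.add.reduceat(data_array, start_pointer)   # [3., 10., 6.]
    data_array /= np.repeat(seg_sums, length_data)          # each segment now sums to 1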
URL: From kalatsky at gmail.com Thu May 31 10:05:04 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Thu, 31 May 2012 09:05:04 -0500 Subject: [Numpy-discussion] fast access and normalizing of ndarray slices In-Reply-To: <63E36CAE-E232-4198-BA49-C70837E5FEB1@gmail.com> References: <5B532960-20F1-41FA-A17D-10CC5BEB898F@gmail.com> <63E36CAE-E232-4198-BA49-C70837E5FEB1@gmail.com> Message-ID: Hi Wolfgang, I thought maybe there is a trick for your specific operation. Your array stacking is a simple case of the group-by operation and normalization is aggregation followed by update. I believe group-by and aggregation are on the NumPy todo-list. You may have to write a small extension module to speed up your operations. Val On Thu, May 31, 2012 at 8:27 AM, Wolfgang Kerzendorf wrote: > Hey Val, > > Well it doesn't matter what I do, but specifically I do factor = > sum(data_array[start_point:start_point+length_data]) and then > data[array[start_point:start_point+length_data]) /= factor. and that for > every star_point and length data. > > How to do this fast? > > Cheers > Wolfgang > On 2012-05-31, at 1:43 AM, Val Kalatsky wrote: > > What do you mean by "normalized it"? > Could you give the output of your procedure for the sample input data. > Val > > On Thu, May 31, 2012 at 12:36 AM, Wolfgang Kerzendorf < > wkerzendorf at gmail.com> wrote: > >> Dear all, >> >> I have an ndarray which consists of many arrays stacked behind each other >> (only conceptually, in truth it's a normal 1d float64 array). >> I have a second array which tells me the start of the individual data >> sets in the 1d float64 array and another one which tells me the length. >> Example: >> >> data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality >> [1,2,1,2,3,4,1,2,3, dtype=float64] >> start_pointer = [0, 2, 6] >> length_data = [2, 4, 3] >> >> I now want to normalize each of the individual data sets. I wrote a >> simple for loop over the start_pointer and length data grabbed the data and >> normalized it and wrote it back to the big array. That's slow. Is there an >> elegant numpy way to do that? Do I have to go the cython way? >> >> Cheers >> Wolfgang >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Thu May 31 10:30:04 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 31 May 2012 10:30:04 -0400 Subject: [Numpy-discussion] slicing and aliasing Message-ID: Will copying slices always work correctly w/r to aliasing? That is, will: u[a:b] = u[c:d] always work (assuming the ranges of a:b, d:d are equal, or course) From kwgoodman at gmail.com Thu May 31 10:59:49 2012 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 31 May 2012 07:59:49 -0700 Subject: [Numpy-discussion] slicing and aliasing In-Reply-To: References: Message-ID: On Thu, May 31, 2012 at 7:30 AM, Neal Becker wrote: > That is, will: > > u[a:b] = u[c:d] > > always work (assuming the ranges of a:b, d:d are equal, or course) It works most of the time. 
This thread shows you how to find an example where it does not work: http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062079.html
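A short illustration of the conservative idiom: whether an overlapping slice assignment comes out right depends on how the elements are moved internally, so when the source and destination may share memory it is safest to materialize the right-hand side first.

    import numpy as np

    u = np.arange(10)

    np.may_share_memory(u[2:8], u[0:6])   # True - the two slices overlap in memory

    # Copy the right-hand side before assigning, so the result cannot depend on
    # the order in which the overlapping elements are copied.
    u[2:8] = u[0:6].copy()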