Re: [Pandas-dev] Proposal for consistent, clear copy/view semantics in pandas with Copy-on-Write
Tom Augspurger <tom.augspurger88@gmail.com> wrote: I wonder if we can validate what users (new and old) *actually* expect?
Users coming from R, which IIRC implements copy-on-write for matrices, might be OK with indexing always being (behaving like) a copy. I'm not sure what users coming from NumPy would expect, since I don't know how many NumPy users really understand (a) when a NumPy slice is a view or a copy, and (b) how a pandas indexing operation translates to a NumPy slice.
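Point (a) can be made concrete with NumPy itself. Basic slicing returns a view that shares memory with the original array, while integer-array ("fancy") indexing returns a copy. This is a minimal sketch, and the array and variable names are arbitrary:

```python
import numpy as np

arr = np.arange(6)

view = arr[1:4]        # basic slicing: returns a view onto arr's memory
copy = arr[[1, 2, 3]]  # integer-array ("fancy") indexing: returns a new array

view[0] = 100          # writes through to arr (arr[1] becomes 100)
copy[1] = 200          # changes only the copy; arr[2] stays 2

print(arr)                          # [  0 100   2   3   4   5]
print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False
```

Both expressions "select elements 1 to 3", yet only one of them aliases the original array. That distinction is exactly what a user has to know before they can predict what a pandas indexing operation does.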
IMHO, we should concentrate on the "new" users. For my team, there is no NumPy or R background. They learn pandas, and what pandas does needs to be really clear in behavior and documentation. I would also hazard a guess that most pandas users are like that: pandas is the first tool they see, not NumPy or R.

The places where I think confusion could happen are things like this, with a DataFrame df:

1. s = df["a"]
2. s.iloc[3:6] = [1, 2, 3]
3. df["a"].iloc[3:6] = [1, 2, 3]
4. df["b"] = df["a"]
5. df["b"].iloc[3:6] = [4, 5, 6]
6. s2 = df["b"]
7. df["c"] = s2
8. s2.iloc[3:6] = [7, 8, 9]

As I understand it (please correct me if I'm wrong), these lines would be interpreted as follows with the current proposal:

1. Creates a view into the DataFrame df. No copying is done at all.
2. Modifies the Series s and the underlying DataFrame df (copy-on-write).
3. Modifies the DataFrame.
4. Copies the Series from "a" to "b".
5. Modifies "b" in the DataFrame, but not "a".
6. Creates a view into the DataFrame df. No copying is done at all.
7. Copies the Series from "b" to "c".
8. Modifies s2, which modifies "b", but NOT "c".

I think the challenge is explaining the sequence 6, 7, 8 above in comparison to the other sequences.

-Irv
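To see why steps 6 to 8 are the hard ones to explain, here is a hypothetical pure-Python toy model of the interpretation listed above. It is not pandas and not necessarily what the proposal implements. ToyFrame is an invented name. Column access returns the stored list itself (a "view"), and column assignment stores a copy. List slices stand in for .iloc, using three-element slices ([3:6]) so each slice matches the three assigned values.

```python
class ToyFrame:
    """Hypothetical toy model, NOT pandas: column access returns the stored
    list itself (a "view"), while column assignment stores a copy."""

    def __init__(self, **columns):
        # Keep a private copy of each input column.
        self._cols = {name: list(values) for name, values in columns.items()}

    def __getitem__(self, name):
        # Return the underlying list, so mutations through it show up in the frame.
        return self._cols[name]

    def __setitem__(self, name, values):
        # Copy on assignment: the new column never aliases its source.
        self._cols[name] = list(values)


df = ToyFrame(a=[0] * 8)

s = df["a"]               # 1. a "view" of column "a"
s[3:6] = [1, 2, 3]        # 2. modifies s and df["a"]
df["a"][3:6] = [1, 2, 3]  # 3. modifies the frame
df["b"] = df["a"]         # 4. copies "a" into "b"
df["b"][3:6] = [4, 5, 6]  # 5. modifies "b" but not "a"
s2 = df["b"]              # 6. a "view" of column "b"
df["c"] = s2              # 7. copies "b" into "c"
s2[3:6] = [7, 8, 9]       # 8. modifies s2 and "b", but not "c"

print(df["a"])  # [0, 0, 0, 1, 2, 3, 0, 0]
print(df["b"])  # [0, 0, 0, 7, 8, 9, 0, 0]
print(df["c"])  # [0, 0, 0, 4, 5, 6, 0, 0]
```

In this model, s2 and df["c"] start out equal after step 7. Only one of them tracks later writes through s2. The user cannot tell which one from the code at step 8 alone.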
Irv Lustig