[Pandas-dev] [EXTERNAL] Re: pandas or new project

John Eiler eiler13 at gmail.com
Wed Jan 23 12:44:26 EST 2019


Hi Dave, this seems like a really interesting project.  I'm curious to take
a look at it but the link isn't working for me.  Maybe you didn't make it
public?

On Sat, Jan 19, 2019 at 6:45 PM David M Rashty <David.Rashty at flagstar.com>
wrote:

> Tom/Wes,
>
> Here’s the open source project I started:
>
> https://github.com/pandichef/sugarbears
>
>
>
> It’s not quite ripe for the pandas ecosystem page, but I wanted to share
> what I’ve been working on and get your thoughts on the idea before I go too
> far down the rabbit hole.
>
>
>
> At a high level, the goal is to wrap pandas in a way that enables
> development speed comparable to Stata or even MS Excel.
>
>
>
> Thanks!
>
> Dave
>
>
>
>
>
> *From:* Wes McKinney [mailto:wesmckinn at gmail.com]
> *Sent:* Thursday, September 13, 2018 9:56 PM
> *To:* Tom Augspurger <tom.augspurger88 at gmail.com>
> *Cc:* David M Rashty <David.Rashty at flagstar.com>; pandas-dev at python.org
> *Subject:* [EXTERNAL] Re: [Pandas-dev] pandas or new project
>
>
> hi David,
>
>
>
> There's nothing really wrong with injecting a bunch of custom methods into
> the DataFrame.* namespace. If you wanted, you could release your package as
> something like
>
>
>
> import pandas_stata
>
>
>
> and then the new methods would be available. This is pretty common in
> large corporate environments that use pandas AFAICT. You can also propose
> your changes in pull requests to pandas.
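
A minimal sketch of the "import and the methods appear" pattern described above. The module name pandas_stata comes from the example; the helper _matching and the shell-style pattern matching via fnmatch are assumptions made for illustration, not the behavior of any released package.

```python
# pandas_stata.py (hypothetical module; importing it patches DataFrame)
from fnmatch import fnmatchcase

import pandas as pd


def _matching(df, pattern):
    """Return the column labels that match a shell-style pattern."""
    return [c for c in df.columns if fnmatchcase(str(c), pattern)]


def sdrop(self, pattern):
    """Drop every column whose name matches `pattern`."""
    return self.drop(columns=_matching(self, pattern))


def skeep(self, pattern):
    """Keep only the columns whose names match `pattern`."""
    return self[_matching(self, pattern)]


# Attaching the functions to the class makes them available on every
# DataFrame once this module has been imported.
pd.DataFrame.sdrop = sdrop
pd.DataFrame.skeep = skeep

df = pd.DataFrame({"myvar1": [1], "myvar2": [2], "other": [3]})
print(df.sdrop("myvar*").columns.tolist())
print(df.skeep("myvar*").columns.tolist())
```

Patching the class directly is the simplest form of this; pandas also offers pd.api.extensions.register_dataframe_accessor, which keeps custom methods under their own namespace instead of on DataFrame itself.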
>
>
>
> - Wes
>
>
>
>
>
>
>
> On Thu, Sep 13, 2018 at 9:41 PM Tom Augspurger <tom.augspurger88 at gmail.com>
> wrote:
>
> With respect to your `sdrop` and `skeep`, that's the goal of
> DataFrame.filter, though the name isn't the best, so it may be deprecated
> in favor of something better.
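
For reference, a short sketch of how DataFrame.filter covers the keep case, with the drop case built from its result. The column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"myvar1": [1], "myvar2": [2], "other": [3]})

# Like Stata's `keep myvar*`: select columns whose name starts with "myvar".
kept = df.filter(regex=r"^myvar")

# filter() only selects columns, so `drop myvar*` is expressed as the
# complement: drop whatever the same filter would have kept.
dropped = df.drop(columns=kept.columns)

print(kept.columns.tolist())
print(dropped.columns.tolist())
```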
>
>
>
> The rest sound interesting, but likely out of scope for pandas. If you
> build an open source library then we'd be happy to include it in pandas'
> ecosystem page:
> http://pandas.pydata.org/pandas-docs/stable/ecosystem.html
>
>
>
> Tom
>
>
>
>
>
> On Thu, Sep 13, 2018 at 7:58 PM David M Rashty <David.Rashty at flagstar.com>
> wrote:
>
> Dear pandas team,
>
> I am a long-time Stata user and I started using pandas about a year ago in
> order to build web applications using an in-memory dataframe structure.  As
> a business user, I’ve found Stata to have a key advantage over pandas that
> many others have also noted: much faster development time.  Examples in
> Stata:
>
>
>
> drop myvar*    // drops all columns starting with myvar
> keep myvar*    // drops all columns except those starting with myvar
> reg z y x      // runs the regression z = a + bx + cy + error
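
To make the comparison concrete, here is a rough sketch of what fitting z = a + bx + cy looks like with pandas and NumPy alone. It returns only the point estimates (Stata's reg also reports standard errors and other statistics), and the data is synthetic, built so that the true coefficients are known.

```python
import numpy as np
import pandas as pd

# Synthetic data with an exact linear relationship (no noise term).
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 0.0, 2.0, 1.0]})
df["z"] = 1.0 + 2.0 * df["x"] + 3.0 * df["y"]

# Design matrix: a constant column for the intercept, then x, then y.
X = np.column_stack([np.ones(len(df)), df["x"], df["y"]])

# Ordinary least squares via NumPy's least-squares solver.
coef, *_ = np.linalg.lstsq(X, df["z"].to_numpy(), rcond=None)
a, b, c = coef
print(a, b, c)
```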
>
>
>
> In order to use pandas in a Stata-like fashion, I’ve had to monkey patch
> large parts of the library e.g.,
>
>
>
> df = df.sdrop('myvar*')     # same as `drop myvar*` above
>
> df = df.skeep('myvar*')     # same as `keep myvar*` above
>
> df = df.sreg('z y x')       # same as `reg z y x` above
>
> # df.query doesn't support str.contains and isin to my knowledge
> df = df.squery('a>80 & b.str.contains("hello") & c.isin([1,2,3])')
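
For comparison, the same row filter can be written with plain boolean indexing, which does not depend on what a query string can parse. This is only a sketch; the columns a, b and c follow the squery example and the values are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [90, 85, 70, 95],
    "b": ["hello world", "goodbye", "hello", "say hello"],
    "c": [1, 2, 3, 4],
})

# Each condition yields a boolean Series; & combines them element-wise.
mask = (df["a"] > 80) & df["b"].str.contains("hello") & df["c"].isin([1, 2, 3])
result = df[mask]
print(result)
```

The parentheses around the comparison matter, because & binds more tightly than > in Python.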
>
>
>
> I put an “s” in front of my methods to mean either “stata” or “sugar”.
>
>
>
> Additionally, I’ve built:
>
> a)      A system to automatically load new DataFrame methods into memory (no
> additional imports required)
>
> b)      A caching system to make loading data blazing fast along with a
> much tighter syntax e.g., pd.read_stata('mydata.dta') (6 secs load time) vs
> use.mydata (0.001 secs load time after the first read from file)
>
> c)      A system of column “labels” and formats to prettify various
> reports e.g., df.sscatter('rate score') produces a scatter plot with labels
> “Interest Rate, %” and “Credit Score”, respectively.
>
> d)      A reactive web app (using Flask/Redis) to quickly view the full
> DataFrame content in a browser.
>
>
>
> Basically, I’ve tried to eliminate any obvious advantages Stata has over
> pandas.
>
>
>
> I’m potentially interested in developing this project into something
> bigger.   Would you like me to share my work in the context of pandas or
> should it be a completely separate project with a different scope?
>
>
>
> Thanks,
>
>
>
> David Rashty | Flagstar Bank | Whole Loan Trading | 248-312-6692 |
> david.rashty at flagstar.com
>
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>