From bob.haffner at gmail.com Thu Feb 9 23:42:53 2017
From: bob.haffner at gmail.com (Bob Haffner)
Date: Thu, 9 Feb 2017 22:42:53 -0600
Subject: [omaha] Group Data Science Competition
In-Reply-To:
References: <98FDF8B2-6371-4C4A-BA84-DD18AA7DC3A0@gmail.com>
 <63E88FA3-5AB5-4F8D-A610-2FE27F2AB772@yahoo.com>
 <12F96626-6D10-4B23-87F8-8980C5069E57@gmail.com>
Message-ID:

Added a Deep Learning section to my notebook
https://github.com/bobhaffner/kaggle-houseprices/blob/master/kaggle_house_prices.ipynb

Using Keras for the modeling with TensorFlow as the backend.

I've generated a submission, but I don't know how it performed as Kaggle
seems to be on the fritz tonight.

On Sat, Jan 14, 2017 at 12:52 AM, Wes Turner wrote:

> On Friday, January 13, 2017, Bob Haffner via Omaha wrote:
>
>> Look at that. Two teams have submitted perfect scores :-)
>>
>> https://www.kaggle.com/c/house-prices-advanced-regression-techniques/leaderboard
>
> https://www.kaggle.com/c/house-prices-advanced-regression-techniques/rules
>
>    - Due to the public nature of the data, this competition does not
>    count towards Kaggle ranking points.
>    - We ask that you respect the spirit of the competition and do not
>    cheat. Hand-labeling is forbidden.
>
> https://www.kaggle.com/wiki/ModelSubmissionBestPractices
>
> https://www.kaggle.com/wiki/WinningModelDocumentationTemplate (CNN,
> XGBoost)
>
> Hopefully I can find some time to fix the data loading function in my
> data.py and test w/ TPOT (manual sparse arrays), auto_ml,
>
> - https://www.coursera.org/learn/ml-foundations/lecture/2HrHv/learning-a-simple-regression-model-to-predict-house-prices-from-house-size (UW)
>
> - "Python Data Science Handbook" "This repository contains the entire
> Python Data Science Handbook, in the form of (free!) Jupyter notebooks."
> https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/README.md#5-machine-learning (~UW)
>
> I'd also like to learn how to NN w/ tensors and Keras (Theano, TensorFlow)
> https://github.com/fchollet/keras
>
> - https://keras.io/getting-started/faq/#how-can-i-record-the-training-validation-loss-accuracy-at-each-epoch
>
> - http://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
>
>> On Thu, Jan 5, 2017 at 11:20 AM, Bob Haffner wrote:
>>
>> > Hi Travis,
>> >
>> > A few of us are doing the House Prices: Advanced Regression Techniques
>> > competition
>> >
>> > https://www.kaggle.com/c/house-prices-advanced-regression-techniques
>> >
>> > Our team is called Omaha Pythonistas. You are more than welcome to join
>> > us! Just let me know which email you use to sign up with on Kaggle and
>> > I'll send out an invite.
>> >
>> > We met in December and we hope to meet again soon. Most likely
>> > following our monthly meeting on 1/18
>> >
>> > Some of our materials
>> >
>> > https://github.com/omahapython/kaggle-houseprices
>> >
>> > https://github.com/jeremy-doyle/home_price_kaggle
>> >
>> > https://github.com/bobhaffner/kaggle-houseprices
>> >
>> > On Wed, Jan 4, 2017 at 8:50 AM, Travis Smith via Omaha <omaha at python.org> wrote:
>> >
>> >> Hey, new guy here. What's the challenge, exactly? I'm not a Kaggler
>> >> yet, but I have taken some data science courses.
>> >>
>> >> -Travis
>> >>
>> >> > On Jan 4, 2017, at 7:57, Luke Schollmeyer via Omaha <omaha at python.org> wrote:
>> >> >
>> >> > I think there are two probable things:
>> >> > 1. We're likely using some under-powered ML methods. Most of the Kaggle
>> >> > interviews of the top guys/teams I read are using some much more advanced
>> >> > methods to get their solutions into the top spots. I think what we're doing
>> >> > is fine for what we want to accomplish.
>> >> > 2. Feature engineering. Again, many of the interviews show that a ton of
>> >> > work goes into cleaning and conforming the data.
>> >> >
>> >> > I haven't backtracked any of the interviews to their submissions, so I
>> >> > don't know how often they tend to submit, like tweak a small aspect and
>> >> > keep honing that until it pays off.
>> >> >
>> >> > On Wed, Jan 4, 2017 at 7:43 AM, Bob Haffner via Omaha <omaha at python.org> wrote:
>> >> >
>> >> >> Yeah, no kidding. That pdf wasn't hard to find and that #1 score is
>> >> >> pretty damn good
>> >> >>
>> >> >> On Tue, Jan 3, 2017 at 10:41 PM, Jeremy Doyle via Omaha <omaha at python.org> wrote:
>> >> >>
>> >> >>> Looks like we have our key to a score of 0.0. Lol
>> >> >>>
>> >> >>> Seriously though, does anyone wonder if the person sitting at #1 had
>> >> >>> this full data set as well and trained a model using the entire set? I
>> >> >>> mean that 0.038 score is so much better than anyone else it seems a
>> >> >>> little unrealistic...or maybe it just seems that way because I haven't
>> >> >>> been able to break through 0.12 : )
>> >> >>>
>> >> >>> Sent from my iPhone
>> >> >>>
>> >> >>>>> On Jan 3, 2017, at 7:51 PM, Bob Haffner via Omaha <omaha at python.org> wrote:
>> >> >>>>
>> >> >>>> Pretty interesting notebook I put together regarding the kaggle comp
>> >> >>>> https://github.com/bobhaffner/kaggle-houseprices/blob/master/additional_training_data.ipynb
>> >> >>>>
>> >> >>>> On Mon, Jan 2, 2017 at 12:10 AM, Wes Turner via Omaha <omaha at python.org> wrote:
>> >> >>>>
>> >> >>>>>> On Wednesday, December 28, 2016, Wes Turner <wes.turner at gmail.com> wrote:
>> >> >>>>>>
>> >> >>>>>> On Wed, Dec 28, 2016 at 12:56 AM, Jeremy Doyle via Omaha <omaha at python.org> wrote:
>> >> >>>>>>
>> >> >>>>>>> Woohoo! We jumped 286 positions with a meager 0.00448 improvement in
>> >> >>>>>>> our score! Currently sitting at 798th place.
>> >> >>>>>>
>> >> >>>>>> Nice work! Features of your feature engineering I admire:
>> >> >>>>>>
>> >> >>>>>> - nominal, ordinal, continuous, discrete
>> >> >>>>>>   categorical = nominal + discrete
>> >> >>>>>>   numeric = continuous + discrete
>> >> >>>>>>
>> >> >>>>>> - outlier removal
>> >> >>>>>>   - [ ] w/ constant thresholding? (is there a distribution parameter)
>> >> >>>>>>
>> >> >>>>>> - building datestrings from SaleMonth and YrSold
>> >> >>>>>>   - SaleMonth / "1" / YrSold
>> >> >>>>>>   - df.drop(['MoSold','YrSold','SaleMonth'])
>> >> >>>>>>   - [ ] why drop SaleMonth?
>> >> >>>>>>   - [ ] pandas.to_datetime(df['SaleMonth'])
>> >> >>>>>>
>> >> >>>>>> - merging with FHA Home Price Index for the month and region
>> >> >>>>>>   ("West North Central")
>> >> >>>>>>   https://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_PO_monthly_hist.xls
>> >> >>>>>>   - [ ] pandas.to_datetime
>> >> >>>>>>   - this should have every month, but the new merge_asof feature is
>> >> >>>>>>     worth mentioning
>> >> >>>>>>
>> >> >>>>>> - manual binarization
>> >> >>>>>>   - [ ] how did you pick these? correlation after pd.get_dummies?
>> >> >>>>>>   - [ ] why floats? 1.0 / 1 (does it make a difference?)
>> >> >>>>>>
>> >> >>>>>> - Ames, IA nbrhood_multiplier
>> >> >>>>>>   - http://www.cityofames.org/home/showdocument?id=1024
>> >> >>>>>>
>> >> >>>>>> - feature merging
>> >> >>>>>>   - BsmtFinSF = BsmtFinSF1 + BsmtFinSF2
>> >> >>>>>>   - TotalBaths = BsmtFullBath + (BsmtHalfBath / 2.0) + FullBath +
>> >> >>>>>>     (HalfBath / 2.0)
>> >> >>>>>>   - ( ) IDK how a feature-selection pipeline could do this automatically
>> >> >>>>>>
>> >> >>>>>> - null value imputation
>> >> >>>>>>   - .isnull() = 0
>> >> >>>>>>   - ( ) datacleaner incorrectly sets these to median or mode
>> >> >>>>>>
>> >> >>>>>> - log for skewed continuous and SalePrice
>> >> >>>>>>   - ( ) auto_ml: take_log_of_y does this for SalePrice
>> >> >>>>>>
>> >> >>>>>> - "Keeping only the columns we want"
>> >> >>>>>>   - [ ] 'Id' shouldn't be relevant (pd.read_csv(filename, index_col='Id'))
>> >> >>>>>>
>> >> >>>>>> - Binarization
>> >> >>>>>>   - pd.get_dummies(dummy_na=False)
>> >> >>>>>>   - [ ] (as Luke pointed out, concatenation keeps the same columns)
>> >> >>>>>>     rows = eng_train.shape[0]
>> >> >>>>>>     eng_merged = pd.concat([eng_train, eng_test])
>> >> >>>>>>     onehot_merged = pd.get_dummies(eng_merged, columns=nominal,
>> >> >>>>>>                                    dummy_na=False)
>> >> >>>>>>     onehot_train = onehot_merged[:rows]
>> >> >>>>>>     onehot_test = onehot_merged[rows:]
>> >> >>>>>>
>> >> >>>>>> - class RandomSelectionHelper
>> >> >>>>>>   - [ ] this could be generally helpful in sklearn[-pandas]
>> >> >>>>>>   - https://github.com/paulgb/sklearn-pandas#cross-validation
>> >> >>>>>>
>> >> >>>>>> - Models to Search
>> >> >>>>>>   - {Ridge, Lasso, ElasticNet}
>> >> >>>>>>
>> >> >>>>>>   - https://github.com/ClimbsRocks/auto_ml/blob/master/auto_ml/predictor.py#L222
>> >> >>>>>>     _get_estimator_names ( "regressor" )
>> >> >>>>>>     - {XGBRegressor, GradientBoostingRegressor, RANSACRegressor,
>> >> >>>>>>       RandomForestRegressor, LinearRegression, AdaBoostRegressor,
>> >> >>>>>>       ExtraTreesRegressor}
>> >> >>>>>>
>> >> >>>>>>   - https://github.com/ClimbsRocks/auto_ml/blob/master/auto_ml/predictor.py#L491
>> >> >>>>>>     - (w/ ensembling)
>> >> >>>>>>     - ['RandomForestRegressor', 'LinearRegression',
>> >> >>>>>>       'ExtraTreesRegressor', 'Ridge', 'GradientBoostingRegressor',
>> >> >>>>>>       'AdaBoostRegressor', 'Lasso', 'ElasticNet', 'LassoLars',
>> >> >>>>>>       'OrthogonalMatchingPursuit', 'BayesianRidge', 'SGDRegressor'] +
>> >> >>>>>>       ['XGBRegressor']
>> >> >>>>>>
>> >> >>>>>> - model stacking / ensembling
>> >> >>>>>>
>> >> >>>>>>   - ( ) auto_ml: https://auto-ml.readthedocs.io/en/latest/ensembling.html
>> >> >>>>>>   - ( ) auto-sklearn:
>> >> >>>>>>     https://automl.github.io/auto-sklearn/stable/api.html#autosklearn.regression.AutoSklearnRegressor
>> >> >>>>>>     ensemble_size=50, ensemble_nbest=50
>> >> >>>>>
>> >> >>>>> https://en.wikipedia.org/wiki/Ensemble_learning
>> >> >>>>>
>> >> >>>>> http://www.scholarpedia.org/article/Ensemble_learning#Ensemble_combination_rules
>> >> >>>>>
>> >> >>>>>> - submission['SalePrice'] = submission.SalePrice.apply(lambda x: np.exp(x))
>> >> >>>>>>   - [ ] What is this called / how does this work?
>> >> >>>>>>   - https://docs.scipy.org/doc/numpy/reference/generated/numpy.exp.html
>> >> >>>>>>
>> >> >>>>>> - df.to_csv(filename, columns=['SalePrice'], index_label='Id') also works
>> >> >>>>>>   - http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
>> >> >>>>>>
>> >> >>>>>>> My notebook is on GitHub for those interested:
>> >> >>>>>>>
>> >> >>>>>>> https://github.com/jeremy-doyle/home_price_kaggle/tree/master/attempt_4
>> >> >>>>>>
>> >> >>>>>> Thanks!
>> >> >>>>>
>> >> >>>>> (Trimmed for 40K limit)
>> >> >>>>> _______________________________________________
>> >> >>>>> Omaha Python Users Group mailing list
>> >> >>>>> Omaha at python.org
>> >> >>>>> https://mail.python.org/mailman/listinfo/omaha
>> >> >>>>> http://www.OmahaPython.org
>>
>> _______________________________________________
>> Omaha Python Users Group mailing list
>> Omaha at python.org
>> https://mail.python.org/mailman/listinfo/omaha
>> http://www.OmahaPython.org
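The np.exp step flagged in the checklist above is just the inverse of the log
transform applied to SalePrice before fitting: train on log prices, then
exponentiate predictions to get dollars back. A minimal sketch of the round
trip on toy data (np.log1p/np.expm1 are the safer pair when zeros are possible):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"SalePrice": [120000.0, 250000.0, 455000.0]})
    y_log = np.log(df["SalePrice"])    # fit the model against this target
    preds_dollars = np.exp(y_log)      # invert after predicting
    assert np.allclose(preds_dollars, df["SalePrice"])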
From Becky_Brusky at unigroup.com Tue Feb 7 10:21:05 2017
From: Becky_Brusky at unigroup.com (Brusky, Becky)
Date: Tue, 7 Feb 2017 15:21:05 +0000
Subject: [omaha] FW: Join us for a Python webinar, at INFORMS & at a training event
In-Reply-To: <1752281265.1885054538.1486476195540.JavaMail.root@sjmas02.marketo.org>
References: <1752281265.1885054538.1486476195540.JavaMail.root@sjmas02.marketo.org>
Message-ID:

Sharing a Python Gurobi webinar link with the Omaha group.

Enjoy,
Becky

From: Gurobi Optimization
Reply-To: romero at gurobi.com
Date: Tuesday, February 7, 2017 at 8:03 AM
To: "Brusky, Becky"
Subject: Join us for a Python webinar, at INFORMS & at a training event

UPCOMING GUROBI EVENTS

Hello Becky,

This is to invite you to a number of exciting events coming up that you can
take part in to learn more about Gurobi, modeling and optimization. Register
for a series of Python webinars from beginning to advanced, attend this
year's INFORMS Analytics conference and sign up for the next in-person Gurobi
training event taking place in April in Chicago. We are looking forward to
seeing you at these events!

Sincerely,
The Gurobi Team

Free Python Webinar Series

Python I: Intro to Python Modeling
Feb. 28, 8:00 - 9:00 am (PST)
This is the first webinar in a three-part series on Python, a powerful
programming language that's also great for mathematical modeling. In this
one-hour webinar, you will:

* Get an introduction to Gurobi, Python and Jupyter Notebook
* Learn the basics of model-building, including working with decision
  variables, constraints, sums and for-all loops
* Learn through an interactive development process involving actual models
  as examples

Register or learn more about the Python I webinar >>
Learn more about the Python webinar series >>

INFORMS ANALYTICS CONFERENCE IN LAS VEGAS

2017 INFORMS Business Analytics Conference
Las Vegas, NV, April 2-4
You'll be seeing a lot of Gurobi Optimization at this year's Business
Analytics conference, with our pre-conference workshop, software tutorial,
and exhibit booth. This is a great opportunity to meet and interact with the
Gurobi team, hear a case study on combining predictive and prescriptive
models in Gurobi, and learn about the significant performance enhancements
and new features in our latest V7.0 release.

Visit our INFORMS page to learn more >>

FREE IN-PERSON, HANDS-ON TRAINING EVENT

2 days of training, 1:1 sessions & catered reception
Chicago, IL, April 25 and 26
We are once again offering our popular, free of charge Gurobi training event
for commercial users of optimization software. This hands-on and interactive
training is a great opportunity to learn directly from the experts at Gurobi
and also hear how other users of Gurobi have applied optimization in their
businesses.

Visit our Training Event page to learn more >>

Contact Gurobi
Email: info at gurobi.com
Phone No.: +1 713 871 9341
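For anyone who wants a head start on the webinar's model-building topics,
here is a minimal gurobipy sketch of a two-variable LP (assumes a licensed
Gurobi install; the model name, coefficients, and constraints are made up
for illustration):

    from gurobipy import Model, GRB

    m = Model("toy_lp")
    x = m.addVar(lb=0, name="x")               # decision variables
    y = m.addVar(lb=0, name="y")
    m.setObjective(3 * x + 2 * y, GRB.MAXIMIZE)
    m.addConstr(x + y <= 4, name="capacity")   # constraints
    m.addConstr(x + 3 * y <= 6, name="budget")
    m.optimize()
    if m.status == GRB.OPTIMAL:
        print(x.X, y.X, m.objVal)              # optimal values and objective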
From wes.turner at gmail.com Fri Feb 10 13:46:37 2017
From: wes.turner at gmail.com (Wes Turner)
Date: Fri, 10 Feb 2017 12:46:37 -0600
Subject: [omaha] January 2017
In-Reply-To:
References: <417D903E-B074-4E4D-A42C-99EC9BE75BAA@seenaomi.net>
Message-ID:

Here's that sudoku solver I couldn't remember the link to:
https://github.com/ContinuumIO/pycosat/blob/master/examples/sudoku.py
(https://www.continuum.io/blog/developer/sat-based-sudoku-solver-python)

pycosat is a SAT solver. conda uses pycosat for dependency resolution every
time you 'conda install'.

And the K12 CS link is not k12csframework.org but https://k12cs.org/

On Tue, Oct 11, 2016 at 12:54 PM, Wes Turner wrote:

> You're welcome. Just thought I'd share
>
> On Tuesday, October 11, 2016, Seenaomi wrote:
>
>> Wow!! Thank you!
>>
>> Naomi-
>>
>> Thank you for your time!
>>
>> > On Oct 10, 2016, at 11:46 PM, Wes Turner via Omaha wrote:
>> >
>> > Additional linear algebra resources from/for
>> > https://wrdrd.com/docs/consulting/data-science#linear-algebra :
>> >
>> > Linear Algebra
>> > ~~~~~~~~~~~~~~
>> > https://en.wikipedia.org/wiki/Linear_algebra
>> >
>> > * https://www.khanacademy.org/math/linear-algebra
>> > * http://www.ulaff.net/
>> > * https://github.com/ULAFF/notebooks/
>> >
>> >   - these are Jupyter (numpy) notebooks from the 2014 ULAFF
>> >     schema.org/CourseInstance ; the latest version of the schema:Course is
>> >     now in Matlab (with a PDF textbook)
>> >
>> >   - "Linear Algebra - Foundations to Frontiers"
>> >
>> > - http://docs.scipy.org/doc/numpy/reference/routines.linalg.html
>> >   - https://westurner.org/redditlog/#comment/chxusn9
>> > - http://www.scipy-lectures.org/intro/numpy/operations.html#broadcasting
>> >   (seeAlso Theano Tensor broadcasting)
>> >
>> > - http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html (numpy /
>> >   scipy)
>> >
>> > - http://www.scipy-lectures.org/advanced/scipy_sparse/solvers.html
>> >
>> > - http://docs.sympy.org/dev/modules/matrices/matrices.html (symbols)
>> >
>> > - https://en.wikipedia.org/wiki/SageMath#Software_packages_contained_in_SageMath
>> >   - https://wiki.sagemath.org/quickref
>> >   - http://doc.sagemath.org/
>> >   - http://doc.sagemath.org/html/en/reference/index.html#linear-algebra
>> >   - https://cloud.sagemath.com now includes Jupyter Notebook
>> >
>> > - http://scikit-learn.org/stable/modules/computational_performance.html#linear-algebra-libraries
>> >   - BLAS
>> >   - LAPACK
>> >   - Intel MKL
>> >   - https://www.continuum.io/blog/developer-blog/anaconda-25-release-now-mkl-optimizations
>> >     (2016; Anaconda 2.5)
>> >
>> > - http://scikit-learn.org/stable/developers/utilities.html#efficient-linear-algebra-array-operations
>> >
>> > - https://en.m.wikipedia.org/wiki/Tensor
>> >   - http://deeplearning.net/software/theano/tutorial/adding.html#adding-two-matrices
>> >   - http://deeplearning.net/software/theano/library/tensor/basic.html#libdoc-tensor-broadcastable
>> >   - https://github.com/donnemartin/data-science-ipython-notebooks#theano-tutorials
>> >
>> > - https://github.com/jtoy/awesome-tensorflow
>> >   - https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/skflow
>> >   - https://github.com/donnemartin/data-science-ipython-notebooks#tensor-flow-tutorials
>> >
>> > On Monday, September 26, 2016, Matt Payne via Omaha wrote:
>> >
>> >> Thanks, I'll check it out.
>> >> On Mon, Sep 26, 2016 at 3:54 PM Brusky, Becky <Becky_Brusky at unigroup.com> wrote:
>> >>
>> >>> There is a free academic version that requires it to be downloaded while
>> >>> logged into a university network. There is also an evaluation license
>> >>> that is free to start with.
>> >>>
>> >>> http://www.gurobi.com/downloads/download-center?campaignid=193283256&adgroupid=8992998816&creative=49628571816&keyword=%2Bgurobi%20%2Blicense&matchtype=b&gclid=CParu4_yrc8CFRAzaQode1YP4w
>> >>>
>> >>> Note: If you are on a Mac, use Python 2.7.
>> >>>
>> >>> From: Matt Payne
>> >>> Date: Monday, September 26, 2016 at 3:24 PM
>> >>> To: Omaha Python Users Group
>> >>> Cc: "Brusky, Becky"
>> >>> Subject: Re: [omaha] January 2017
>> >>>
>> >>> Yes, please! I would love to attend a presentation on linear
>> >>> programming! Does Gurobi have a free hobbyist edition? thanks! --Matt
>> >>> Payne
>> >>>
>> >>> On Mon, Sep 26, 2016 at 2:50 PM Brusky, Becky via Omaha <omaha at python.org> wrote:
>> >>>
>> >>>> Steve,
>> >>>>
>> >>>> Bob Haffner mentioned that you are looking for presenters.
>> >>>>
>> >>>> I could do a linear programming with Python and Gurobi presentation. I'm
>> >>>> teaching a Mon/Wed class fall quarter, so was thinking December or
>> >>>> January could work.
>> >>>>
>> >>>> Becky Brusky
>> >>>> Operation Research Developer
>> >>>> Cell: 402.312.8612
>> >>>> Email: becky_brusky at unigroup.com
>> >>>>
>> >>>> _______________________________________________
>> >>>> Omaha Python Users Group mailing list
>> >>>> Omaha at python.org
>> >>>> https://mail.python.org/mailman/listinfo/omaha
>> >>>> http://www.OmahaPython.org
>> _______________________________________________
>> Omaha Python Users Group mailing list
>> Omaha at python.org
>> https://mail.python.org/mailman/listinfo/omaha
>> http://www.OmahaPython.org
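Following up on the sudoku/pycosat links in Wes's message above, a minimal
taste of the pycosat API (assumes a pip-installed pycosat; the sudoku example
builds the same kind of clause list, just much bigger). Clauses are lists of
nonzero ints, where 1 means x1 is true and -1 means x1 is false:

    import pycosat

    # (not x1 or not x2) and (x2 or x3) and (not x1 or not x3)
    cnf = [[-1, -2], [2, 3], [-1, -3]]
    print(pycosat.solve(cnf))           # e.g. [-1, -2, 3]; "UNSAT" if none
    for sol in pycosat.itersolve(cnf):  # enumerate all satisfying assignments
        print(sol)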
From wereapwhatwesow at gmail.com Fri Feb 10 16:12:09 2017
From: wereapwhatwesow at gmail.com (Steve Young)
Date: Fri, 10 Feb 2017 15:12:09 -0600
Subject: [omaha] Beginner question on the web site / meeting next week
Message-ID:

A few weeks ago we added a Q&A section to the website, as a place to have
conversations outside of the mailing list. We have a question there now if
anyone would like to offer some advice:
http://www.omahapython.org/blog/question/extract-5-digit-zipcode-numbers-from-tweets

And next Wednesday is the monthly meeting:
http://www.omahapython.org/blog/archives/event/building-the-bricklayer-ide?instance_id=35

Thanks.

Steve
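One illustrative way into the ZIP-code question linked above (a sketch, not
the answer posted on the site): a word-boundary regex that pulls standalone
5-digit tokens out of tweet text, with the caveat that it will also match
any other 5-digit number:

    import re

    tweet = "Omaha meetup tonight! 68102 area, not 123456 or 6810."
    zips = re.findall(r"\b\d{5}\b", tweet)
    print(zips)  # ['68102']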
From wes.turner at gmail.com Fri Feb 10 11:13:10 2017
From: wes.turner at gmail.com (Wes Turner)
Date: Fri, 10 Feb 2017 10:13:10 -0600
Subject: [omaha] Group Data Science Competition
In-Reply-To:
References: <98FDF8B2-6371-4C4A-BA84-DD18AA7DC3A0@gmail.com>
 <63E88FA3-5AB5-4F8D-A610-2FE27F2AB772@yahoo.com>
 <12F96626-6D10-4B23-87F8-8980C5069E57@gmail.com>
Message-ID:

Anyone have a good way to re.find all the links in this thread?

- [ ] linkgrep(thread) >> wiki
- [ ] http://www.datatau.com is like news.ycombinator.com for Data Science

@bob
Cool.
What is the (MSE, $ deviance) with Keras?

Keras with Theano or TensorFlow?

On Thursday, February 9, 2017, Bob Haffner via Omaha wrote:
(Quoted thread trimmed; see Bob's Feb 9 message at the top of this digest.)
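For reference on the MSE question above: the house-prices leaderboard scores
RMSE on the log of SalePrice, so a local check looks something like this
(y_true and y_pred are illustrative values in dollars):

    import numpy as np
    from sklearn.metrics import mean_squared_error

    y_true = np.array([200000.0, 150000.0, 320000.0])
    y_pred = np.array([210000.0, 140000.0, 300000.0])
    rmse_log = np.sqrt(mean_squared_error(np.log(y_true), np.log(y_pred)))
    print(rmse_log)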
From wes.turner at gmail.com Fri Feb 10 11:35:44 2017
From: wes.turner at gmail.com (Wes Turner)
Date: Fri, 10 Feb 2017 10:35:44 -0600
Subject: [omaha] REQ: Regex URIs from list thread
Message-ID:

Solutions for this opportunity would be helpful:
https://westurner.org/wiki/ideas#open-source-mailing-list-extractor

On Friday, February 10, 2017, Wes Turner wrote:

> Anyone have a good way to re.find all the links in this thread?
> - [ ] linkgrep(thread) >> wiki

A few ways to re.find all the links in this thread:

Email:
- Expand All > Save as .html, BeautifulSoup

Browser:
- Requirify w/ jQuery each page
- WebExtension browser extension
  - JS regex for URIs
  - POST(doc.contents) to API with TOKEN
  - POST(window.URI) to API with TOKEN

        class URIGrepHandler(APIHandler):
            def post(self, req):
                def get_data_and_transform():

- Crawl with Selenium
- Crawl with Splinter
- Crawl with PhantomJS, CasperJS (seeAlso: Splinter)

Retrieve and parse list archive/spool (large, slow):
- BeautifulSoup
- bleach.linkify
  https://github.com/mozilla/bleach/blob/master/bleach/__init__.py#L141 linkify()
- html5lib.HTMLParser
  https://github.com/mozilla/bleach/blob/master/tests/test_links.py linkify() tests

Email:
- Retrieve thread with IMAPS (delegation)

URI / URL regexes:
- https://pypi.python.org/pypi/rfc3986
- https://pypi.python.org/pypi/rfc3987
- what about a http://trailing/paren?)
- linkify? ...

https://westurner.org/wiki/ideas#open-source-mailing-list-extractor

On Thursday, February 9, 2017, Bob Haffner via Omaha wrote:
(Quoted thread trimmed; see Bob's Feb 9 message at the top of this digest.)
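As a starting point for the regex route in the list above, a minimal sketch
(thread.txt is a hypothetical plain-text copy of the thread saved locally;
the rstrip handles the trailing-paren caveat noted above, and rfc3987 offers
a stricter grammar if this pattern is too loose):

    import re

    URL_RE = re.compile(r"https?://[^\s<>\"]+")

    def linkgrep(text):
        """Return the unique, sorted URLs found in a blob of text."""
        links = (m.rstrip(").,>];") for m in URL_RE.findall(text))
        return sorted(set(links))

    with open("thread.txt") as f:
        for url in linkgrep(f.read()):
            print(url)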
From bob.haffner at gmail.com Sun Feb 12 00:16:38 2017
From: bob.haffner at gmail.com (Bob Haffner)
Date: Sat, 11 Feb 2017 23:16:38 -0600
Subject: [omaha] Group Data Science Competition
In-Reply-To:
References: <98FDF8B2-6371-4C4A-BA84-DD18AA7DC3A0@gmail.com>
 <63E88FA3-5AB5-4F8D-A610-2FE27F2AB772@yahoo.com>
 <12F96626-6D10-4B23-87F8-8980C5069E57@gmail.com>
Message-ID:

Wes, I didn't check the MSE. I need to, though, as my submission didn't
score well at all :-)

I used TensorFlow as the backend. Also, I used the KerasRegressor model so
that made things pretty simple

On Fri, Feb 10, 2017 at 10:13 AM, Wes Turner wrote:
(Quoted thread trimmed; see above.)
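A rough shape of the KerasRegressor setup Bob describes, for anyone following
along (a sketch, not his notebook's code: layer sizes and n_features are
placeholders, and the fit kwarg is nb_epoch in Keras 1.x but epochs in Keras 2):

    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasRegressor

    n_features = 288  # whatever X_train.shape[1] is after encoding

    def build_model():
        model = Sequential()
        model.add(Dense(64, input_dim=n_features, activation="relu"))
        model.add(Dense(1))  # single linear output unit for regression
        model.compile(loss="mean_squared_error", optimizer="adam")
        return model

    est = KerasRegressor(build_fn=build_model, nb_epoch=100, batch_size=32,
                         verbose=0)
    # est.fit(X_train, y_train); preds = est.predict(X_test)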
> > On Thursday, February 9, 2017, Bob Haffner via Omaha > wrote: > >> Added a Deep Learning section to my notebook >> https://github.com/bobhaffner/kaggle-houseprices/blob/master >> /kaggle_house_prices.ipynb >> >> Using Keras for the modeling with TensorFlow as the backend. >> >> I've generated a submission, but I don't know how it performed as Kaggle >> seems to be on the fritz tonight. >> >> On Sat, Jan 14, 2017 at 12:52 AM, Wes Turner >> wrote: >> >> > >> > >> > On Friday, January 13, 2017, Bob Haffner via Omaha >> > wrote: >> > >> >> Look at that. Two teams have submitted perfect scores :-) >> >> >> >> https://www.kaggle.com/c/house-prices-advanced-regression- >> >> techniques/leaderboard >> > >> > >> > https://www.kaggle.com/c/house-prices-advanced-regression- >> techniques/rules >> > >> > - Due to the public nature of the data, this competition does not >> > count towards Kaggle ranking points. >> > - We ask that you respect the spirit of the competition and do not >> > cheat. Hand-labeling is forbidden. >> > >> > >> > https://www.kaggle.com/wiki/ModelSubmissionBestPractices >> > >> > https://www.kaggle.com/wiki/WinningModelDocumentationTemplate (CNN, >> > XGBoost) >> > >> > Hopefully I can find some time to fix the data loading function in my >> > data.py and test w/ TPOT (manual sparse arrays), auto_ml, >> > >> > - https://www.coursera.org/learn/ml-foundations/lecture/ >> > 2HrHv/learning-a-simple-regression-model-to-predict- >> > house-prices-from-house-size (UW) >> > >> > - "Python Data Science Handbook" "This repository contains entire Python >> > Data Science Handbook > >, >> >> > in the form of (free!) Jupyter notebooks." >> > https://github.com/jakevdp/PythonDataScienceHandbook/ >> > blob/master/README.md#5-machine-learning (~UW) >> > >> > I'd also like to learn how to NN w/ tensors and Keras (Theano, >> TensorFlow) >> > https://github.com/fchollet/keras >> > >> > - https://keras.io/getting-started/faq/#how-can-i-record- >> > the-training-validation-loss-accuracy-at-each-epoch >> > >> > - http://machinelearningmastery.com/regression-tutorial-keras- >> > deep-learning-library-python/ >> > >> > >> >> On Thu, Jan 5, 2017 at 11:20 AM, Bob Haffner >> >> wrote: >> >> >> >> > Hi Travis, >> >> > >> >> > >> >> > >> >> > A few of us are doing the House Prices: Advanced Regression >> Techniques >> >> > competition >> >> > >> >> > https://www.kaggle.com/c/house-prices-advanced-regression-techniques >> >> > >> >> > >> >> > >> >> > Our team is called Omaha Pythonistas. You are more than welcome to >> join >> >> > us! Just let me know which email you use to sign up with on Kaggle >> and >> >> > I?ll send out an invite. >> >> > >> >> > >> >> > >> >> > We met in December and we hope to meet again soon. Most likely >> >> following >> >> > our monthly meeting on 1/18 >> >> > >> >> > >> >> > >> >> > Some our materials >> >> > >> >> > https://github.com/omahapython/kaggle-houseprices >> >> > >> >> > >> >> > >> >> > https://github.com/jeremy-doyle/home_price_kaggle >> >> > >> >> > >> >> > >> >> > https://github.com/bobhaffner/kaggle-houseprices >> >> > >> >> > On Wed, Jan 4, 2017 at 8:50 AM, Travis Smith via Omaha < >> >> omaha at python.org> >> >> > wrote: >> >> > >> >> >> Hey, new guy here. What's the challenge, exactly? I'm not a Kaggler >> >> yet, >> >> >> but I have taken some data science courses. 
>> >> >> >> >> >> -Travis >> >> >> >> >> >> > On Jan 4, 2017, at 7:57, Luke Schollmeyer via Omaha < >> >> omaha at python.org> >> >> >> wrote: >> >> >> > >> >> >> > I think there's two probable things: >> >> >> > 1. We're likely using some under-powered ML methods. Most of the >> >> Kaggle >> >> >> > interviews of the top guys/teams I read are using some much more >> >> >> advanced >> >> >> > methods to get their solutions into the top spots. I think what >> we're >> >> >> doing >> >> >> > is fine for what we want to accomplish. >> >> >> > 2. Feature engineering. Again, many of the interviews show that a >> >> ton of >> >> >> > work goes in to cleaning and conforming the data. >> >> >> > >> >> >> > I haven't back tracked any of the interviews to their submissions, >> >> so I >> >> >> > don't know how often they tend to submit, like tweak a small >> aspect >> >> and >> >> >> > keep honing that until it pays off. >> >> >> > >> >> >> > On Wed, Jan 4, 2017 at 7:43 AM, Bob Haffner via Omaha < >> >> omaha at python.org >> >> >> > >> >> >> > wrote: >> >> >> > >> >> >> >> Yeah, no kidding. That pdf wasn't hard to find and that #1 >> score is >> >> >> pretty >> >> >> >> damn good >> >> >> >> >> >> >> >> On Tue, Jan 3, 2017 at 10:41 PM, Jeremy Doyle via Omaha < >> >> >> omaha at python.org> >> >> >> >> wrote: >> >> >> >> >> >> >> >>> Looks like we have our key to a score of 0.0. Lol >> >> >> >>> >> >> >> >>> Seriously though, does anyone wonder if the person sitting at #1 >> >> had >> >> >> this >> >> >> >>> full data set as well and trained a model using the entire set? >> I >> >> mean >> >> >> >> that >> >> >> >>> 0.038 score is so much better than anyone else it seems a little >> >> >> >>> unrealistic...or maybe it's just seems that way because I >> haven't >> >> been >> >> >> >> able >> >> >> >>> to break through 0.12 : ) >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> Sent from my iPhone >> >> >> >>>>> On Jan 3, 2017, at 7:51 PM, Bob Haffner via Omaha < >> >> omaha at python.org >> >> >> > >> >> >> >>>> wrote: >> >> >> >>>> >> >> >> >>>> Pretty interesting notebook I put together regarding the kaggle >> >> comp >> >> >> >>>> https://github.com/bobhaffner/kaggle-houseprices/blob/ >> >> >> >>> master/additional_training_data.ipynb >> >> >> >>>> >> >> >> >>>> On Mon, Jan 2, 2017 at 12:10 AM, Wes Turner via Omaha < >> >> >> >> omaha at python.org> >> >> >> >>>> wrote: >> >> >> >>>> >> >> >> >>>>>> On Wednesday, December 28, 2016, Wes Turner < >> >> wes.turner at gmail.com> >> >> >> >>> wrote: >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> On Wed, Dec 28, 2016 at 12:56 AM, Jeremy Doyle via Omaha < >> >> >> >>>>> omaha at python.org >> >> >> >>>>>> > wrote: >> >> >> >>>>>> >> >> >> >>>>>>> Woohoo! We jumped 286 positions with a meager 0.00448 >> >> improvement >> >> >> in >> >> >> >>> our >> >> >> >>>>>>> score! Currently sitting at 798th place. >> >> >> >>>>>> >> >> >> >>>>>> Nice work! Features of your feature engineering I admire: >> >> >> >>>>>> >> >> >> >>>>>> - nominal, ordinal, continuous, discrete >> >> >> >>>>>> categorical = nominal + discrete >> >> >> >>>>>> numeric = continuous + discrete >> >> >> >>>>>> >> >> >> >>>>>> - outlier removal >> >> >> >>>>>> - [ ] w/ constant thresholding? 
(is there a distribution >> >> parameter) >> >> >> >>>>>> >> >> >> >>>>>> - building datestrings from SaleMonth and YrSold >> >> >> >>>>>> - SaleMonth / "1" / YrSold >> >> >> >>>>>> - df..drop(['MoSold','YrSold','SaleMonth']) >> >> >> >>>>>> - [ ] why drop SaleMonth? >> >> >> >>>>>> - [ ] pandas.to_datetime[df['SaleMonth']) >> >> >> >>>>>> >> >> >> >>>>>> - merging with FHA Home Price Index for the month and region >> >> ("West >> >> >> >>> North >> >> >> >>>>>> Central") >> >> >> >>>>>> https://www.fhfa.gov/DataTools/Downloads/Documents/ >> >> >> >>>>>> HPI/HPI_PO_monthly_hist.xls >> >> >> >>>>>> - [ ] pandas.to_datetime >> >> >> >>>>>> - this should have every month, but the new merge_asof >> >> feature is >> >> >> >>>>>> worth mentioning >> >> >> >>>>>> >> >> >> >>>>>> - manual binarization >> >> >> >>>>>> - [ ] how did you pick these? correlation after >> pd.get_dummies? >> >> >> >>>>>> - [ ] why floats? 1.0 / 1 (does it make a difference?) >> >> >> >>>>>> >> >> >> >>>>>> - Ames, IA nbrhood_multiplier >> >> >> >>>>>> - http://www.cityofames.org/home/showdocument?id=1024 >> >> >> >>>>>> >> >> >> >>>>>> - feature merging >> >> >> >>>>>> - BsmtFinSF = BsmtFinSF1 + BsmtFinSF2 >> >> >> >>>>>> - TotalBaths = BsmtFullBath + (BsmtHalfBath / 2.0) + >> FullBath + >> >> >> >>>>>> (HalfBath / 2.0) >> >> >> >>>>>> - ( ) IDK how a feature-selection pipeline could do this >> >> >> >> automatically >> >> >> >>>>>> >> >> >> >>>>>> - null value imputation >> >> >> >>>>>> - .isnull() = 0 >> >> >> >>>>>> - ( ) datacleaner incorrectly sets these to median or mode >> >> >> >>>>>> >> >> >> >>>>>> - log for skewed continuous and SalePrice >> >> >> >>>>>> - ( ) auto_ml: take_log_of_y does this for SalePrice >> >> >> >>>>>> >> >> >> >>>>>> - "Keeping only the columns we want" >> >> >> >>>>>> - [ ] 'Id' shouldn't be relevant (pd.read_csv(filename, >> >> >> >>> index_col='Id') >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> - Binarization >> >> >> >>>>>> - pd.get_dummies(dummy_na=False) >> >> >> >>>>>> - [ ] (a Luke pointed out, concatenation keeps the same >> columns) >> >> >> >>>>>> rows = eng_train.shape[0] >> >> >> >>>>>> eng_merged = pd.concat(eng_train, eng_test) >> >> >> >>>>>> onehot_merged = pd.get_dummies(eng_merged, >> >> columns=nominal, >> >> >> >>>>>> dummy_na=False) >> >> >> >>>>>> onehot_train = eng_merged[:rows] >> >> >> >>>>>> onehot_test = eng_merged[rows:] >> >> >> >>>>>> >> >> >> >>>>>> - class RandomSelectionHelper >> >> >> >>>>>> - [ ] this could be generally helpful in sklean[-pandas] >> >> >> >>>>>> - https://github.com/paulgb/skle >> arn-pandas#cross-validation >> >> >> >>>>>> >> >> >> >>>>>> - Models to Search >> >> >> >>>>>> - {Ridge, Lasso, ElasticNet} >> >> >> >>>>>> >> >> >> >>>>>> - https://github.com/ClimbsRocks/auto_ml/blob/ >> >> >> >>>>>> master/auto_ml/predictor.py#L222 >> >> >> >>>>>> _get_estimator_names ( "regressor" ) >> >> >> >>>>>> - {XGBRegessor, GradientBoostingRegressor, >> RANSACRegressor, >> >> >> >>>>>> RandomForestRegressor, LinearRegression, AdaBoostRegressor, >> >> >> >>>>>> ExtraTreesRegressor} >> >> >> >>>>>> >> >> >> >>>>>> - https://github.com/ClimbsRocks/auto_ml/blob/ >> >> >> >>>>>> master/auto_ml/predictor.py#L491 >> >> >> >>>>>> - (w/ ensembling) >> >> >> >>>>>> - ['RandomForestRegressor', 'LinearRegression', >> >> >> >>>>>> 'ExtraTreesRegressor', 'Ridge', 'GradientBoostingRegressor', >> >> >> >>>>>> 'AdaBoostRegressor', 'Lasso', 'ElasticNet', 'LassoLars', >> >> >> >>>>>> 'OrthogonalMatchingPursuit', 'BayesianRidge', >> 'SGDRegressor'] + >> >> [' >> >> 
>> >>>>>> XGBRegressor'] >> >> >> >>>>>> >> >> >> >>>>>> - model stacking / ensembling >> >> >> >>>>>> >> >> >> >>>>>> - ( ) auto_ml: https://auto-ml.readthedocs. >> >> >> >>>>> io/en/latest/ensembling.html >> >> >> >>>>>> - ( ) auto-sklearn: >> >> >> >>>>>> https://automl.github.io/auto-sklearn/stable/api.html# >> >> >> >>>>>> autosklearn.regression.AutoSklearnRegressor >> >> >> >>>>>> ensemble_size=50, ensemble_nbest=50 >> >> >> >>>>> >> >> >> >>>>> https://en.wikipedia.org/wiki/Ensemble_learning >> >> >> >>>>> >> >> >> >>>>> http://www.scholarpedia.org/article/Ensemble_learning# >> >> >> >>>>> Ensemble_combination_rules >> >> >> >>>>> >> >> >> >>>>> >> >> >> >>>>>> >> >> >> >>>>>> - submission['SalePrice'] = submission.SalePrice.apply(lam >> bda >> >> x: >> >> >> >>>>>> np.exp(x)) >> >> >> >>>>>> >> >> >> >>>>>> - [ ] What is this called / how does this work? >> >> >> >>>>>> - https://docs.scipy.org/doc/numpy/reference/generated/ >> >> >> >>>>> numpy.exp.html >> >> >> >>>>>> >> >> >> >>>>>> - df.to_csv(filename, columns=['SalePrice'], >> index_label='Id') >> >> also >> >> >> >>> works >> >> >> >>>>>> - http://pandas.pydata.org/pandas-docs/stable/generated/ >> >> >> >>>>>> pandas.DataFrame.to_csv.html >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>>> My notebook is on GitHub for those interested: >> >> >> >>>>>>> >> >> >> >>>>>>> https://github.com/jeremy-doyle/home_price_kaggle/tree/ >> >> >> >>> master/attempt_4 >> >> >> >>>>>> >> >> >> >>>>>> >> >> >> >>>>>> Thanks! >> >> >> >>>>> >> >> >> >>>>> (Trimmed for 40K limit) >> >> >> >>>>> _______________________________________________ >> >> >> >>>>> Omaha Python Users Group mailing list >> >> >> >>>>> Omaha at python.org >> >> >> >>>>> https://mail.python.org/mailman/listinfo/omaha >> >> >> >>>>> http://www.OmahaPython.org >> >> >> >>>> _______________________________________________ >> >> >> >>>> Omaha Python Users Group mailing list >> >> >> >>>> Omaha at python.org >> >> >> >>>> https://mail.python.org/mailman/listinfo/omaha >> >> >> >>>> http://www.OmahaPython.org >> >> >> >>> >> >> >> >>> _______________________________________________ >> >> >> >>> Omaha Python Users Group mailing list >> >> >> >>> Omaha at python.org >> >> >> >>> https://mail.python.org/mailman/listinfo/omaha >> >> >> >>> http://www.OmahaPython.org >> >> >> >> _______________________________________________ >> >> >> >> Omaha Python Users Group mailing list >> >> >> >> Omaha at python.org >> >> >> >> https://mail.python.org/mailman/listinfo/omaha >> >> >> >> http://www.OmahaPython.org >> >> >> > _______________________________________________ >> >> >> > Omaha Python Users Group mailing list >> >> >> > Omaha at python.org >> >> >> > https://mail.python.org/mailman/listinfo/omaha >> >> >> > http://www.OmahaPython.org >> >> >> _______________________________________________ >> >> >> Omaha Python Users Group mailing list >> >> >> Omaha at python.org >> >> >> https://mail.python.org/mailman/listinfo/omaha >> >> >> http://www.OmahaPython.org >> >> >> >> >> > >> >> > >> >> _______________________________________________ >> >> Omaha Python Users Group mailing list >> >> Omaha at python.org >> >> https://mail.python.org/mailman/listinfo/omaha >> >> http://www.OmahaPython.org >> > >> > >> _______________________________________________ >> Omaha Python Users Group mailing list >> Omaha at python.org >> https://mail.python.org/mailman/listinfo/omaha >> http://www.OmahaPython.org >> > From wes.turner at gmail.com Sun Feb 12 19:07:18 2017 From: wes.turner at 
gmail.com (Wes Turner)
Date: Sun, 12 Feb 2017 18:07:18 -0600
Subject: [omaha] Group Data Science Competition
In-Reply-To: 
References: <98FDF8B2-6371-4C4A-BA84-DD18AA7DC3A0@gmail.com>
	<63E88FA3-5AB5-4F8D-A610-2FE27F2AB772@yahoo.com>
	<12F96626-6D10-4B23-87F8-8980C5069E57@gmail.com>
Message-ID: 

On Saturday, February 11, 2017, Bob Haffner via Omaha wrote:

> Wes, I didn't check the MSE.  I need to though as my submission didn't
> score well at all :-)
>
> I used TensorFlow as the backend.  Also, I used the KerasRegressor model
> so that made things pretty simple.

There are so many neural network / deep learning topologies:
http://www.asimovinstitute.org/neural-network-zoo/

"A mostly complete chart of neural networks"
http://www.asimovinstitute.org/wp-content/uploads/2016/09/neuralnetworks.png

...

How does changing the nb_epoch parameter affect the .score()?
http://stackoverflow.com/questions/36936209/how-to-make-keras-neural-net-outperforming-logistic-regression-on-iris-data

How does adding one or more layers affect the objective statistic?

Does creating a datetime64 (or UNIX time) from the year and month improve
the output? Is this learnable as separate fields?

http://blog.fastforwardlabs.com/2016/02/24/hello-world-in-keras-or-scikit-learn-versus.html

How does the extra data prep and merged home price index data by Jeremy
Doyle affect this NN model?
https://en.wikipedia.org/wiki/Data_wrangling#See_also

@jeremydoyle TODO
Could you wrap up your feature logic as an sklearn .transform()-able class?

It's possible to import *named functions* and/or data from .ipynb
notebooks with pypi:ipynb.

If we can agree on a functional interface (sklearn
transform[/fit/predict/]), we could mix and match sklearn pipeline
functions in our omahapython/kaggle-houseprices repo!:
https://github.com/omahapython/kaggle-houseprices/tree/master/src

For lack of a better namespace scheme for module import,
I added my repo as a git submodule prefixed with my username:
src/westurner_house_prices -> gh:westurner/kaggle-houseprices ~master

(
Btw,
- I just summarized the answers to "Using IPython notebooks under version
control"
http://stackoverflow.com/questions/18734739/using-ipython-notebooks-under-version-control/42128373#42128373
- Nbdime contains {nbdiff, nbmerge}
- auto-save .ipynb as .py pre/post_save()
- `nbconvert --to python` on_commit()
- auto-create a .clean.ipynb (or .strippedoutput.ipynb) with e.g. nbstripout
...
- I like runipy because it re-numbers all the cells and prints to
stdout/stderr
)

In order to crossover, mutate, and improve our score, I think we should
compose a few scikit-learn pipelines which combine our group efforts?

- [ ] git submodules in src/
- [ ] importable *named functions* w/o side effects
- [ ] DOC: :returns: pd.DataFrame, np., sklearn API
- [ ] a standard local model scoring function (because what does the
kaggle score even mean) (sketch below)
  - real dollars deviance: sum(abs(residuals))
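For example, a local scoring helper might look like this (a minimal
sketch; it assumes y_true/y_pred are raw sale prices in dollars, and
that the leaderboard metric is RMSE of the logged prices, so both are
shown):

    import numpy as np
    from sklearn.metrics import make_scorer

    def dollar_deviance(y_true, y_pred):
        # total absolute error, in real dollars
        return np.sum(np.abs(y_true - y_pred))

    def rmsle(y_true, y_pred):
        # root mean squared error of log(price); run on a held-out
        # split, this approximates the public leaderboard score
        return np.sqrt(np.mean((np.log(y_true) - np.log(y_pred)) ** 2))

    rmsle_scorer = make_scorer(rmsle, greater_is_better=False)

Passing rmsle_scorer as scoring= to cross_val_score() would then give
every team member the same local yardstick.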
sklearn API
http://scikit-learn.org/stable/modules/pipeline.html :

"""
Pipeline can be used to chain multiple estimators into one. This is
useful as there is often a fixed sequence of steps in processing the
data, for example feature selection, normalization and classification.
Pipeline serves two purposes here:

*Convenience*: You only have to call fit and predict once on your data
to fit a whole sequence of estimators.

*Joint parameter selection*: You can grid search over parameters of all
estimators in the pipeline at once.

All estimators in a pipeline, except the last one, must be transformers
(i.e. must have a transform method). The last estimator may be any type
(transformer, classifier, etc.).
"""

https://keras.io/scikit-learn-api/
- (*grid search for hyper parameters)
- KerasRegressor works as a sklearn pipeline step

> On Fri, Feb 10, 2017 at 10:13 AM, Wes Turner wrote:
>
> > Anyone have a good way to re.find all the links in this thread?
>
> (... Trimmed )

From wereapwhatwesow at gmail.com  Mon Feb 13 15:00:54 2017
From: wereapwhatwesow at gmail.com (Steve Young)
Date: Mon, 13 Feb 2017 14:00:54 -0600
Subject: [omaha] Winter/Spring 2017 meeting topics and speakers
In-Reply-To: 
References: 
Message-ID: 

Thanks Bob.  Let's tentatively plan for that, and we'll finalize at the
meeting in a few days.

On Sun, Jan 29, 2017 at 7:14 PM, Bob Haffner wrote:

> Steve, I'm still game to do the Microservices with Flask talk.  I can take
> the March slot if no one else does.
>
> All, please consider giving a talk or perhaps leading a discussion.
> Doesn't have to be lengthy or elaborate.  We're an easygoing bunch :-)
>
> Bob
>
> On Mon, Jan 23, 2017 at 5:07 PM, Steve Young via Omaha wrote:
>
>> We had a good meeting last week - thanks Becky and the others who
>> attended.  I am amazed at how many Python libraries are available that
>> simplify complex programming tasks.
>>
>> February is scheduled - Hubert Hickman, Victor Winter, Betty Love,
>> presenting on Building the Bricklayer IDE,
>> at the DoSpace, Meeting Room 1.
>>
>> I would love to start getting some topics and presenters scheduled for
>> the next few months.
>>
>> Now is your chance to have your time in the spotlight or request a topic
>> for someone else to present.  Bob H has offered to present on Flask, but
>> we have not picked a date for it yet.
>>
>> March
>> April
>> May
>> June
>>
>> Just reply to the thread with your ideas.  Thanks.
>>
>> Steve

From wes.turner at gmail.com  Tue Feb 14 00:10:44 2017
From: wes.turner at gmail.com (Wes Turner)
Date: Mon, 13 Feb 2017 23:10:44 -0600
Subject: [omaha] Group Data Science Competition
In-Reply-To: 
References: <98FDF8B2-6371-4C4A-BA84-DD18AA7DC3A0@gmail.com>
	<63E88FA3-5AB5-4F8D-A610-2FE27F2AB772@yahoo.com>
	<12F96626-6D10-4B23-87F8-8980C5069E57@gmail.com>
Message-ID: 

On Sun, Feb 12, 2017 at 6:07 PM, Wes Turner wrote:

> On Saturday, February 11, 2017, Bob Haffner via Omaha wrote:
>
>> Wes, I didn't check the MSE.  I need to though as my submission didn't
>> score well at all :-)
>>
>> I used TensorFlow as the backend.
>> Also, I used the KerasRegressor model
>> so that made things pretty simple.
>
> (... Trimmed )
>
> @jeremydoyle
> Could you wrap up your feature logic as an sklearn .transform()-able class?

http://scikit-learn.org/stable/data_transforms.html

https://stackoverflow.com/questions/25539311/custom-transformer-for-sklearn-pipeline-that-alters-both-x-and-y

- http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html
- https://github.com/scikit-learn/scikit-learn/issues/3855
  "Resampler estimators that change the sample size in fitting"
- According to this, in order to integrate features of our team's various
  models with scikit-learn, we should have a separate preprocessing
  pipeline (with fit, transform, fit_transform), followed by the fit() and
  predict() pipeline if we modify y.  To avoid dropping NaN/NULLs, we
  instead impute using various strategies (see the sketch after this
  quote):
- https://github.com/rhiever/datacleaner/issues/1#issuecomment-279607720
  re: imputation w/ scikit-learn

* https://en.wikipedia.org/wiki/Imputation_(statistics) :

> In statistics, imputation is the process of replacing missing data with
> substituted values. When substituting for a data point, it is known as
> "unit imputation"; when substituting for a component of a data point, it
> is known as "item imputation". **There are three main problems that
> missing data causes: *missing data can introduce a substantial amount of
> bias*, make the handling and analysis of the data more arduous, and
> create reductions in efficiency.**[1] Because missing data can create
> problems for analyzing data, imputation is seen as a way to avoid
> pitfalls involved with listwise deletion of cases that have missing
> values. That is to say, when one or more values are missing for a case,
> most statistical packages default to discarding any case that has a
> missing value, which may introduce bias or affect the representativeness
> of the results. Imputation preserves all cases by replacing missing data
> with an estimated value based on other available information. Once all
> missing values have been imputed, the data set can then be analysed
> using standard techniques for complete data.[2] Imputation theory is
> constantly developing and thus requires consistent attention to new
> information regarding the subject. There have been many theories
> embraced by scientists to account for missing data but the majority of
> them introduce large amounts of bias. A few of the well known attempts
> to deal with missing data include: **hot deck and cold deck imputation;
> listwise and pairwise deletion; mean imputation; regression imputation;
> last observation carried forward; stochastic imputation; and multiple
> imputation.**

[emphasis added]
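A quick sketch of the simplest of these strategies - median imputation -
with the sklearn 0.18-era Imputer that the links below describe (the
column names here are just illustrative):

    import pandas as pd
    from sklearn.preprocessing import Imputer

    df = pd.DataFrame({'LotFrontage': [65.0, None, 80.0],
                       'GrLivArea': [1710.0, 1262.0, None]})

    # Imputer fills NaNs column-wise; strategy is one of
    # 'mean', 'median', 'most_frequent'
    imputer = Imputer(strategy='median')
    filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

Note that you'd .fit() the Imputer on the training set and reuse the
same fitted statistics to .transform() the test set, so the two stay
consistent.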
* http://scikit-learn.org/stable/modules/preprocessing.html#imputation-of-missing-values
* http://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing
* http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Imputer.html
  * class sklearn.preprocessing.Imputer(..., strategy=('mean', 'median', 'most_frequent'), ...)
* http://scikit-learn.org/stable/auto_examples/missing_values.html :

> **Imputing missing values before building an estimator**
> This example shows that imputing the missing values can give better
> results than discarding the samples containing any missing value.
> Imputing does not always improve the predictions, so please check via
> cross-validation. Sometimes dropping rows or using marker values is more
> effective. Missing values can be replaced by the mean, the median or the
> most frequent value using the strategy hyper-parameter. The median is a
> more robust estimator for data with high magnitude variables which could
> dominate results (otherwise known as a 'long tail').

> It's possible to import *named functions* and/or data from .ipynb
> notebooks with pypi:ipynb.

ipynb - Package / Module importer for importing code from Jupyter
Notebook files (.ipynb)
| Src: https://github.com/ipython/ipynb
| Docs: https://ipynb.readthedocs.io/en/latest/
| PyPI: https://pypi.python.org/pypi/ipynb

> If we can agree on a functional interface (sklearn
> transform[/fit/predict/]), we could mix and match sklearn pipeline
> functions in our omahapython/kaggle-houseprices repo!:
> https://github.com/omahapython/kaggle-houseprices/tree/master/src
>
> For lack of a better namespace scheme for module import,
> I added my repo as a git submodule prefixed with my username:
> src/westurner_house_prices -> gh:westurner/kaggle-houseprices ~master
>
> (
> Btw,
> - I just summarized the answers to "Using IPython notebooks under version
> control"
> http://stackoverflow.com/questions/18734739/using-ipython-notebooks-under-version-control/42128373#42128373
> - Nbdime contains {nbdiff, nbmerge}
> - auto-save .ipynb as .py pre/post_save()
> - `nbconvert --to python` on_commit()
> - auto-create a .clean.ipynb (or .strippedoutput.ipynb) with e.g. nbstripout
> ...
> - I like runipy because it re-numbers all the cells and prints to
> stdout/stderr
> )
>
> In order to crossover, mutate, and improve our score, I think we should
> compose a few scikit-learn pipelines which combine our group efforts?
>
> - [ ] git submodules in src/

https://github.com/blog/2104-working-with-submodules

> - [ ] importable *named functions* w/o side effects

see FunctionTransformer (link above)

> - [ ] DOC: :returns: pd.DataFrame, np., sklearn API
> - [ ] a standard local model scoring function (because what does the
> kaggle score even mean)
>   - real dollars deviance: sum(abs(residuals))

- MSE: mean(residuals**2)

> sklearn API
> http://scikit-learn.org/stable/modules/pipeline.html :
>
> """
> Pipeline can be used to chain multiple estimators into one. This is
> useful as there is often a fixed sequence of steps in processing the
> data, for example feature selection, normalization and classification.
> Pipeline serves two purposes here:
>
> *Convenience*: You only have to call fit and predict once on your data
> to fit a whole sequence of estimators.
>
> *Joint parameter selection*: You can grid search over parameters of all
> estimators in the pipeline at once.
>
> All estimators in a pipeline, except the last one, must be transformers
> (i.e. must have a transform method). The last estimator may be any type
> (transformer, classifier, etc.).
> """

sklearn.base.TransformerMixin
http://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html#sklearn.base.TransformerMixin

sklearn.preprocessing.FunctionTransformer
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html

> https://keras.io/scikit-learn-api/
> - (*grid search for hyper parameters)
> - KerasRegressor works as a sklearn pipeline step
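For example, a minimal sketch of a Pipeline ending in a KerasRegressor
(this assumes the Keras 1.x API of the day, e.g. nb_epoch; the layer
sizes and n_features here are placeholders, not anything from our repos):

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import Imputer, FunctionTransformer
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasRegressor

    n_features = 100  # set to X.shape[1] after preprocessing

    def build_model():
        # one hidden layer; the topology itself is a hyperparameter
        model = Sequential()
        model.add(Dense(32, input_dim=n_features, activation='relu'))
        model.add(Dense(1))
        model.compile(loss='mean_squared_error', optimizer='adam')
        return model

    pipeline = Pipeline([
        ('impute', Imputer(strategy='median')),
        ('log1p', FunctionTransformer(np.log1p)),  # tame skewed features
        ('nn', KerasRegressor(build_fn=build_model, nb_epoch=100,
                              batch_size=32, verbose=0)),
    ])
    # pipeline.fit(X_train, y_train); pipeline.predict(X_test)

Because the whole thing is a single estimator, it drops into
cross_val_score() or GridSearchCV as-is.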
>
>> On Fri, Feb 10, 2017 at 10:13 AM, Wes Turner wrote:
>>
>> > Anyone have a good way to re.find all the links in this thread?
>
> (... Trimmed )

From wereapwhatwesow at gmail.com  Wed Feb 15 09:41:05 2017
From: wereapwhatwesow at gmail.com (Steve Young)
Date: Wed, 15 Feb 2017 08:41:05 -0600
Subject: [omaha] Tonight: February Meeting – Building the Bricklayer IDE
Message-ID: 

Hope you can attend.

http://www.omahapython.org/blog/archives/event/building-the-bricklayer-ide?instance_id=35

Steve

From hubert.hickman at gmail.com  Wed Feb 15 16:30:06 2017
From: hubert.hickman at gmail.com (Hubert Hickman)
Date: Wed, 15 Feb 2017 15:30:06 -0600
Subject: [omaha] Tonight: February Meeting – Building the Bricklayer IDE
In-Reply-To: 
References: 
Message-ID: 

I'll be there ;-)  However, I may be just a few minutes late - but Victor
will do some introductions to Bricklayer.

Hubert

On Wed, Feb 15, 2017 at 8:41 AM, Steve Young via Omaha wrote:

> Hope you can attend.
>
> http://www.omahapython.org/blog/archives/event/building-the-bricklayer-ide?instance_id=35
>
> Steve

From bob.haffner at gmail.com  Wed Feb 15 18:52:02 2017
From: bob.haffner at gmail.com (Bob Haffner)
Date: Wed, 15 Feb 2017 17:52:02 -0600
Subject: [omaha] Tonight: February Meeting – Building the Bricklayer IDE
In-Reply-To: 
References: 
Message-ID: 

I'll be there!  Which room, Steve?

On Wed, Feb 15, 2017 at 8:41 AM, Steve Young via Omaha wrote:

> Hope you can attend.
>
> http://www.omahapython.org/blog/archives/event/building-the-bricklayer-ide?instance_id=35
>
> Steve

From wereapwhatwesow at gmail.com  Wed Feb 15 19:12:41 2017
From: wereapwhatwesow at gmail.com (Steve Young)
Date: Wed, 15 Feb 2017 18:12:41 -0600
Subject: [omaha] Tonight: February Meeting – Building the Bricklayer IDE
In-Reply-To: 
References: 
Message-ID: 

Meeting Room 1

On Wed, Feb 15, 2017 at 5:52 PM, Bob Haffner via Omaha wrote:

> I'll be there!  Which room, Steve?
>
> (... Trimmed )

From bob.haffner at gmail.com  Thu Feb 16 09:40:17 2017
From: bob.haffner at gmail.com (Bob Haffner)
Date: Thu, 16 Feb 2017 08:40:17 -0600
Subject: [omaha] Tonight: February Meeting – Building the Bricklayer IDE
In-Reply-To: 
References: 
Message-ID: 

Thanks to Victor and Hubert for presenting on Bricklayer (bricklayer.org)
last night, I wish I didn't need to leave so early.

On Wed, Feb 15, 2017 at 6:12 PM, Steve Young via Omaha wrote:

> Meeting Room 1
>
> (... Trimmed )

From bob.haffner at gmail.com  Thu Feb 16 09:54:29 2017
From: bob.haffner at gmail.com (Bob Haffner)
Date: Thu, 16 Feb 2017 08:54:29 -0600
Subject: [omaha] Winter/Spring 2017 meeting topics and speakers
In-Reply-To: 
References: 
Message-ID: 

Would any of the Kagglers be interested in presenting for the March or
April meeting?  I'm thinking multiple presenters each doing 10 to 20
minutes of their work/findings.

On Mon, Feb 13, 2017 at 2:00 PM, Steve Young wrote:

> Thanks Bob.  Let's tentatively plan for that, and we'll finalize at the
> meeting in a few days.
>
> On Sun, Jan 29, 2017 at 7:14 PM, Bob Haffner wrote:
>
>> Steve, I'm still game to do the Microservices with Flask talk.  I can
>> take the March slot if no one else does.
>>
>> All, please consider giving a talk or perhaps leading a discussion.
>> Doesn't have to be lengthy or elaborate.  We're an easygoing bunch :-)
>>
>> Bob
>>
>> On Mon, Jan 23, 2017 at 5:07 PM, Steve Young via Omaha wrote:
>>
>> (... Trimmed )

From wereapwhatwesow at gmail.com  Mon Feb 20 15:56:02 2017
From: wereapwhatwesow at gmail.com (Steve Young)
Date: Mon, 20 Feb 2017 14:56:02 -0600
Subject: [omaha] Fwd: OPUG New Question Notification
In-Reply-To: <7b347ff3f380dcac18f096f863af4307@omahapython.org>
References: <7b347ff3f380dcac18f096f863af4307@omahapython.org>
Message-ID: 

We have had another question on the website if anyone wants to try an
answer.

We can also use this to post questions/answers/topic details on the
website to keep them more easily available than the email list, for
historical/posterity's sake.

---------- Forwarded message ----------
From: WordPress
Date: Sun, Feb 19, 2017 at 4:52 PM
Subject: OPUG New Question Notification
To: wereapwhatwesow at gmail.com, dundeemt+omaha at gmail.com

A new question was posted on Omaha Python Users Group.

Nico has asked a question, "for loops - beginner question":

Greetings! I have recently started Python as my first computer language
and there is one part of the for loops I am having trouble with. The
objective of the code below is to turn a list [1,1,2,3,3,3,4,4,5] into a
list with only the unique elements of the first list, [1,2,3,4,5],
without the use of sets.

code:

    def unique_list(g):
        x = []
        for a in g:
            if a not in x:
                x.append(a)
        return x

I understand what each individual part of the code does, but the part
"for a in g" is what confuses me. I know "g" is the list in reference,
but what is "a"? If it is the elements in "g", how can Python tell the
difference of each element if I had numbers and strings and symbols?

Also, the below section is confusing in the sense of what the role of
"a" is and how does it know which to append? I get that if an "a"
similar to another "a" is in the list it will not duplicate it, but how
does it know which "a" to append?

code:

    if a not in x:
        x.append(a)

My apologies for the long post and I hope what I asked is
understandable. Any help would be greatly appreciated!

All the best,
Nico

View Question »
Omaha Python Users Group - Python Users in the Omaha Metro Area

From choman at gmail.com  Mon Feb 20 16:23:09 2017
From: choman at gmail.com (Chad Homan)
Date: Mon, 20 Feb 2017 15:23:09 -0600
Subject: [omaha] Fwd: OPUG New Question Notification
In-Reply-To: 
References: <7b347ff3f380dcac18f096f863af4307@omahapython.org>
Message-ID: 

So "a" is essentially a temporary variable that is updated while walking
the list:

    for a in g:
        print a

would output 1 1 2 3 3 etc.

So keeping that in mind, the if conditional checks to see whether the
value currently assigned to "a" exists in the "x" list, and as written
appends it to "x" if it is not already there.

Hope that helps

Together We Win!
Looking for cloud storage, try pCloud (10g free)
--
Chad

Some people, when confronted with a problem, think "I know, I'll use
Windows."  Now they have two problems.

Some people claim if you play a Windows Install Disc backwards you'll
hear satanic Messages.  That's nothing, if you play it forward it
installs Windows

On Mon, Feb 20, 2017 at 2:56 PM, Steve Young via Omaha wrote:

> We have had another question on the website if anyone wants to try an
> answer.
>
> (... Trimmed )

From wes.turner at gmail.com  Mon Feb 20 17:25:52 2017
From: wes.turner at gmail.com (Wes Turner)
Date: Mon, 20 Feb 2017 16:25:52 -0600
Subject: [omaha] Fwd: OPUG New Question Notification
In-Reply-To: 
References: <7b347ff3f380dcac18f096f863af4307@omahapython.org>
Message-ID: 

- collections.OrderedDict.fromkeys(g).keys()
- dict.fromkeys(g).keys()

https://wiki.python.org/moin/TimeComplexity

- O(1) is much preferable to O(n) for 'x in list' membership tests
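For example (a quick sketch of an order-preserving dedupe that avoids
both sets and the O(n) list membership scan):

    from collections import OrderedDict

    g = [1, 1, 2, 3, 3, 3, 4, 4, 5]

    # dict keys are unique, and OrderedDict keeps first-seen order;
    # key lookups are O(1) on average, vs O(n) for 'a not in x'
    # when x is a list
    unique = list(OrderedDict.fromkeys(g))
    print(unique)  # [1, 2, 3, 4, 5]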
On Monday, February 20, 2017, Steve Young via Omaha wrote:

> We have had another question on the website if anyone wants to try an
> answer.
>
> (... Trimmed )

From wereapwhatwesow at gmail.com  Mon Feb 27 17:05:42 2017
From: wereapwhatwesow at gmail.com (Steve Young)
Date: Mon, 27 Feb 2017 16:05:42 -0600
Subject: [omaha] Fwd: OPUG New Question Notification
In-Reply-To: <502429fc1b5ada4ce0ab52d555afaf12@omahapython.org>
References: <502429fc1b5ada4ce0ab52d555afaf12@omahapython.org>
Message-ID: 

I did not realize an online Q&A would be so popular.  Another question
has been posted if you would like to view or respond.  Thanks.

Steve

---------- Forwarded message ----------

A new question was posted on Omaha Python Users Group, "programming a
new 2-dimensional board game":

Hi, I am trying to turn a board game I invented into a 2D computer game.
Is Python the right language for this?

View Question »

Omaha Python Users Group - Python Users in the Omaha Metro Area

From wereapwhatwesow at gmail.com  Mon Feb 27 17:54:28 2017
From: wereapwhatwesow at gmail.com (Steve Young)
Date: Mon, 27 Feb 2017 16:54:28 -0600
Subject: [omaha] March Meeting Update
Message-ID: 

Bob Haffner is speaking on Microservices and Flask.

We should have some time for the Kaggle group to give an update - last I
saw we were just out of the top 1000.

Wednesday, March 15
Do-Space meeting room 1.
6:30-8pm (the room was booked until 6:30, so we won't be able to access
it until close to that time)

http://www.omahapython.org/blog/archives/event/march-meeting-micro-services-with-flask?instance_id=37

Steve

From hubert.hickman at gmail.com  Tue Feb 28 10:26:40 2017
From: hubert.hickman at gmail.com (Hubert Hickman)
Date: Tue, 28 Feb 2017 09:26:40 -0600
Subject: [omaha] March Meeting Update
In-Reply-To: 
References: 
Message-ID: 

I've put the meeting on the TechOmaha calendar.

Hubert

On Mon, Feb 27, 2017 at 4:54 PM, Steve Young via Omaha wrote:

> Bob Haffner is speaking on Microservices and Flask.
>
> (... Trimmed )

From wes.turner at gmail.com  Tue Feb 28 13:52:16 2017
From: wes.turner at gmail.com (Wes Turner)
Date: Tue, 28 Feb 2017 12:52:16 -0600
Subject: [omaha] March Meeting Update
In-Reply-To: 
References: 
Message-ID: 

flask python web framework resources:

- https://westurner.org/wiki/awesome-python-testing#flask :

Flask
| Wikipedia: https://en.wikipedia.org/wiki/Flask_(web_framework)
| Homepage: https://flask.pocoo.org/
| Src: https://github.com/mitsuhiko/flask
| PyPI: https://pypi.python.org/pypi/Flask
| Docs: https://flask.readthedocs.org/en/latest/config/
| Docs: https://flask.pocoo.org/docs/latest/
| Docs: https://flask.pocoo.org/docs/latest/testing/
| Awesome: https://github.com/humiaozuzu/awesome-flask

- https://flask-debugtoolbar.readthedocs.org/en/latest/
- http://flask.pocoo.org/docs/latest/changelog/
- https://github.com/humiaozuzu/awesome-flask :

> App template/bootstrap/boilerplate
>
> - fbone
> - flask-base
> - cookiecutter-flask
> - cookiecutter-flask-pythonic
> - Flask-Foundation
> - Flask-Empty
> - flask-rest-template
> - gae-init - Flask boilerplate running on Google App Engine
> - GAE Starter Kit - Flask, Flask-Login, WTForms, UIKit, and more,
>   running on Google App Engine

[...]

User models and auth (OAuth, Service XYZ) views:

https://github.com/sahat/satellizer
https://github.com/sahat/satellizer/blob/master/examples/server/python/app.py
https://github.com/PhilipGarnero/django-rest-framework-social-oauth2

Sample projects:

- https://github.com/IntuitiveWebSolutions/EngineeringMidLevel
- https://github.com/westurner/flasktestapp
- https://github.com/westurner/flasktestapp/commits/develop
- https://github.com/sloria/cookiecutter-flask :

> Features
>
> - Bootstrap 3 and Font Awesome 4 with starter templates
> - Flask-SQLAlchemy with basic User model
> - Easy database migrations with Flask-Migrate
> - Flask-WTForms with login and registration forms
> - Flask-Login for authentication
> - Flask-Bcrypt for password hashing
> - Procfile for deploying to a PaaS (e.g. Heroku)
> - pytest and Factory-Boy for testing (example tests included)
> - Flask's Click CLI configured with simple commands
> - CSS and JS minification using Flask-Assets
> - Optional bower support for frontend package management
> - Caching using Flask-Cache
> - Useful debug toolbar
> - Utilizes best practices: Blueprints and Application Factory patterns

https://www.google.com/search?q=flask+docker+compose+site:github.com

Sandman
------------------------
| Src: https://github.com/jeffknupp/sandman
| Docs: https://pythonhosted.org/sandman/using_sandman.html
| Docs: https://pythonhosted.org/sandman/authentication.html

- (Flask, SQLAlchemy) (obsolete)

Sandman2
------------------------
| Src: https://github.com/jeffknupp/sandman2
| Docs: https://sandman2.readthedocs.io/en/latest/
| Docs: https://sandman2.readthedocs.io/en/latest/interacting.html#searching-filtering-and-sorting

- (Flask, SQLAlchemy)
- https://github.com/jeffknupp/sandman2/blob/master/tests/test_sandman2.py
  - pytest_flask

pytest_flask
------------------------
| Src: https://github.com/pytest-dev/pytest-flask
| Docs: https://pytest-flask.readthedocs.io/en/latest/
| Docs: https://pytest-flask.readthedocs.io/en/latest/features.html
| PyPI: https://pypi.python.org/pypi/pytest-flask

- pytest links: https://westurner.org/wiki/awesome-python-testing#py-test

Flask-Admin
------------------------
| Src: https://github.com/flask-admin/flask-admin
| Docs: https://flask-admin.readthedocs.io/en/latest/

- Bootstrap, select2
- http://examples.flask-admin.org/
- https://github.com/flask-admin/flask-admin/tree/master/examples

Flask Blueprints
------------------------
| Docs: http://flask.pocoo.org/docs/latest/blueprints/

- "Modular Applications with Blueprints"

...

Two birds with one stone: these links will also be helpful for this
research: https://github.com/westurner/wiki/wiki/webframeworks
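And since the talk is on microservices, a minimal single-file Flask
service for reference (just a sketch; the route, port, and payload are
arbitrary):

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route('/api/houses/<int:house_id>')
    def get_house(house_id):
        # stub endpoint; swap in a real data store
        return jsonify({'id': house_id, 'status': 'ok'})

    if __name__ == '__main__':
        app.run(port=5000, debug=True)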
On Mon, Feb 27, 2017 at 4:54 PM, Steve Young via Omaha wrote:

> Bob Haffner is speaking on Microservices and Flask.
>
> (... Trimmed )
>
> _______________________________________________
> Omaha Python Users Group mailing list
> Omaha at python.org
> https://mail.python.org/mailman/listinfo/omaha
> http://www.OmahaPython.org