Cleaning up conditionals

Deborah Swanson python at deborahswanson.net
Mon Jan 2 13:40:19 EST 2017


Dennis Lee Bieber wrote, on January 02, 2017 8:30 AM:
> 
> On Sun, 1 Jan 2017 18:30:03 -0800, "Deborah Swanson" 
> <python at deborahswanson.net> declaimed the following:
> 
> >Dennis Lee Bieber wrote, on Sunday, January 01, 2017 6:07 PM
> >> 
> >> 
> >> l1	2 br, Elk Plains	12-26		WA/pi
> >> l2	2 br, Elk Plains	12-29				
> >> 	house garage, w/d
> >> 
> 
> >Yes, that's exactly what I'm doing in this bit of code, 
> making all the 
> >listings that are identical except for the date be identical 
> except for 
> >the date. You may or may not have tried this approach to finding the
> 
> 	Out of curiosity -- don't those listings have street 
> addresses? There must be lots of "2 br" units in the city.
> 
> >ideal house, but this is my third house hunt, using essentially the 
> >same approach, but in a different language each time. 
> (Python is by far 
> >the best version. A dictionary with city names as keys, and several 
> >data items as values has literally cut the task in half.)
> >
> 
> 	Part of the hassle in your schema is that you are 
> treating the type of unit (2 br) AND the city (as shown in 
> your sample; I'd hope a full address is available) as a 
> single data field. Similarly, you have some obscure code 
> attached to the state.
> 
> 
> 	Maybe I've spent too much time with having to optimize 
> data structures over the last 30 years, but for my view I'd 
> end up with a relational database (most Python distributions 
> include SQLite3, so you don't have to manage a full 
> client-server system [reminds me, I still need to recreate 
> MySQL on my system, after a drive failure and OS upgrade]). 
> Something like:
> 
> create table Properties
> (
> 	ID integer auto-increment primary key,
> 	State char,
> 	City char not null,
> 	Address char not null,
> 	Type char,
> 	Code char,
> 	unique (State, City, Address)
> )
> 
> create table Listings
> (	ID integer auto-increment primary key,
> 	Property integer references Properties(ID),
> 	Date datetime,
> 	Notes varchar,
> 	unique (Property, Date)
> )
> 
> 	Since I don't know what your "code" (the "pi" above) 
> represents, I don't know if it should be in Properties or 
> Listings, nor if it is a constraint for uniqueness. Type is 
> "2 br". I put both in the Properties table as you had them 
> attached to the city and state sample data. I'd have 
> preferred to put a "not null" on State, but since that is one 
> of the fields your example allows to be blank (I presume on 
> the basis that a local town paper may assume the listing is 
> for said state)...
> 
> 	The "unique" constraints mean that any combination of 
> "State, City, Address" only appears once, same for "Property, 
> Date" combination.
> 
> 	This would be the master data, which you'd only perform 
> updates upon.
> 
> 	For an update you would formulate a SELECT using the 
> fields you have for State, City, Address (so, if no State, 
> you'd select on just City,
> Address) if multiple records are returned (only possible if 
> you are missing one of the three fields) you'd have to log 
> them as ambiguous with the new listing and needing to be hand 
> adjusted -- eg; you need to add the missing field by hand); 
> If a single record is returned you can perform an UPDATE for 
> Type/Code. If no record is returned you insert a new record 
> with the fields you have.
> 	THEN, using the unique ID of the Properties record, you 
> INSERT a new listing record with the date and notes -- if the 
> insert is rejected (duplicate Property/Date) it indicates you 
> may have already processed that listing before. You can't 
> check for that until you've determined the ID of the Property itself.
> 
> 	Determining if a property has multiple listings becomes a SELECT
> 
> select p.State, p.City, p.Address, p.Type, p.Code, 
> count(l.ID) from Properties as p inner join Listings as l on 
> p.ID = l.Property group by p.ID order by p.State, p.City, p.Address
> 
> 
> 	Yes, you might have to run periodic checks for missing 
> data fields when the field is allowed to be NULL and is part 
> of the uniqueness constraint. This would occur, for the 
> provided schema, if you had inserted a property with a null 
> State, and then later had a listing with the state provided 
> -- as the update/insert logic would come back that there is 
> no entry for (State, City, Address) -- it would not see the 
> (noState, City, Address) entry. If the State field were set 
> to "not null" you'd get an error at that time that could be 
> logged, allowing you to clean up the new data and reprocess 
> it -- instead of running a series of SELECT statements with 
> each one leaving out one of the "null allowed" fields
> 
> select p.ID, p.State, p.City, p.Address
> from Properties as p
> order by p.City, p.Address
> 
> to get all records with the same city/address (with some 
> work, and using subselects, the report could be reduced by 
> first selecting only the records where State is NULL, then 
> using just those city/addresses for the report selection. 
> It's been a while, but something like (consider all of this
> pseudocode)
> 
> select p.ID, p.State, p.City, p.Address 
> from Properties as p 
> inner join (select distinct n.City, n.Address 
> 			from Properties as n
> 			where n.State is null) as n2
> where p.City = n2.City and p.Address = n2.Address
> order by p.City, p.Address, p.State
> 
> would return all city/address combination where at least one 
> record has an empty state.
> 	
> -- 
> 	Wulfraed                 Dennis Lee Bieber         AF6VN
>     wlfraed at ix.netcom.com    HTTP://wlfraed.home.netcom.com/

I was actually planning to shift from Excel for permanent data storage
to SQLite, but as a future development and not something I'm going to do
right now. It took me 14 months to find the house I'm living in now, and
in the current real estate market it may take that long or longer to
find another one as nice this time. But if I get it all down to 15 min
of work each night that will be fine.  Haha, as long as I can afford my
new and higher rent!

Also, I just started this project and it's been kind of wild and woolly
in the early stages, which is why I need to be concerned about so many
oddball cases that won't occur when I have the operation running
smoothly. I'll be continuing to rewrite and fine tune my code for some
time, and I'd planned to do the migration to SQLite, and learn how to
use SQLite and how to use it with python after the core code is stable.
I know a smattering of SQL, but I'm not proficient with it, so I'm just
taking it all one step at a time.

But I'm keeping this message from you in my project file, so when the
time comes I'm sure I'll refer to all the helpful information you've
given in it. And I'll also be concerned about the data integrity issues
you mention, so that I can get it down to a clean, stable operation
that's reliable and takes very little time to maintain. All down the
road though, not quite yet.

As to your specific data curiosities, the '/co' part of 'st/co' is the
county, so I have state/county pairs in that field. In this case, 'pi'
is the 2 letter code for Pierce County. I may break state and county out
into 2 separate fields at some point, but it would only be advisable if
I'm actually using the county for anything other than just seeing where
it is when I look at the complete location data. State and county do go
together, and it made building the locations dictionary by hand easier
to bundle them together, but at some point I'll have all the location
data and won't be working on it anymore, and that might be the time to
separate them.  

I'm using Craiglist, and oddly enough they don't make any provision for
the address. So if one is available at all it's tucked away in 3 or 4
different places. I will be adding address and phone number fields after
I get the bulk of it under control, but I will only be doing the
detective work on places I might actually go to look at. Also, I'll
probably add local realtor's listings to my Craigslist base, but that
too is a future development. And I've kept the photos in past projects,
and would like to do that this time too, but that will probably be one
of the last things I do. About the time that I start getting serious
about going out and looking at these places.

Thanks for all your suggestions, especially about using SQLite, which I
will be doing relatively soon. I'm pretty sure there's a way to store
the photos in SQLite.



More information about the Python-list mailing list