[Tutor] 0 > "0" --> is there a "from __future__ import" to make this raise a TypeError?

Martin A. Brown martin at linux-ip.net
Tue Oct 13 13:18:08 EDT 2015


Greetings Albert-Jan,

I have a suggestion and a comment on this matter.

> Yes, a justified fear of making mistakes. That is, a mistake has 
> already occurred and I don't want it to happen again.

Suggestion:  Choose a single in-memory representation of your data 
and make all of your input and output functions perform conversion 
to appropriate data types.

See below for a bit more explanation.

> I made a comparison with data from two sources: a csv file 
> (reference file) and an sqlite database (test data). The csv module 
> will always return str, unless one converts it. The test data were 
> written to sqlite with pandas.to_sql, which (it seems) tries to be 
> helpful by making INTs of everything that looks like ints. I chose 
> sqlite because the real data will be in SQL server, and I hope 
> this would mimic the behavior wrt None, NULL, nan, "", etc.
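
To make the mismatch concrete, here is a minimal sketch using only the 
standard library.  The table name "t" and the sample values are made up 
for illustration; they are not taken from your data.

```python
import csv
import io
import sqlite3

# csv: every field comes back as str, whatever it looks like.
row_from_csv = next(csv.reader(io.StringIO("0,abc\n")))

# sqlite: a value stored in an INTEGER column comes back as int.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("INSERT INTO t VALUES (0, 'abc')")
row_from_db = conn.execute("SELECT a, b FROM t").fetchone()

# Same logical value, two different Python types.
print(type(row_from_csv[0]).__name__, type(row_from_db[0]).__name__)
print(row_from_csv[0] == row_from_db[0])
```

The two rows hold the same logical data, yet the first fields do not 
compare equal because one is the str "0" and the other is the int 0.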

Comment and explanation:  I have been following this thread and I 
will tell you how I would look at this problem (instead of trying to 
compare different data types).

   * It sounds as though you will have several different types of
     backing stores for your data.  You mentioned 1) csv, 2) sqlite,
     3) SQL server.  Each of these is a different serialization tool.

   * You also mention comparisons.  It seems as though you are
     comparing data acquired (read into memory) from backing store 1)
     to data retrieved from 2).

If you are reading data into memory, then you are probably planning 
to compute, process, transmit or display the data.  In each case, 
I'd imagine you are operating on the data (numerically, if they are 
numbers).

I would write a function (or class or module) that can read the data 
from any of the backing stores you want to use (csv, sqlite, SQL 
server, punch cards or even pigeon feathers).  Each piece of code 
that reads data from a particular serialization (e.g. sqlite) would 
be responsible for converting to the in-memory form.

Thus, it would not matter where you store the data...once it's in 
memory, the form or representation you have chosen will be 
identical.
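
Here is a rough sketch of what I mean, not a finished design.  The 
column names ("id", "name"), the table name ("people"), the function 
names and the rule that empty strings become None are all assumptions I 
invented for the example; pick whatever fits your real data.

```python
import csv
import io
import sqlite3


def to_record(raw_id, raw_name):
    """Convert raw values from any backing store to the in-memory form."""
    # Assumed convention for this sketch: ids are int, names are str,
    # and a missing value (None or "") is always represented as None.
    rec_id = None if raw_id in (None, "") else int(raw_id)
    name = None if raw_name in (None, "") else str(raw_name)
    return (rec_id, name)


def read_csv_records(fileobj):
    """Read records from CSV; the csv module hands back only str."""
    reader = csv.DictReader(fileobj)
    return [to_record(row["id"], row["name"]) for row in reader]


def read_sqlite_records(conn):
    """Read records from sqlite; values may already be int or None."""
    cursor = conn.execute("SELECT id, name FROM people ORDER BY id")
    return [to_record(row[0], row[1]) for row in cursor]


# Build two tiny sources that hold the same logical data.
csv_source = io.StringIO("id,name\n1,alice\n2,\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [(1, "alice"), (2, None)])

from_csv = read_csv_records(csv_source)
from_db = read_sqlite_records(conn)

# Both readers produce the same in-memory form, so comparing is safe.
print(from_csv == from_db)
```

All of the type conversion lives in one place (to_record), so adding a 
reader for SQL server later means writing one more small function that 
calls it, and the comparison code never has to change.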

There is the benefit, then, of your code being agnostic (or 
extensible) to the serialization tool.

By the way, did you know that pandas.DataFrame.to_csv() [0] also exists?

-Martin

  [0] http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html

-- 
Martin A. Brown
http://linux-ip.net/

