[Tutor] 0 > "0" --> is there a "from __future__ import" to make this raise a TypeError?
Martin A. Brown
martin at linux-ip.net
Tue Oct 13 13:18:08 EDT 2015
Greetings Albert-Jan,
I have a suggestion and a comment in this matter.
> Yes, a justified fear of making mistakes. That is, a mistake has
> already occurred and I don't want it to happen again.
Suggestion: Choose a single in-memory representation of your data
and make all of your input and output functions perform conversion
to appropriate data types.
See below for a bit more explanation.
> I made a comparison with data from two sources: a csv file
> (reference file) and an sqlite database (test data). The csv module
> will always return str, unless one converts it. The test data were
> written to sqlite with pandas.to_sql, which (it seems) tries to be
> helpful by making INTs of everything that looks like ints. I chose
> sqlite because the real data will be in SQL server, and I hope
> this would mimic the behavior wrt None, NULL, nan, "", etc.
Comment and explanation: I have been following this thread and I
will tell you how I would look at this problem (instead of trying to
compare different data types).
* It sounds as though you will have several different types of
backing stores for your data. You mentioned 1) csv, 2) sqlite,
3) SQL server. Each of these is a different serialization tool.
* You also mention comparisons. It seems as though you are
comparing data acquired (read into memory) from backing store 1)
to data retrieved from 2).
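The mismatch between the two stores can be reproduced in a few lines.
This is only a sketch, assuming Python 3; the column name "n", the
table name "t" and the sample value are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: one column "n" holding the value 0.
csv_text = "n\n0\n"
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# The same value stored in an in-memory sqlite table with an INTEGER column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.execute("INSERT INTO t VALUES (0)")
db_rows = conn.execute("SELECT n FROM t").fetchall()
conn.close()

csv_value = csv_rows[0]["n"]   # the csv module hands back a str
db_value = db_rows[0][0]       # sqlite hands back an int

print(type(csv_value).__name__, type(db_value).__name__)
print(csv_value == db_value)   # the str "0" and the int 0 do not compare equal
```

Comparing the raw values directly is where the trouble starts, which is
why I would convert first and compare afterwards.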
If you are reading data into memory, then you are probably planning
to compute, process, transmit or display the data. In each case,
I'd imagine you are operating on the data (numerically, if they are
numbers).
I would write a function (or class or module) that can read the data
from any of the backing stores you want to use (csv, sqlite, SQL
server, punch cards or even pigeon feathers). Each piece of code
that reads data from a particular serialization (e.g. sqlite) would
be responsible for converting to the in-memory form.
Thus, it would not matter where you store the data...once it's in
memory, the form or representation you have chosen will be
identical.
There is the benefit, then, of your code being agnostic (or
extensible) to the serialization tool.
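Here is a rough sketch of what I mean, assuming Python 3. The names
to_record, read_csv and read_sqlite, the columns "id" and "score", and
the table "results" are all hypothetical; substitute your own schema
and your own choice of in-memory form.

```python
import csv
import io
import sqlite3


def to_record(raw):
    """Convert one raw row (a mapping) to the chosen in-memory form."""
    # Assumed in-memory form: {"id": int, "score": float or None}.
    score = raw["score"]
    return {
        "id": int(raw["id"]),
        "score": None if score in (None, "") else float(score),
    }


def read_csv(fileobj):
    """Read records from an open CSV file; csv yields str for every field."""
    return [to_record(row) for row in csv.DictReader(fileobj)]


def read_sqlite(conn, table):
    """Read records from a sqlite table; values may be int, float or None."""
    # The table name is interpolated for brevity; only pass trusted names.
    query = "SELECT id, score FROM {} ORDER BY id".format(table)
    return [to_record({"id": id_, "score": score})
            for id_, score in conn.execute(query)]


# Usage with made-up data held in both stores.
csv_file = io.StringIO("id,score\n1,2.5\n2,\n")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER, score REAL)")
conn.executemany("INSERT INTO results VALUES (?, ?)", [(1, 2.5), (2, None)])

from_csv = read_csv(csv_file)
from_db = read_sqlite(conn, "results")
conn.close()

print(from_csv == from_db)  # True: both readers produce the same form
```

All of the type conversion lives in one place (to_record here), so a
reader for SQL server would only need to produce the same raw mapping
and call the same function.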
By the way, did you know that pandas.DataFrame.to_csv() [0] also exists?
-Martin
[0] http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
--
Martin A. Brown
http://linux-ip.net/