From editor at the-tech-news.com  Sat May 10 16:14:17 2003
From: editor at the-tech-news.com (The TechNews)
Date: Sat, 10 May 2003 16:14:17 +0200
Subject: [Csv] Worldwide Partners program
Message-ID: <MDAEMON-F200305101616.AA165716md50036788351@wtnserver>

The TechNews, May 2003

Production Mini-plants in mobile containers. Worldwide Partners program

Science Network will supply countries and developing regions with the technology and support necessary for series production of Mini-plants in mobile containers (40-foot). The Mini-plant system is designed in such a way that all the production machinery is fixed on the platform of the container, with all wiring, piping, and installation parts; that is, it is fully equipped, and the mini-plant is ready for production.

More than 700 portable production systems: Bakeries, Steel Nails, Welding Electrodes, Tire Retreading, Reinforcement Bar Bending for Construction Framework, Sheeting for Roofing, Ceilings and Façades, Plated Drums, Aluminum Buckets, Injected Polypropylene Housewares, Pressed Melamine Items (Glasses, Cups, Plates, Mugs, etc.), Mufflers, Construction Electrically Welded Mesh, Plastic Bags and Packaging, Mobile units of medical assistance, Sanitary Material, Hypodermic Syringes, Hemostatic Clamps, etc.
Science Network has started a Co-investment program for the installation of small Assembly plants to manufacture the portable-production Mini-plants in series in the site, region, or country where they are required. One of the most relevant features is that these plants will be connected to the World Trade System (WTS), with access to more than 50 million raw materials, products and services and automatic transactions for world trade.

For financial reasons, involving cost and social impact, the best solution is to set up assembly plants in the countries and regions themselves, using local resources (labor, some equipment, etc.). Science Network participates at 50% (fifty percent) of the investment in each Assembly plant.

If you are interested in being a Science Network partner in your country or region, you can send your CV to:

Mini-plants Worldwide Partners program: letters at the-tech-news.com

By Robert B. Lethe, The TechNews, Editor

-------------------------------------------------------------------------
If you received this in error or would like to be removed from our list, please reply with "remove" or "unsubscribe" in the subject field. Thanks. Editor: editor at the-tech-news.com
© 2003 The TechNews. All rights reserved.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/csv/attachments/20030510/546c6e23/attachment.htm 

From gnkomo1 at yahoo.com  Tue May 13 04:19:35 2003
From: gnkomo1 at yahoo.com (Mr. Godstime Nkomo)
Date: Tue, 13 May 2003 04:19:35 +0200
Subject: [Csv] THE PRESIDENT { URGENT }
Message-ID: <200305130219.h4D2JXD12991@manatee.mojam.com>


Joshua Nkomo Avenue, Bulawayo, Zimbabwe.
Director: Godstime Nkomo
Holland Contact: +31 627 195 903
Private Email: gnkomo1 at zwallet.com
Urgent!
Dear Sir,

Do not be surprised by this letter, as I got your contact through my mother. My name is Godstime Nkomo, the last-born of the late Joshua Nkomo, the former Vice President of Zimbabwe and former President of the Zimbabwe African People's Union (ZAPU). I was studying at a boarding high school when I had to return abruptly for my father's burial. After the burial, my father's lawyer notified the family about his will in their chambers. While going through the will, I discovered that my father had used his position as the former Vice President to acquire and deposit US$ 23 M (twenty-three million United States dollars) in my name with a special security company in Amsterdam, Holland.
I immediately had to travel to Europe to ascertain the authenticity of the deposit, which I have verified, and to seek asylum as a result of President Mugabe's interest in my father's assets. In Europe a financial expert advised me on the best way to safeguard and usefully realize these funds without any problems from the Dutch authorities. This was to involve a foreigner, who would come here to Holland and open a non-resident bank account where the money would be deposited for onward transfer to any nominated account overseas, as my refugee status limits my opportunities here.
This is why I am making this contact with you now. You are to come to Holland and assist me in getting the money out to your beautiful country, where my family and I can make further investments and where I can live a better life and have a better education, as this is my lifelong dream. Here in Holland, the labor laws and my life in political exile from President Mugabe and the Zimbabwean authorities have eliminated any chance of my freely owning an account.
Please contact me at the telephone number above, indicating your interest and capability, for more details. You are entitled to 20% of the total amount for assisting me; I have mapped out 3% for immediate reimbursement of expenses upon your arrival here, while the rest will be for me and my family members, which I would like to invest in your country under your close supervision and direction, for which you will be entitled to 10% of the after-tax returns on investment of my share. Note: the content of the consignment is US$ 23 million in cash, but the security company does not know the actual content of the consignment, because it was deposited by my father with a declaration that the content is precious metals and diamonds valued at US$ 23 M; this was done with diplomatic immunity and for security reasons, at a time when my father was still in government in my country, Zimbabwe.
This transaction is basically risk-free for you; therefore, reach me, preferably at the above-stated phone number, to ensure the security of this transaction. Note that it is because of the confidential nature of this transaction that I am giving you this contact information, which you can use to reach me as soon as possible.
Awaiting your immediate, urgent contact!

Yours Faithfully,
GODSTIME NKOMO
Director.





From WorkFromHome7689 at excite.com  Wed May 14 23:23:01 2003
From: WorkFromHome7689 at excite.com (Acardong)
Date: Wed, 14 May 2003 21:23:01 GMT
Subject: [Csv] ***WORK FROM HOME...MAKE BIG $$$
Message-ID: <200305142123.h4ELNpD00311@manatee.mojam.com>


  GET STARTED WORKING FROM HOME TODAY!

This message contains valuable information about our
organization and qualified specialists who have
extensive knowledge and experience in WORKING
FROM HOME.

We have spent the last decade researching home employment 
options available to the public. After spending thousands of hours 
in research, we can confidently promise you that NO ONE has 
better information on this subject.  

---WORK IN THE COMFORT OF YOUR OWN HOME--- 
               
    ***WIDE SELECTION OF JOBS...TOP PAY***

       --REAL JOBS WITH REAL COMPANIES--

Plus receive your very own "Computer Cash Disk" FREE!

Every day thousands of people just like you are getting
started working at home in fields of computer work, sewing, 
assembling products, crafts, typing, transcribing, mystery shopping,
getting paid for their opinion, telephone work and much more!

          WHO ARE HOME WORKERS?

They are regular, ordinary people who earn an excellent
living working at their own pace and make their own hours.
They are fortunate people who have found an easier way
to make a living. They had absolutely no prior experience
in this field. They earn a good weekly income in the comfort 
of their own home and you can be next!

Companies all over the United States want to hire you as an
independent home-worker. You are a valuable person to
these companies because you will actually be saving them
a great amount of money.

These companies want to expand their business, but do not
want to hire more office people. If they hired more office
employees, they would have to supervise them, rent more 
office space, pay more taxes and insurance, all involving
more paperwork. It is much easier for them to set it up so
you can earn an excellent income working in the comfort of
your own home.

    -------------------LIVE ANYWHERE--------------------

You can live anywhere and work for most of these companies.
The companies themselves can be located anywhere.
For computer work, the companies provide you with
assignments, usually data entry or similar tasks. You
then complete the project and get paid for each task.
You receive step by step instructions to make it easier 
for you and to ensure you successfully complete the job.

After you're finished, you ship the completed assignments
back to the company at no charge to yourself. Upon receiving
your assignments, the company will then mail you a check
along with more assignments. It's that easy!

All the other home-based work (sewing, merchandising, surveys,
product assembly, typing, telephone work, transcribing, and mystery shopping)
are  done in a similar way. After contacting the companies you will
be given step by step instructions and information on what you
need to do. Upon completion of the task, they mail you a check.

You have the potential to work for nearly every company in our 
guide. The only jobs that require equipment are computer work 
(computer needed) and typing (typewriter or computer). All other 
work requires no equipment of your own.

          $$$EARN EXTRA INCOME AT HOME$$$

All business can be done by mail, phone, or online. You can START
THE SAME DAY you receive "The Guide to Genuine Home
Employment."

     *****ONLY REAL COMPANIES OFFERING REAL JOBS!*****

The companies in our guide are legitimate and really need
home workers. There are over two hundred top companies
included in our guide, offering an opportunity for you to make
extra income at home. Unlike other insulting booklets or lists you
may see, our guide only includes up-to-date information on
companies that pay top dollar for your services and will hire you,
WITHOUT CHARGING YOU FEES TO WORK FOR THEM, 
GUARANTEED!

**UPDATE.....Now our guide explains and goes into detail
about each company and what they have to offer you!
You are guaranteed to find home based work in our guide.
No problem!

**UPDATE.....Our new edition offers an entirely new category 
of work. It reveals a new, unique way to get paid for your opinion
online. Just surf to the proper website and get paid to fill out opinion
surveys! What could be easier!

We urge you to consider this extraordinary opportunity. Don't delay
or you could miss out! This is like no other offer you've ever seen.

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

This is an opportunity to become an independent
HOME WORKER. Remember, this is NOT a get-rich-
quick-scheme. It is an easy way for you to earn money
while filling the needs of a company who needs
you. This makes it easy to work at your OWN
PACE and in the comfort of YOUR HOME.

***HERE'S HOW TO GET STARTED IMMEDIATELY***

Print the form below, fill in your information and mail it to us, 
along with the small one time fee for the guide. We will ship 
the "Guide To Genuine Home Employment" out the same 
day we receive your reply form!

Order within 15 days and the complete, updated, sure-fire,
Genuine Home Employment Guide is yours for the special low
price of only $39.95! That's over 37% off our normal price of
$69.95.

**Don't delay one more minute, START NOW!!!**

**FREE BONUS....."COMPUTER CASH DISK" (MAC & IBM 
compatible) 167 business reports. Tips, tricks and secrets on starting 
and operating a successful home based business and how to avoid
dishonest marketing offers. Comes with full reproduction rights!
READ THEM, SELL THEM AND BANK THE MONEY. Never pay
us any royalties. Sells for $69.00, but it's worth a whole lot more than
that. Get yours today...FREE!

***EXTRA FREE BONUS...105 Home Businesses you can start 
immediately! This manual will show you over 100 home based 
businesses you can start right away. The information in this manual
will show you from start to finish how to run a homebased business 
of your choice.

>>>FULL 60 DAY RISK FREE MONEY-BACK GUARANTEE!

Test our material out for a 60 day free trial period and if it isn't 
everything we said it is, just send it back and we will gladly refund 
your money. We've helped thousands of people like yourself get 
started working at home over the last eight years. You could be next!

THINK WHAT AN EXTRA INCOME COULD DO FOR YOU!

LET US HEAR FROM YOU TODAY!

THIS COULD EASILY CHANGE YOUR LIFE FOREVER!

DON'T LET THIS EXTRAORDINARY OPPORTUNITY
PASS!! 

THESE OPPORTUNITIES ARE PROFITABLE
AND EASY.  

ACT NOW!!!

HERE'S HOW TO GET STARTED......

Send Check or Money Order for only $39.95 and the 
completed order form below to us at:

Cybernet HWA
PO Box 914
North Branford, CT 06471

(Your order will be shipped the same day it is received)
-----------------------------------------------------------------------------

EZ ORDER FORM

_____  Yes! I am interested in a REAL home job. I
am ordering within 15 days. Here is my $39.95. Please
rush me my package today including "The Guide to
Genuine Home Employment",  your "Free Computer
Cash Disk" & your manual "105 Home Businesses you 
can start immediately". 

(Please PRINT all information CLEARLY)

NAME_________________________________________

ADDRESS _____________________________________

CITY _________________________________________

STATE ____________________  ZIP _______________

EMAIL ______________________ at ________________

PHONE  (             )   _____________________________


From skip at pobox.com  Thu May 15 22:15:19 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 15 May 2003 15:15:19 -0500
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
In-Reply-To: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
Message-ID: <16067.62807.842732.949436@montanaro.dyndns.org>


I'm replying on c.l.py, but note that for future reference this thread
belongs on csv at mail.mojam.com (on the cc: list).

    Bernard> The CSV module only allows a single character as delimiter,
    Bernard> which does not (easily) allow one to write generic code that
    Bernard> would not be at the mercy of whatever the current locale is of
    Bernard> the user who sends you a csv file. Fortunately the Sniffer
    Bernard> class is provided for guessing the most likely delimiter, and
    Bernard> seems to work fine, from my limited tests.

I'll leave Dave and Andrew to comment on the possibility of admitting a
multiple-character delimiter string, as that will affect their C code.

    Bernard> There's an error in the documentation of Sniffer().sniff(x),
    Bernard> though: its x argument is documented as a file object, whereas
    Bernard> the code actually expects a sample buffer. 

Thanks, I'll fix the docs.  They didn't quite catch up to the last-minute
changes I made to the code.

    Bernard> I feel though, that this unfortunately forces one to write more
    Bernard> code than is really needed, typically in the following form:

    Bernard>     sample = file( 'data.csv' ).read( 8192 )
    Bernard>     dialect = csv.Sniffer().sniff( sample )
    Bernard>     infile = file( 'data.csv' )
    Bernard>     for fields in csv.reader( infile, dialect ):
    Bernard>         # do something with fields

    Bernard> That's a tad ugly, having to open the same file twice in
    Bernard> particular.

I recognize the issue you raise.  As originally written, the Sniffer class
also took a file-like object; however, it relied on being able to rewind the
stream.  This would, for example, prevent you from feeding sys.stdin to the
sniffer.  I also felt the decision of rewinding the stream belonged with the
caller, so I decided to change it to accept a small data sample instead.
You can avoid multiple opens by rewinding the stream yourself (in the common
case where the stream can be rewound):

    infile = file('data.csv')
    sample = infile.read(8192)
    infile.seek(0)
    dialect = csv.Sniffer().sniff( sample )
    for fields in csv.reader( infile, dialect ):
        # do something with fields

Note that after the sniffer does its thing you should check that it returned
reasonable values.
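
For instance (just a sketch -- the plausibility test and the fallback
dialect below are illustrative, not something the module does for you):

    dialect = csv.Sniffer().sniff(sample)
    if dialect is None or dialect.delimiter not in ',;\t':
        # sniffing failed or guessed something implausible; fall back
        dialect = 'excel'
    for fields in csv.reader(infile, dialect):
        pass    # do something with each row here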

    Bernard> (2)
    Bernard>     for fields in csv.reader( infile, dialect='sniff' ):
    Bernard>         # do something with fields

Do you mean to imply that the csv.reader object should call the sniffer
implicitly and use the values it returns?  That's an interesting idea but
the sniffer isn't guaranteed to always guess right.

Skip

From bdelmee at advalvas.be  Thu May 15 19:34:30 2003
From: bdelmee at advalvas.be (Bernard Delmée)
Date: Thu, 15 May 2003 19:34:30 +0200
Subject: [Csv] [PEP305] Python 2.3: a small change request in CSV module
Message-ID: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>

(Passing along from comp.lang.python... -skip)

I may be a bit late to the ball with the beta already out, but I'd like
to request a little change/addition to the otherwise very neat new CSV
module. The field separator m$excel uses depends on the user locale
(windows control panel, regional settings, list separator). I for one
very often see either a comma (the default for the csv module) or a
semi-colon being used.

The CSV module only allows a single character as delimiter, which does
not (easily) allow one to write generic code that would not be at the
mercy of whatever the current locale is of the user who sends you a csv
file. Fortunately the Sniffer class is provided for guessing the most
likely delimiter, and seems to work fine, from my limited tests.

There's an error in the documentation of Sniffer().sniff(x), though:
its x argument is documented as a file object, whereas the code actually
expects a sample buffer. Once you feed it appropriately, this works fine
and deals nicely with the above mentioned problem of choosing the right
delimiter. I feel though, that this unfortunately forces one to write
more code than is really needed, typically in the following form:

    sample = file( 'data.csv' ).read( 8192 )
    dialect = csv.Sniffer().sniff( sample )
    infile = file( 'data.csv' )
    for fields in csv.reader( infile, dialect ):
        # do something with fields

That's a tad ugly, having to open the same file twice in particular.
What I would like to see instead is either:
(1)
    for fields in csv.reader( infile, dialect='excel', delimiter=',|;' ):
        # do something with fields

*or* probably more realistically:
(2)
    for fields in csv.reader( infile, dialect='sniff' ):
        # do something with fields

I guess allowing multi-character or regular expressions as delimiters
would be too much of a change, especially since the real data splitting
seems to occur in a C module. But solution (2) is very easy to implement
in plain python, and just needs to use a Sniffer to guess the correct
Dialect instead of forcing the user to "hard choose" one.

Sorry for the longish explanation for a fairly simple change request,
really. If this is not the appropriate place for posting, please let
me know. Thanks for reading this far; if you've looked at python 2.3
you'll agree that it looks like another very promising piece of
Dutch technology ;-)

Cheers,

Bernard.



-- 
http://mail.python.org/mailman/listinfo/python-list

From skip at pobox.com  Fri May 16 21:08:16 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri, 16 May 2003 14:08:16 -0500
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
In-Reply-To: <3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
        <mailman.1053029820.28831.python-list@python.org>
        <3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
Message-ID: <16069.14112.477663.5928@montanaro.dyndns.org>


    >> I'll leave Dave and Andrew to comment on the possibility of admitting
    >> a multiple-character delimiter string, as that will affect their C
    >> code.

    Bernard> Are they monitoring this ng as well, or should I repost
    Bernard> elsewhere?  Notice I am not asking for a multichar delimiter
    Bernard> but for multiple alternate single-char separators.

As I mentioned in my original note, the best place for this discussion is
csv at mail.mojam.com.  I'm sure Dave and Andrew are there.  I don't know how
regularly they monitor c.l.py.

    Bernard> for fields in csv.reader( infile, dialect='sniff' ):
    Bernard>     # do something with fields

    >> Do you mean to imply that the csv.reader object should call the
    >> sniffer implicitly and use the values it returns?  That's an
    >> interesting idea but the sniffer isn't guaranteed to always guess
    >> right.

    Bernard> Yes that's exactly my suggestion. 

I'm not sure we have that much confidence in the sniffer at this point.  

    Bernard> Also, if this was supported directly in reader(), the file-like
    Bernard> argument would not necessarily have to be seekable, it could
    Bernard> conceivably just use the first read data chunk for the
    Bernard> guess-work as well as for further parsing of the first rows.

Not necessarily.  It depends on how the file is accessed.  I believe it's
treated as an iterator, in which case you wind up having to read several
records, pass them off to the sniffer, set your dialect, reprocess the lines
you've already read, then process the remaining unread lines in the file.
This would be more tedious from C than from Python.

    Bernard> I hope this could be deemed a common enough usage to grant
    Bernard> inclusion in the standard module.

I have my own special interests (mostly reading and writing multi-megabyte
CSV files), but I don't think I've ever not known what the delimiter was.
Still, that may just be because I live in the bully country of which Texas
is a part. :-(

Skip


From LogiplexSoftware at earthlink.net  Fri May 16 22:13:12 2003
From: LogiplexSoftware at earthlink.net (Cliff Wells)
Date: 16 May 2003 13:13:12 -0700
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV
	module
In-Reply-To: <16067.62807.842732.949436@montanaro.dyndns.org>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	 <16067.62807.842732.949436@montanaro.dyndns.org>
Message-ID: <1053115991.1448.96.camel@software1.logiplex.internal>

On Thu, 2003-05-15 at 13:15, Skip Montanaro wrote:

>     Bernard> The CSV module only allows a single character as delimiter,
>     Bernard> which does not (easily) allow one to write generic code that
>     Bernard> would not be at the mercy of whatever the current locale is of
>     Bernard> the user who sends you a csv file. Fortunately the Sniffer
>     Bernard> class is provided for guessing the most likely delimiter, and
>     Bernard> seems to work fine, from my limited tests.

As Skip mentioned, the sniffer isn't guaranteed to determine the
dialect.  Given reasonably sane CSV files, my confidence is good that it
will do the right thing.  Feed it something bizarre and you might get
bit.  There are even a couple of reasonable cases that might toss it. 
Feed "01/01/2003?10:10:56?10:15:02?hello, dolly" to it and see what you
get <wink>.  As you can see, it isn't certain what the delimiter might
be, even though the data is well-formed.

That bit of doubt, no matter how small, is enough to warrant human
intervention/confirmation prior to parsing and importing a couple of MB
of garbage into your SQL server.  You might feel confident in *your*
data, but we don't want to encourage other people to blindly trust the
sniffer.

Come to think of it, perhaps the sniffer should be raising an exception
rather than returning None when it fails...


>     Bernard> I feel though, that this unfortunately forces one to write more
>     Bernard> code than is really needed, typically in the following form:
> 
>     Bernard>     sample = file( 'data.csv' ).read( 8192 )
>     Bernard>     dialect = csv.Sniffer().sniff( sample )
>     Bernard>     infile = file( 'data.csv' )
>     Bernard>     for fields in csv.reader( infile, dialect ):
>     Bernard>         # do something with fields
> 
>     Bernard> That's a tad ugly, having to open the same file twice in
>     Bernard> particular.
> 
> I recognize the issue you raise.  As originally written, the Sniffer class
> also took a file-like object; however, it relied on being able to rewind the
> stream.  This would, for example, prevent you from feeding sys.stdin to the
> sniffer.  I also felt the decision of rewinding the stream belonged with the
> caller, so I decided to change it to accept a small data sample instead.
> You can avoid multiple opens by rewinding the stream yourself (in the common
> case where the stream can be rewound):
> 
>     infile = file('data.csv')
>     sample = infile.read(8192)
>     infile.seek(0)
>     dialect = csv.Sniffer().sniff( sample )
>     for fields in csv.reader( infile, dialect ):
>         # do something with fields

Or even:

    infile = file('data.csv')
    dialect = csv.Sniffer().sniff( infile.read(8192) )
    if dialect:
        infile.seek(0)
        for fields in csv.reader( infile, dialect ):
            # do something with fields

Doesn't seem too bad.  There really doesn't seem to be a universal
solution to this.  If you use the sniffer you're forced to rewind.

>     Bernard> (2)
>     Bernard>     for fields in csv.reader( infile, dialect='sniff' ):
>     Bernard>         # do something with fields
> 
> Do you mean to imply that the csv.reader object should call the sniffer
> implicitly and use the values it returns?  That's an interesting idea but
> the sniffer isn't guaranteed to always guess right.

Yes.  It looks elegant but it's far too dangerous.  Especially just to
save a couple of lines of code.

You might also take a look at http://python-dsv.sf.net.  The code from
the sniffer was derived to a great extent from that code.  I'm planning
(some dreamy day) to rewrite DSV to take advantage of the Python CSV
module.  The point is that this is the sort of thing the sniffer was
meant to help with:  giving the user a preview of the data that they can
*confirm* is correct before actual importing and destruction of your
existing data begins <wink>.
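
Something along these lines, say (a rough sketch only -- it assumes a
reader built as in the snippets above, at least five data rows, and
process() is just a stand-in for whatever import step you have):

    preview = [reader.next() for i in range(5)]
    for row in preview:
        print row
    if raw_input('Does this look right? [y/n] ') != 'y':
        raise SystemExit('aborted -- specify the dialect explicitly')
    for row in preview:
        process(row)
    for row in reader:
        process(row)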

Regards,

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308


From bdelmee at advalvas.be  Fri May 16 23:09:59 2003
From: bdelmee at advalvas.be (Bernard Delmée)
Date: Fri, 16 May 2003 23:09:59 +0200
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in
	CSVmodule
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<16067.62807.842732.949436@montanaro.dyndns.org>
	<1053115991.1448.96.camel@software1.logiplex.internal>
Message-ID: <000801c31bef$86be0680$6702a8c0@duracuire>

> As Skip mentioned, the sniffer isn't guaranteed to determine the
> dialect.  Given reasonably sane CSV files, my confidence is good that it
> will do the right thing.  Feed it something bizarre and you might get
> bit.  There are even a couple of reasonable cases that might toss it. 
> Feed "01/01/2003?10:10:56?10:15:02?hello, dolly" to it and see what you
> get <wink>.  As you can see, it isn't certain what the delimiter might
> be, even though the data is well-formed.

Sure. Maybe a second, optional arg to Sniffer().sniff(sample,seplist)
could restrict the set of allowed/expected delimiters? As I originally
mentioned, I'm only actually seeing ',;' in practice.

[...]

> Or even:
> 
>     infile = file('data.csv')
>     dialect = csv.Sniffer().sniff( infile.read(8192) )
>     if dialect:
>         infile.seek(0)
>         for fields in csv.reader( infile, dialect ):
>             # do something with fields
> 
> Doesn't seem too bad.  There really doesn't seem to be a universal
> solution to this.  If you use the sniffer you're forced to rewind.

No big deal indeed, especially once wrapped in generator as I did
in a c.l.py post.
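
(Something like the following, say -- a sketch rather than the exact
code from that post, and it assumes a named, seekable file:)

    def sniff_reader(filename, sample_size=8192):
        # sniff the dialect from the start of the file, then rewind and parse
        infile = file(filename)
        dialect = csv.Sniffer().sniff(infile.read(sample_size))
        infile.seek(0)
        for fields in csv.reader(infile, dialect):
            yield fields

    for fields in sniff_reader('data.csv'):
        pass    # do something with each row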

I sure don't want to nitpick, as I believe the CSV module is a very neat
addition to the stdlib. For example, if your excel sheet has multi-line
values, the CSV file ends up holding newlines (or carriage returns, 
sorry I don't recall) _within_ fields. If you open the csv file in text 
mode, there's no way to distinguish (on windows) between those single 
NL's and the CR-NL pairs at the end of lines/records. In such a case, 
you need to open the file as binary, and split explicitly on "\r\n". 
You can wrap it all in a generator, but that gets unwieldy. The seemingly 
simplistic csv module nicely hides all this.
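
(The by-hand approach would look roughly like this -- a sketch only,
assuming records end in CR-LF while bare '\n' may occur inside quoted
fields:)

    def crlf_records(filename):
        # read raw bytes and split on the real record terminator
        data = file(filename, 'rb').read()
        for rec in data.split('\r\n'):
            if rec:
                yield rec

    for fields in csv.reader(crlf_records('data.csv')):
        pass    # fields with embedded newlines arrive intact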

Thanks again,

Bernard.



From djc at object-craft.com.au  Sat May 17 15:31:47 2003
From: djc at object-craft.com.au (Dave Cole)
Date: 17 May 2003 23:31:47 +1000
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV
	module
In-Reply-To: <16069.14112.477663.5928@montanaro.dyndns.org>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<mailman.1053029820.28831.python-list@python.org>
	<3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
	<16069.14112.477663.5928@montanaro.dyndns.org>
Message-ID: <m3u1btaed8.fsf@ferret.object-craft.com.au>

>>>>> "Skip" == Skip Montanaro <skip at pobox.com> writes:

>>> I'll leave Dave and Andrew to comment on the possibility of
>>> admitting a multiple-character delimiter string, as that will
>>> affect their C code.

Bernard> Are they monitoring this ng as well, or should I repost
Bernard> elsewhere?  Notice I am not asking for a multichar delimiter
Bernard> but for multiple alternate single-char separators.

Skip> As I mentioned in my original note, the best place for this
Skip> discussion is csv at mail.mojam.com.  I'm sure Dave and Andrew are
Skip> there.  I don't know how regularly they monitor c.l.py.

I usually read c.l.py every day (at least skip over the subjects).
Haven't done it for over a week since our ISP's news server died.
Dunno why they are taking so long to fix it...

Bernard> Also, if this was supported directly in reader(), the
Bernard> file-like argument would not necessarily have to be seekable,
Bernard> it could conceivably just use the first read data chunk for
Bernard> the guess-work as well as for further parsing of the first
Bernard> rows.

One of the suggestions I made early on in the csv development was to
allow the sniffer and reader to operate on iterable data sources.
Turns out that you don't really need the sniffer to use an iterable
for input.

With the following (completely untested) you could sniff and read an
input source while only reading it once.

    class SniffedInput:
        def __init__(self, fp):
            self.fp = fp
            self.sample = []
            self.end_of_input = 0
            for i in range(20):
                line = fp.readline()
                if not line:
                    self.end_of_input = 1
                    break
                self.sample.append(line)
            self.dialect = csv.Sniffer().sniff(''.join(self.sample))

        def __iter__(self):
            return self

        def next(self):
            if self.sample:
                line = self.sample[0]
                del self.sample[0]
                return line
            if self.end_of_input:
                raise StopIteration
            line = self.fp.readline()
            if not line:
                raise StopIteration
            return line

    inp = SniffedInput(sys.stdin)
    for rec in csv.reader(inp, dialect=inp.dialect):
        process(rec)

Skip> Not necessarily.  It depends on how the file is accessed.  I
Skip> believe it's treated as an iterator, in which case you wind up
Skip> having to read several records, pass them off to the sniffer,
Skip> set your dialect, reprocess the lines you've already read, then
Skip> process the remaining unread lines in the file.  This would be
Skip> more tedious from C than from Python.

Bernard> I hope this could be deemed a common enough usage to grant
Bernard> inclusion in the standard module.

Does the above satisfy your needs?

Should something like that be placed into the csv module?

- Dave

-- 
http://www.object-craft.com.au


From bdelmee at advalvas.be  Sat May 17 20:50:24 2003
From: bdelmee at advalvas.be (Bernard Delmée)
Date: Sat, 17 May 2003 20:50:24 +0200
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV
	module
References: 
	<3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net><mailman.1053029820.28831.python-list@python.org><3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net><16069.14112.477663.5928@montanaro.dyndns.org>
	<m3u1btaed8.fsf@ferret.object-craft.com.au>
Message-ID: <000801c31ca5$30241b00$6702a8c0@duracuire>

> With the following (completely untested) you could sniff and read an
> input source while only reading it once.
> 
>     class SniffedInput:
>         [implementation omitted]
>
> Does the above satisfy your needs?

It does, thanks Dave (give or take a few typos trivial to fix).
So I now have three working solutions:

(1) let sniffer detect dialect, reset input then iterate
(2) essentially as (1), except wrapped in a generator
(3) your iterator-based suggestion (SniffedInput); with the 
    advantage of not requiring a seek on the file-like data source

I tested them against a file holding 115.000 lines of 56 fields, and
the respective runtimes are: (1) 5.5s (2) 6.5s (3) 6.9s

I think 2 & 3 add overhead to every readline(), if only an extra
Python function call (iterator/generator), and these accumulate to
a perceptible, albeit small, slowdown.

> Should something like that be placed into the csv module?

I dunno, really. Given the above results, the overhead would probably 
only go away if this was supported by the C reader() code, with usage
close to my original suggestion. That's probably too much to ask, 
certainly if I've been the sole user to ask for it.

*now* there's something else Skip got me thinking about (maybe this 
should be a separate post). He rightly underlined that there's no 
guarantee that the sniffer will guess right. For example if most of 
your fields are "dd/mm/yy" dates, the sniffer may decide (untried) 
that '/' is the most likely delimiter. Hence let me re-iterate my 
suggestion to tip the sniffer off by adding a second argument to 
Sniffer().sniff(), an optional string holding the allowed or expected 
delimiters. Short of direct support for multiple separators, which
may be too rarely needed to move to the C implementation, it would 
be *very* useful to have a means to assist the sniffer in guessing right.

Thanks for your attention,

Bernard.

PS: do I have to subscribe somewhere to follow csv at mail.mojam.com?



From skip at pobox.com  Sun May 18 00:21:38 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat, 17 May 2003 17:21:38 -0500
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV
        module
In-Reply-To: <000801c31ca5$30241b00$6702a8c0@duracuire>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
        <mailman.1053029820.28831.python-list@python.org>
        <3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
        <16069.14112.477663.5928@montanaro.dyndns.org>
        <m3u1btaed8.fsf@ferret.object-craft.com.au>
        <000801c31ca5$30241b00$6702a8c0@duracuire>
Message-ID: <16070.46578.53650.639146@montanaro.dyndns.org>


    Bernard> *now* there's something else Skip got me thinking about (maybe
    Bernard> this should be a separate post). He rightly underlined that
    Bernard> there's no guarantee that the sniffer will guess right. For
    Bernard> example if most of your fields are "dd/mm/yy" dates, the
    Bernard> sniffer may decide (untried) that '/' is the most likely
    Bernard> delimiter. Hence let me re-iterate my suggestion to tip the
    Bernard> sniffer off by adding a second argument to Sniffer().sniff(),
    Bernard> an optional string holding the allowed or expected
    Bernard> delimiters. Short of direct support for multiple separators,
    Bernard> which may be too rarely needed to move to the C implementation,
    Bernard> it would be *very* useful to have a means to assist the sniffer
    Bernard> in guessing right.

Please try the attached context diff.  It seems to work as I interpreted
your request.  Note the new test_delimiters method.  When I first wrote it I
guessed wrong what the sniffer would come up with as an unguided delimiter.
It picked '0' instead of the '/' I had expected.  With the delimiters parameter it
correctly picks from the string passed in.

Skip

-------------- next part --------------
A non-text attachment was scrubbed...
Name: csv.diff
Type: application/octet-stream
Size: 5232 bytes
Desc: not available
Url : http://mail.python.org/pipermail/csv/attachments/20030517/c2daffea/attachment.obj 

From skip at pobox.com  Sun May 18 00:22:46 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat, 17 May 2003 17:22:46 -0500
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV
        module
In-Reply-To: <000801c31ca5$30241b00$6702a8c0@duracuire>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
        <mailman.1053029820.28831.python-list@python.org>
        <3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
        <16069.14112.477663.5928@montanaro.dyndns.org>
        <m3u1btaed8.fsf@ferret.object-craft.com.au>
        <000801c31ca5$30241b00$6702a8c0@duracuire>
Message-ID: <16070.46646.419951.995394@montanaro.dyndns.org>


    Bernard> PS: do I have to subscribe somewhere to follow
    Bernard> csv at mail.mojam.com ?

Yes, if you'd like to not always rely on someone 'cc'ing you, the signup
form is at

    http://manatee.mojam.com/mailman/listinfo/csv

Skip


From bdelmee at advalvas.be  Sun May 18 11:14:26 2003
From: bdelmee at advalvas.be (Bernard Delmée)
Date: Sun, 18 May 2003 11:14:26 +0200
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV
	module
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<mailman.1053029820.28831.python-list@python.org>
	<3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
	<16069.14112.477663.5928@montanaro.dyndns.org>
	<m3u1btaed8.fsf@ferret.object-craft.com.au>
	<000801c31ca5$30241b00$6702a8c0@duracuire>
	<16070.46578.53650.639146@montanaro.dyndns.org>
Message-ID: <005201c31d1d$ebeed1e0$6702a8c0@duracuire>

Skip Montanaro wrote:
> Please try the attached context diff.  It seems to work as I
> interpreted your request.  Note the new test_delimiters method.  When
> I first wrote it I guessed wrong what the sniffer would come up with
> as an unguided delimiter. It picked '0' instead of the '/' I had expected.
> With the delimiters parameter it correctly picks from the string
> passed in. 

Sorry Skip, it's been a while since I used patch:
I tried: patch -c csv.py csv.diff
but got:  patching file csv.py
    can't find file to patch at input line 118
    Perhaps you should have used the -p or --strip option?
    The text leading up to this was:
    --------------------------
    |Index: Lib/test/test_csv.py
    |===================================================================
    |RCS file: /cvsroot/python/python/dist/src/Lib/test/test_csv.py,v
    |retrieving revision 1.7
    |diff -c -r1.7 test_csv.py
    |*** Lib/test/test_csv.py       6 May 2003 15:56:05 -0000       1.7
    |--- Lib/test/test_csv.py       17 May 2003 22:19:12 -0000
    --------------------------
    File to patch:
    : No such file or directory
    Skip this patch? [y]
    Skipping patch.
    2 out of 2 hunks ignored

Apparently, the patch to csv.py was applied correctly, but not the one to
test_csv.py. Looking at the code, it does indeed seem to restrict the
returned delimiter to the set of allowed values. And it works
with my previous test, no problem. One thing I didn't understand,
though, is that given input consisting of lines such as
    1/2/3;2/3/4;3/4/5;4/5/6;5/6/7
the sniffer (correctly) returns ';' (not '/') as delimiter, with 
or without the additional hint! The same happens if I replace '/' with ':'.
On the other hand, using '/' as the additional param does force it
to be picked as the delimiter, as expected.
Maybe there already was some heuristic weighting 'likely' separators
in the sniffer, after all? Well, checking the implementation, there's
indeed the Sniffer.preferred list of separators which sets sensible
defaults (including ',' and ';' with which I was concerned in the 1st
place). 

So... in the end I think I raised a false alarm, and should have 
checked and tested more after you warned me -fair enough- 
that the sniffer can't always be right.
The new parameter works, but will probably very rarely be needed 
given the reasonable defaults. Someone with a *really* untypical 
input will always be able to explicitly set the delimiter.

So it's up to you to decide whether the additional control level
is worth keeping, or just adds to the confusion. What I'd suggest,
though, is that the documentation for the sniffer should explicitly
show the set of separators it favors (',', '\t', ';', ' ', ':').

Sorry for the noise, cheers,

Bernard.



From skip at pobox.com  Sun May 18 13:12:05 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sun, 18 May 2003 06:12:05 -0500
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV
        module
In-Reply-To: <005201c31d1d$ebeed1e0$6702a8c0@duracuire>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
        <mailman.1053029820.28831.python-list@python.org>
        <3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
        <16069.14112.477663.5928@montanaro.dyndns.org>
        <m3u1btaed8.fsf@ferret.object-craft.com.au>
        <000801c31ca5$30241b00$6702a8c0@duracuire>
        <16070.46578.53650.639146@montanaro.dyndns.org>
        <005201c31d1d$ebeed1e0$6702a8c0@duracuire>
Message-ID: <16071.27269.594139.214571@montanaro.dyndns.org>


    Bernard> Sorry Skip, it's been a while since I used patch:

Cd to the top of your Python source tree and execute

    patch -p 0 < csv.diff

    Bernard> Maybe there already was some heuristic weighting 'likely'
    Bernard> separators in the sniffer, after all? Well, checking the
    Bernard> implementation, there's indeed the Sniffer.preferred list of
    Bernard> separators which sets sensible defaults (including ',' and ';'
    Bernard> with which I was concerned in the 1st place).

There are two _guess functions, _guess_quote_and_delimiter and
_guess_delimiter.  Here are their doc strings:

        """
        Looks for text enclosed between two identical quotes
        (the probable quotechar) which are preceded and followed
        by the same character (the probable delimiter).
        For example:
                         ,'some text',
        The quote with the most wins, same with the delimiter.
        If there is no quotechar the delimiter can't be determined
        this way.
        """

        """
        The delimiter /should/ occur the same number of times on
        each row. However, due to malformed data, it may not. We don't want
        an all or nothing approach, so we allow for small variations in this
        number.
          1) build a table of the frequency of each character on every line.
          2) build a table of frequencies of this frequency (meta-frequency?),
             e.g.  'x occurred 5 times in 10 rows, 6 times in 1000 rows,
             7 times in 2 rows'
          3) use the mode of the meta-frequency to determine the /expected/
             frequency for that character
          4) find out how often the character actually meets that goal
          5) the character that best meets its goal is the delimiter
        For performance reasons, the data is evaluated in chunks, so it can
        try and evaluate the smallest portion of the data possible, evaluating
        additional chunks as necessary.
        """

First the q_and_d version is called.  If that fails the less restrictive one
is called.
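
For illustration only, here is a toy version of the meta-frequency idea
(nothing like the real _guess_delimiter code, and the candidate list is
just an arbitrary choice):

    def toy_guess_delimiter(lines, candidates=',;\t|: '):
        best_char, best_hits = None, 0
        for ch in candidates:
            # 1) frequency of ch on every line
            counts = [line.count(ch) for line in lines]
            # 2) meta-frequency: how often each per-line count occurs
            meta = {}
            for c in counts:
                meta[c] = meta.get(c, 0) + 1
            # 3) the mode of the meta-frequency is the expected count
            expected, hits = 0, 0
            for count, times in meta.items():
                if count > 0 and times > hits:
                    expected, hits = count, times
            # 4) and 5) the character that meets its expected count on the
            #    most lines wins
            if hits > best_hits:
                best_char, best_hits = ch, hits
        return best_char

    print toy_guess_delimiter(['1;2;3', 'a;b;c', 'x;y;z'])   # prints ';'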

    Bernard> So... in the end I think I raised a false alarm, and should have 
    Bernard> checked and tested more after you warned me -fair enough- 
    Bernard> that the sniffer can't always be right.
    Bernard> The new parameter works, but will probably very rarely be needed 
    Bernard> given the reasonable defaults. Someone with a *really* untypical 
    Bernard> input will always be able to explicitly set the delimiter.

It's probably worth having nonetheless, just because we can construct
"reasonable" CSV files on which it guesses wrong.

    Bernard> So it's up to you to decide whether the additional control
    Bernard> level is worth keeping, or just adds to the confusion. What I'd
    Bernard> suggest, though, is that the documentation for the sniffer
    Bernard> should explicitly show the set of separators it favors (',',
    Bernard> '\t', ';', ' ', ':').

I'm not sure it favors any delimiters.  I think it depends on frequency and
regularity.  I don't know the delimiter guessing code well and am
disinclined to guess about what it favors.

Skip



From sjmachin at LEXICON.NET  Mon May 19 01:24:19 2003
From: sjmachin at LEXICON.NET (John Machin)
Date: Mon, 19 May 2003 09:24:19 +1000
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV
	module
Message-ID: <200305182323.CQK30392@titan.izone.net.au>

Perhaps the sniffer could have a built-in but overridable
list of characters called
delimiters_used_in_files_created_by_people_not_totally_out_of_their_trees
for use as a default.
This would exclude '0' and all other alphanumeric characters,
and '/-$.'\"`(){}[]\\'.

---- Original message ----
>Date: Sat, 17 May 2003 17:21:38 -0500
>From: Skip Montanaro <skip at pobox.com>
>Subject: Re: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
>To: Bernard Delmée <bdelmee at easynet.be>
>Cc: csv at mail.mojam.com
>
>
>
>Please try the attached context diff.  It seems to work as I interpreted
>your request.  Note the new test_delimiters method.  When I first wrote it I
>guessed wrong what the sniffer would come up with as an unguided delimiter.
>It picked '0' instead of the '/' I had expected.  With the delimiters parameter it
>correctly picks from the string passed in.
>
>Skip

From skip at pobox.com  Mon May 19 17:35:30 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 19 May 2003 10:35:30 -0500
Subject: [Csv] optional Sniffer.sniff() delimiters arg added
Message-ID: <16072.63938.547985.870948@montanaro.dyndns.org>

I just checked in a change to csv.Sniffer.sniff() which adds an optional
delimiters arg.  It is a string which limits the characters which will be
considered as possible field delimiters.
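
For example, to restrict the guess to a semicolon or a comma:

    dialect = csv.Sniffer().sniff(sample, delimiters=';,')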

Skip


From LogiplexSoftware at earthlink.net  Mon May 19 10:11:20 2003
From: LogiplexSoftware at earthlink.net (Cliff Wells)
Date: 19 May 2003 01:11:20 -0700
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV
	module
In-Reply-To: <16071.27269.594139.214571@montanaro.dyndns.org>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	 <mailman.1053029820.28831.python-list@python.org>
	 <3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
	 <16069.14112.477663.5928@montanaro.dyndns.org>
	 <m3u1btaed8.fsf@ferret.object-craft.com.au>
	 <000801c31ca5$30241b00$6702a8c0@duracuire>
	 <16070.46578.53650.639146@montanaro.dyndns.org>
	 <005201c31d1d$ebeed1e0$6702a8c0@duracuire>
	 <16071.27269.594139.214571@montanaro.dyndns.org>
Message-ID: <1053331880.1449.123.camel@software1.logiplex.internal>

On Sun, 2003-05-18 at 04:12, Skip Montanaro wrote:

>     Bernard> So it's up to you to decide whether the additional control
>     Bernard> level is worth keeping, or just adds to the confusion. What I'd
>     Bernard> suggest, though, is that the documentation for the sniffer
>     Bernard> should explicitly show the set of separators it favors (',',
>     Bernard> '\t', ';', ' ', ':').
> 
> I'm not sure it favors any delimiters.  I think it depends on frequency and
> regularity.  I don't know the delimiter guessing code well and am
> disinclined to guess about what it favors.

Bernard is correct.  If the sniffer comes up with two equally likely
candidates, it falls back to a preferred list (if one of the two
candidates occurs higher in the list then it is deemed to be the
delimiter).   I'm not fond of this (and I *think* there actually may be
a way to solve this problem algorithmically) but it seems to work in
practical use.

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308


From Andreas.Trawoeger at wgkk.sozvers.at  Wed May 21 14:50:43 2003
From: Andreas.Trawoeger at wgkk.sozvers.at (Andreas.Trawoeger at wgkk.sozvers.at)
Date: Wed, 21 May 2003 14:50:43 +0200
Subject: [Csv] Problems with CSV Module
Message-ID: <OF4B3BEE8E.AA12CA36-ONC1256D2D.003B294C-C1256D2D.004689C2@wgkk.sozvers.at>

Hi!

I am testing Python 2.3b1 and have found a couple of problems with the CSV
Module:

1. Documentation:
What's a row? (The word row means a list or a tuple.)
How do DictReader & DictWriter work? Having a couple of examples would
help ;-))

2. Locale:
The CSV module doesn't use locale. The default delimiter for Austria
(+Germany) in Windows is a semicolon ';', not a comma ','.
As a result, you can't import a list generated by csv.writer()
into Excel without changing your regional settings or using
csv.writer(delimiter=';').
It would be nice if the CSV module would adapt to the language settings.

This could be really simple to implement using the locale module. But I
took a short look at the locale module and it seems like there is no way to
get the list separator sign (probably it's not POSIX compliant).

Another possibility would be to have a dialect like 'excel_ger' with the
correct settings.

3. There is no .close()
There is no way to close a file, which results in problems with file locking.
The only way around this is to do it by hand:

import csv
FILE_CSV    = r"C:\csvtest.csv"

f=file(FILE_CSV,'w')
w=csv.writer(f,dialect='excel',delimiter=';')
w.writerow((1,5,10,25,100,250,500,1000,1500))
f.close()

f=file(FILE_CSV,'r')
r=csv.reader(file(FILE_CSV,'r'),dialect='excel',delimiter=';')
print r.next()
f.close()

4. There is no .readrow()
This should be just another name for .next(). It's more intuitive if you
write a row via .writerow() and read it via .readrow().


Kind regards,
Andreas Trawöger

Network / System Administration
Wiener Gebietskrankenkasse
Tel.:  +43(1) 60122-3664
Fax.: +43(1) 60122-2182



From skip at pobox.com  Wed May 21 16:28:29 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed, 21 May 2003 09:28:29 -0500
Subject: [Csv] Problems with CSV Module
In-Reply-To: <OF4B3BEE8E.AA12CA36-ONC1256D2D.003B294C-C1256D2D.004689C2@wgkk.sozvers.at>
References: <OF4B3BEE8E.AA12CA36-ONC1256D2D.003B294C-C1256D2D.004689C2@wgkk.sozvers.at>
Message-ID: <16075.36109.559279.602298@montanaro.dyndns.org>


    Andreas> 1. Documentation:
    Andreas> What's a row? (The word row means a list or a tuple.)
    Andreas> How do DictReader & DictWriter work? Having a couple of examples would
    Andreas> help ;-))

Thanks, I'll add a couple examples and better define row.  DictReader works
pretty much like dict cursors in the various Python database packages,
returning a dictionary instead of a tuple for each row of data.  Here's an
example of using csv.DictReader.  This particular snippet parses CSV files
dumped by Checkpoint Software's Firewall-1 product.

    class fw1dialect(csv.Dialect):
        lineterminator = '\n'
        escapechar = '\\'
        skipinitialspace = False
        quotechar = '"'
        quoting = csv.QUOTE_ALL
        delimiter = ';'
        doublequote = True

    csv.register_dialect("fw1", fw1dialect)

    fieldnames = ("num;date;time;orig;type;action;alert;i/f_name;"
                  "i/f_dir;product;src;s_port;dst;service;proto;"
                  "rule;th_flags;message_info;icmp-type;icmp-code;"
                  "sys_msgs;cp_message;sys_message").split(';')
    rdr = csv.DictReader(f, fieldnames=fieldnames, dialect="fw1")

    for row in rdr:
        if row["num"] is None:
            continue
        nrows += 1
        if action is not None and  row["action"] != action:
            continue
        source = row.get("src", "unknown")
        ...

Note that instead of returning a tuple for each row, a dictionary is
returned.  Its keys are the elements of the fieldnames parameter of the
constructor. 
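
DictWriter is the mirror image.  Here's a quick sketch (the output file
name and the row values are made up; it reuses the fieldnames list and
"fw1" dialect from above, and missing keys are filled with the writer's
restval):

    f = file("fw1out.csv", "wb")
    wtr = csv.DictWriter(f, fieldnames=fieldnames, dialect="fw1")
    wtr.writerow({"num": 1, "date": "21May2003", "action": "accept"})
    f.close()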

    Andreas> 2. Locale:
    Andreas> The CSV module doesn't use locale. The default delimiter for Austria
    Andreas> (+Germany) in Windows is a semicolon ';' not a comma ','.
    Andreas> As a result, you can't import a list generated by csv.writer()
    Andreas> into Excel without changing your regional settings or using
    Andreas> csv.writer(delimiter=';').
    Andreas> It would be nice if the CSV module would adapt to the language settings.

How can I get that from Python, or do I have to know that if the locale is
"de" the default Excel delimiter is a semicolon?  What other locales have a
semicolon as the default?  I suspect that if we have to enumerate them all it
may never get done.  Also, note that the 

    Andreas> This could be really simple to implement using the locale
    Andreas> module. But I took a short look at the locale module and it
    Andreas> seems like there is no way to get the list separator sign
    Andreas> (probably it's not POSIX compliant).

That would make it difficult to do.

    Andreas> Another possibility would be to have a dialect like 'excel_ger'
    Andreas> with the correct settings.

But what about all the other locales which must use a semicolon as the
default delimiter?

How about this in your code:

    class excel(csv.excel):
        delimiter = ';'
    csv.register_dialect("excel", excel)

    Andreas> 3. There is no .close()

Note that the "file-like object" can be any object which supports the
iterator protocol, so it need not have a close() method.  In the test code
we often use lists, e.g.:

    def test_read_with_blanks(self):
        reader = csv.DictReader(["1,2,abc,4,5,6\r\n","\r\n",
                                 "1,2,abc,4,5,6\r\n"],
                                fieldnames="1 2 3 4 5 6".split())
        self.assertEqual(reader.next(), {"1": '1', "2": '2', "3": 'abc',
                                         "4": '4', "5": '5', "6": '6'})
        self.assertEqual(reader.next(), {"1": '1', "2": '2', "3": 'abc',
                                         "4": '4', "5": '5', "6": '6'})

    Andreas> f=file(FILE_CSV,'w')
    Andreas> w=csv.writer(f,dialect='excel',delimiter=';')
    Andreas> w.writerow((1,5,10,25,100,250,500,1000,1500))
    Andreas> f.close()

    Andreas> f=file(FILE_CSV,'r')
    Andreas> r=csv.reader(file(FILE_CSV,'r'),dialect='excel',delimiter=';')
    Andreas> print r.next()
    Andreas> f.close()

Yes, this is what you'll have to do, though note that if you reuse f the
first call to f.close() is unnecessary.

    Andreas> 4. There is no .readrow()

    Andreas> This should be just another name for .next(). It's more
    Andreas> intuitive if you write a row via .writerow() and read it via
    Andreas> .readrow().

I think we can probably squeeze this in.

Skip

From neal at metaslash.com  Thu May 22 19:12:48 2003
From: neal at metaslash.com (Neal Norwitz)
Date: Thu, 22 May 2003 17:12:48 -0000
Subject: [Csv] memory leaks
Message-ID: <20030522170709.GW26970@epoch.metaslash.com>

Included is a patch which corrects memory leaks in the CSV module.
The patch was produced from the current version in Python CVS.

I'm not sure if all of these are correct, but the patch corrects the
leaks reported by valgrind.

Neal
--

Index: Modules/_csv.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/_csv.c,v
retrieving revision 1.11
diff -w -u -r1.11 _csv.c
--- Modules/_csv.c      14 Apr 2003 02:20:55 -0000      1.11
+++ Modules/_csv.c      22 May 2003 17:03:34 -0000
@@ -465,6 +465,8 @@
 {
        if (self->field_size == 0) {
                self->field_size = 4096;
+               if (self->field != NULL)
+                       PyMem_Free(self->field);
                self->field = PyMem_Malloc(self->field_size);
        }
        else {
@@ -739,6 +741,8 @@
         Py_XDECREF(self->dialect);
         Py_XDECREF(self->input_iter);
         Py_XDECREF(self->fields);
+        if (self->field != NULL)
+               PyMem_Free(self->field);
        PyObject_GC_Del(self);
 }
  
@@ -1002,6 +1006,8 @@
        if (rec_len > self->rec_size) {
                if (self->rec_size == 0) {
                        self->rec_size = (rec_len / MEM_INCR + 1) * MEM_INCR;
+                       if (self->rec != NULL)
+                               PyMem_Free(self->rec);
                        self->rec = PyMem_Malloc(self->rec_size);
                }
                else {
@@ -1191,6 +1197,8 @@
 {
         Py_XDECREF(self->dialect);
         Py_XDECREF(self->writeline);
+       if (self->rec != NULL)
+               PyMem_Free(self->rec);
        PyObject_GC_Del(self);
 }