[Tutor] Re: Splitting up the input

Derrick 'dman' Hudson dman@dman.ddts.net
Thu Jan 23 22:41:48 2003


On Thu, Jan 23, 2003 at 08:11:58AM -0800, Danny Yoo wrote:
| On Thu, 23 Jan 2003, Derrick 'dman' Hudson wrote:
| > On Thu, Jan 23, 2003 at 02:22:15PM -0000, Deirdre Hackett wrote:
| > | I want to read in information from the serial port. The problem is I
| > | can take in the whole line but only want to read it in lumps of
| > | 9 characters.
| >
| > | As in, I want to read the first 9 characters and put that into X,
| > | then read in the next 9 characters and put that into Y...
| >
| > There are a couple of ways to do that, with varying tradeoffs:
| >
| > (assume 'f' is a file object referring to the serial port; if you use
| > windows I can't help with getting that object)
| >
| > line = f.readline()
| > X = line[:9]
| > Y = line[9:18]

| It might be a better idea to avoid readline() on a "binary" stream, since
| there's a chance that there won't be any newlines in the file.

I agree here.  It all depends on the data itself.  On my last co-op I
had to write a daemon to read SMDR log info from the serial port.
SMDR is the log format the Lucent phone switch generated.  It was a
plain-text format suitable to feed directly into a line printer (80
character lines, new header every 66 lines).  Instead of spewing pages
of hard-copy logs on the floor, I wrote a daemon to read the data
(using readline() since it was ASCII) and split it up into the columns
for insertion into a database.  That text happened to fall into a strict
columnar layout, so "extract 9 characters, then another 9 characters"
can be sensible even for text data.  If you are reading a binary data
stream, however, I agree that using read() directly is better.
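
As a minimal sketch of the read() approach: the helper name read_chunks
is made up for illustration, and io.BytesIO stands in for the serial
port object 'f'. A real serial port may hand back fewer bytes than
requested on any given read, so treat this as a starting point only.

```python
import io

def read_chunks(f, size=9):
    """Yield successive chunks of up to `size` bytes from f until EOF."""
    while True:
        chunk = f.read(size)
        if not chunk:          # empty result means end of stream
            break
        yield chunk

# io.BytesIO is only a stand-in for the serial port file object.
f = io.BytesIO(b"AAAAAAAAABBBBBBBBBCCC")
chunks = list(read_chunks(f))
X, Y = chunks[0], chunks[1]
```

Note that the last chunk can be shorter than 9 bytes, so the caller
still has to decide what a short read means for its data.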

| > This will fail (with an IndexError) if the line is too short.
|
| No, the slicing above should not fail.  For example,

Oops, my bad.  I was assuming (without testing) that it would fail the
same as indexing does.  The non-failure might really be a failure in
disguise if you don't ever want to get the empty string.  Never forget
to over-check the validity of the input!
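
A small sketch of the difference, plus one way to check the input.
The helper split_fields and its ValueError are assumptions for
illustration, not anything from the original code:

```python
line = "123456789abc"

X = line[:9]      # full 9-character field
Y = line[9:18]    # short field, no exception
Z = line[18:27]   # empty string, no exception

# Indexing past the end, unlike slicing, does raise.
try:
    line[18]
    index_raised = False
except IndexError:
    index_raised = True

def split_fields(line, width=9, count=2):
    """Split line into `count` fields of `width` characters each.

    Raises ValueError if the line is too short, instead of silently
    returning short or empty fields.
    """
    if len(line) < width * count:
        raise ValueError("line too short: %d characters" % len(line))
    return [line[i * width:(i + 1) * width] for i in range(count)]
```

Whether a short line should be an error or just padded out depends on
the data, so the check here is only one possible policy.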

| This is because something like 'line[20:100]' asks Python: "Give me a
| string containing all the characters between indices 20 and 100".  But
| there are no characters in that span, so we get back the empty string.
| (It's analogous to the "vacuously true" sort of situation that
| mathematicians run into every so often.)

It's reasonable behavior, IMO, and in many cases is probably much
nicer than an exception.
-D

-- 
Do not pay attention to every word people say,
    or you may hear your servant cursing you --
for you know in your heart
    that many times you yourself have cursed others.
        Ecclesiastes 7:21-22

http://dman.ddts.net/~dman/
