[Tutor] using tarfile on strings or filelike objects

Barton David David.Barton at nottingham.ac.uk
Mon Mar 5 15:06:28 CET 2007


Thanks Kent,

I think I've hacked my way around this but it's a little weird.
Simplest to demonstrate with code (assuming cStringIO and tarfile are
imported and my_tarfile_string & archive_member_name are specified):
__

#approach 1
filelike=cStringIO.StringIO(my_tarfile_string)
tar = tarfile.open(mode="r|bz2", fileobj=filelike)

for tarinfo in tar:
	if tarinfo.name==archive_member_name:
		tfl=tar.extractfile(tarinfo)
		print tfl.read()
		tfl.close()

tar.close()
filelike.close()

#This works. Hooray.
#However, if mode=="r" the following error occurs:
#  "AttributeError: 'NoneType' object has no attribute 'rfind'"
#and if mode=="r|" the following error occurs:
#  "ReadError: empty, unreadable or compressed file"
#and if mode=="r|*" the following error occurs:
#  "AttributeError: _Stream instance has no attribute 'dbuf'"
#I only understand the second of those error messages, to be honest.

#approach 2
filelike=cStringIO.StringIO(my_tarfile_string)
tar = tarfile.open(mode="r|bz2", fileobj=filelike)

tfl=tar.extractfile(archive_member_name)
print tfl.read()
tfl.close()

tar.close()
filelike.close()

#approach 3
filelike=cStringIO.StringIO(my_tarfile_string)
tar = tarfile.open(mode="r|bz2", fileobj=filelike)

tarinfo=tar.getmember(archive_member_name)
tfl=tar.extractfile(tarinfo)
print tfl.read()
tfl.close()

tar.close()
filelike.close()

#These DON'T work, and produce the following error:
#  "StreamError: seeking backwards is not allowed"
#Maybe this is just some wacky bug?
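
For anyone comparing the three approaches on a current interpreter, here is a minimal sketch under some assumptions: it is written for Python 3, so io.BytesIO stands in for cStringIO.StringIO, and make_bz2_tar, read_member_streaming and read_member_by_lookup are hypothetical helper names that are not part of the original code. It suggests the difference is probably not a bug: in "r|bz2" mode the archive can only be read front to back, and looking a member up by name first runs the stream to its end, so the later read has to seek backwards.

```python
import io
import tarfile


def make_bz2_tar(members):
    """Build a bz2-compressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(mode="w:bz2", fileobj=buf) as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_member_streaming(archive_bytes, member_name):
    """Approach 1: walk the stream once, reading the member as it goes by."""
    with tarfile.open(mode="r|bz2", fileobj=io.BytesIO(archive_bytes)) as tar:
        for tarinfo in tar:
            if tarinfo.name == member_name:
                return tar.extractfile(tarinfo).read()
    return None


def read_member_by_lookup(archive_bytes, member_name):
    """Approaches 2 and 3: look the member up first, then read it."""
    with tarfile.open(mode="r|bz2", fileobj=io.BytesIO(archive_bytes)) as tar:
        # getmember() has to scan every header, which consumes the stream.
        tarinfo = tar.getmember(member_name)
        # Reading now needs a position that the stream has already passed.
        return tar.extractfile(tarinfo).read()


archive = make_bz2_tar({"a.txt": b"first", "b.txt": b"second"})
streamed = read_member_streaming(archive, "b.txt")

try:
    read_member_by_lookup(archive, "b.txt")
    lookup_error = None
except tarfile.StreamError as exc:
    lookup_error = str(exc)
```

The sequential loop succeeds because each member is read while the stream is still positioned at it; the lookup version fails for the same reason as approaches 2 and 3 above.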


__
At any rate, in the 'slightly related' thread you linked to, the
statement that..
> "bzip2 compressed files cannot be read from a "fake" (StringIO) file
> object, only from real files"
..doesn't seem to be the case after all, so that's something.

Regards
Dave

P.S. I'm very sorry but whenever I post replies to Tutor (I'm using
Outlook and hitting Reply All in response to a tutor's direct reply) it
doesn't seem to add my message to the bottom of the existing thread. If
somebody can tell me what I'm doing wrong, I'd appreciate it.



-----Original Message-----
From: Kent Johnson [mailto:kent37 at tds.net] 
Sent: 05 March 2007 12:44
To: Barton David
Cc: tutor at python.org
Subject: Re: [Tutor] using tarfile on strings or filelike objects

Barton David wrote:
> Thanks Kent,
> 
> But.. I've actually found that that doesn't work (raises a ReadError),
> and neither does..
> >>> tf=tarfile.open(mode="r|*",fileobj=filelike)
> ..which raises "AttributeError: _Stream instance has no attribute 
> 'dbuf'"
> 
> However if I explicitly state the compression type.. e.g.
> >>> tf=tarfile.open(mode="r|bz2",fileobj=filelike)
> ..then I can indeed go on to..
> >>> print tf.getnames()
> >>> assert archive_member_name in tf.getnames()
> ..and it works ok. Having to explicitly state the compression type 
> isn't exactly ideal, but I guess it'll do me for now.
> 
> Unfortunately, I'm still having trouble actually reading the contents 
> of 'archive_member'.
> i.e. ..
> >>> tf_filelike=tf.extractfile(archive_member_name)
> >>> print tf_filelike.read()
> ..raises..
>   File "C:\Python24\lib\tarfile.py", line 551, in _readnormal
>     self.fileobj.seek(self.offset + self.pos)
>   File "C:\Python24\lib\tarfile.py", line 420, in seek
>     raise StreamError, "seeking backwards is not allowed"

The docs for the | options say that seeking won't work with them. Maybe
try just 'r'? A hacky workaround might be to open the file once to get
the names out and a second time to read the data. Since the file is in
memory that should be pretty quick.
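
Both suggestions can be sketched as follows, with the caveat that this is written for a current Python 3 (io.BytesIO in place of cStringIO.StringIO) and that make_bz2_tar, read_member_seekable and read_member_two_passes are hypothetical helper names used only for illustration. On Python 3 a seekable mode such as "r:bz2" accepts an in-memory file object; whether the same holds on the Python 2.4 install discussed in this thread is not something this sketch can confirm.

```python
import io
import tarfile


def make_bz2_tar(members):
    """Build a bz2-compressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(mode="w:bz2", fileobj=buf) as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_member_seekable(archive_bytes, member_name):
    """Open in a seekable mode so a lookup by name can jump back to the data."""
    with tarfile.open(mode="r:bz2", fileobj=io.BytesIO(archive_bytes)) as tar:
        if member_name not in tar.getnames():
            return None
        return tar.extractfile(member_name).read()


def read_member_two_passes(archive_bytes, member_name):
    """Stream the archive twice: once for the names, once for the data."""
    # First pass: collect the names, which consumes the whole stream.
    with tarfile.open(mode="r|bz2", fileobj=io.BytesIO(archive_bytes)) as tar:
        names = tar.getnames()
    if member_name not in names:
        return None
    # Second pass: a fresh stream, read sequentially up to the wanted member.
    with tarfile.open(mode="r|bz2", fileobj=io.BytesIO(archive_bytes)) as tar:
        for tarinfo in tar:
            if tarinfo.name == member_name:
                return tar.extractfile(tarinfo).read()
    return None


archive = make_bz2_tar({"a.txt": b"first", "b.txt": b"second"})
via_seekable = read_member_seekable(archive, "b.txt")
via_two_passes = read_member_two_passes(archive, "b.txt")
```

The seekable version is shorter; the two-pass version only relies on the stream modes that already worked in the messages above.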
> 
> And I get the same error if I instead try..
> >>> tf_infoobject=tf.getmember(archive_member_name)
> >>> tf_filelike=tf.extractfile(tf_infoobject)
> >>> print tf_filelike.read()
> 
> In fact I'm getting this even if I open the archive by passing the 
> path name (rather than using fileobj) so I guess this isn't the 
> problem I initially thought it was. I just don't get it.

What if you try exactly the code shown in the tarfile examples for
reading from stdin, with your StringIO file substituted for stdin?

You might want to ask about this on comp.lang.python as no one here
seems to know the answer. This thread is slightly related:
http://tinyurl.com/2hrffs

Kent

> 
> Regards,
> Dave
