shutil.copyfileobj to return number of bytes copied

Hi,

Currently shutil.copyfileobj returns nothing. I would like to be able to find out how many bytes were copied. Whilst most file-like objects have a .tell() which you could use, some don’t, and .tell() is not guaranteed to measure the number of bytes; it could measure in other units.

I don’t think changing the return value from None to not-None would be backwards incompatible.

We could also make the same change to all the other shutil.copy* functions (except copytree).

Looking at the code, this seems straightforward to implement.

Existing code:

```
def copyfileobj(fsrc, fdst, length=0):
    """copy data from file-like object fsrc to file-like object fdst"""
    # Localize variable access to minimize overhead.
    if not length:
        length = COPY_BUFSIZE
    fsrc_read = fsrc.read
    fdst_write = fdst.write
    while True:
        buf = fsrc_read(length)
        if not buf:
            break
        fdst_write(buf)
```

New code:

```
def copyfileobj(fsrc, fdst, length=0):
    """copy data from file-like object fsrc to file-like object fdst"""
    # Localize variable access to minimize overhead.
    if not length:
        length = COPY_BUFSIZE
    fsrc_read = fsrc.read
    fdst_write = fdst.write
    bytes_copied = 0
    while True:
        buf = fsrc_read(length)
        if not buf:
            break
        fdst_write(buf)
        bytes_copied += len(buf)
    return bytes_copied
```

Regards,
Matt

Matthew Davis
Data Analytics Senior Lead
Telstra Energy – Decision Science & AI
E: Matthew.Davis.2@team.telstra.com | M: 0415762868
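To make the motivation concrete, here is a minimal sketch of the proposed behaviour used with a destination that has no .tell() at all. The names `copyfileobj_counted` and `NoTellWriter` are hypothetical and exist only for this illustration; the helper is a local copy of the proposed loop, not part of shutil, and the 64 KiB default is an assumption standing in for shutil's internal COPY_BUFSIZE.

```python
import io


# Hypothetical local helper mirroring the proposal; not part of shutil.
def copyfileobj_counted(fsrc, fdst, length=0):
    """Copy fsrc to fdst and return the summed len() of the chunks copied."""
    if not length:
        length = 64 * 1024  # assumed default; shutil uses its own COPY_BUFSIZE
    copied = 0
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
        copied += len(buf)
    return copied


class NoTellWriter:
    """Hypothetical write-only sink with no tell(), like some network streams."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)
        return len(data)


src = io.BytesIO(b"x" * 100_000)
dst = NoTellWriter()
n = copyfileobj_counted(src, dst)
print(n)  # 100000
```

One caveat with this approach: the count is the sum of len(buf), so for text-mode streams it would be a number of characters rather than bytes.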

Hi Barry,

I can’t use os.fstat. That only applies to real files, not file-like objects. I’m using streams for downloading, unzipping, gzipping and uploading data which is larger than my disk and memory, so it’s all file-like objects, not real files.

A file-like object that just passes data through to another file-like object and counts it would be a feasible solution. Thanks.

Regards,
Matt

From: Barry <barry@barrys-emacs.org>
Date: Thursday, 3 March 2022 at 2:48 am
Subject: Re: [Python-ideas] shutil.copyfileobj to return number of bytes copied

> On 2 Mar 2022, at 13:40, Davis, Matthew via Python-ideas <python-ideas@python.org> wrote:
>
>> Hi, Currently shutil.copyfileobj returns nothing. I would like to be able to find out how many bytes were copied. Whilst most file-like objects have a .tell() which you could use, some don’t, and .tell() is not guaranteed to measure the number of bytes, it could measure with other units.
>
> Can you use os.fstat() on one of the file objects?
> Can you write a helper that gets the size from the objects that do not have tell()?
>
> Barry

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ZKVI4D...
Code of Conduct: http://python.org/psf/codeofconduct/
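The pass-through counting object mentioned in this reply could look roughly like the sketch below. `CountingWriter` is a hypothetical name, not an existing stdlib class, and the in-memory `io.BytesIO` objects are stand-ins for the real download and upload streams. It works with the current shutil.copyfileobj, since that function only needs read() on the source and write() on the destination.

```python
import gzip
import io
import shutil


class CountingWriter:
    """Hypothetical pass-through wrapper: forwards write() and counts the data."""

    def __init__(self, raw):
        self._raw = raw
        self.count = 0

    def write(self, data):
        self._raw.write(data)
        self.count += len(data)
        return len(data)

    def flush(self):
        self._raw.flush()


payload = b"hello world\n" * 1000
src = io.BytesIO(payload)   # stands in for a download stream
sink = io.BytesIO()         # stands in for an upload stream

counted_out = CountingWriter(sink)       # counts compressed bytes
with gzip.GzipFile(fileobj=counted_out, mode="wb") as gz:
    counted_in = CountingWriter(gz)      # counts uncompressed bytes
    shutil.copyfileobj(src, counted_in)

print(counted_in.count)                            # 12000
print(counted_out.count == len(sink.getvalue()))   # True
```

Wrapping both sides of the gzip stage gives the uncompressed and compressed sizes separately, which a single return value from copyfileobj would not.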

On Wed, Mar 02, 2022 at 01:08:41AM +0000, Davis, Matthew via Python-ideas wrote:
That seems reasonable to me. It would be a similar change to having file.write() return the number of bytes or characters written.

You could put an enhancement request on the bug tracker. It would help if you can also describe the reason why you want this, and link back to this discussion.

https://bugs.python.org/

--
Steve
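As a small illustration of the precedent mentioned here, write() on the standard in-memory streams already returns a count: bytes for a binary stream, characters for a text stream. The variable names are only for this sketch.

```python
import io

binary = io.BytesIO()
text = io.StringIO()

word = "na\u00efve"  # five characters; the i-diaeresis takes two bytes in UTF-8

written_bytes = binary.write(word.encode("utf-8"))
written_chars = text.write(word)

print(written_bytes, written_chars)  # 6 5
```

The same split between bytes and characters would apply to a count returned by copyfileobj, depending on whether the streams are binary or text.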

Participants (3):
- Barry
- Davis, Matthew
- Steven D'Aprano