[Tutor] Finding duplicates entry in file
Steven D'Aprano
steve at pearwood.info
Sat Mar 20 17:52:14 CET 2010
On Sun, 21 Mar 2010 03:34:01 am Ken G. wrote:
> What is a method I can use to find duplicated entry within a sorted
> numeric file?
>
> I was trying to read a file reading two lines at once but apparently,
> I can only read one line at a time.
f = open("myfile")
while True:
    first = f.readline()   # Read one line.
    second = f.readline()  # And a second.
    if first == '':
        # An empty string means we've passed the end of the
        # file, so there is nothing left to process.
        break
    process(first)   # process() stands in for your own function.
    if second == '':
        # Odd number of lines: the final pair has no second line.
        break
    process(second)
f.close()
Or if the file is small (say, less than a few tens of megabytes) you can
read it all at once into a list:
lines = open("myfile").readlines()
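For the question as asked (spotting duplicates in a sorted file), reading in pairs isn't strictly needed: because the file is sorted, any duplicate sits directly after its twin, so it is enough to remember the previous line. A minimal sketch, assuming one entry per line; the function name find_adjacent_duplicates and the sample data are made up for illustration:

```python
def find_adjacent_duplicates(lines):
    """Yield each entry that repeats the entry immediately before it."""
    previous = None
    for line in lines:
        entry = line.rstrip("\n")  # Ignore the trailing newline when comparing.
        if entry == previous:
            yield entry
        previous = entry


# Sample data shaped like the file in the question; any iterable of
# lines works, including an open file object.
sample = ["1\n", "2\n", "2\n", "3\n", "4\n", "4\n", "5\n", "6\n", "6\n"]
duplicates = list(find_adjacent_duplicates(sample))
print(duplicates)  # ['2', '4', '6']
```

An entry that appears three times in a row is reported twice, once for each repeat.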
> Can the same file be opened and read two times within a program?
You can do this:
text1 = open("myfile").read()
text2 = open("myfile").read()
but why bother? That's just pointlessly wasteful. Better to do this:
text1 = text2 = open("myfile").read()
which is no longer wasteful, but probably still pointless. (Why do I
need two names for the same text?)
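If a second pass over the file really is needed, it shouldn't be necessary to open it twice: a file object that is already open can be moved back to the start with seek(0). A small sketch of that, using a throwaway temporary file in place of "myfile" so that it runs on its own:

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\n2\n2\n3\n")
    path = tmp.name

with open(path) as f:
    first_pass = f.read()   # Reads to the end of the file.
    f.seek(0)               # Move back to the start of the file.
    second_pass = f.read()  # Reads the same contents again.

os.remove(path)  # Clean up the throwaway file.
```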
> For example, a file has:
>
> 1
> 2
> 2
> 3
> 4
> 4
> 5
> 6
> 6
>
> The newly revised file should be:
>
> 1
> 2
> 3
> 4
> 5
> 6
Unless the file is huge, something like this should do:
# Untested
lines = open("myfile").readlines()
f = open("myfile", "w")
previous_line = None
for line in lines:
    if line != previous_line:
        f.write(line)
    previous_line = line
f.close()
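If the file is huge, one possible variation is to stream it line by line into a temporary file and then swap that file into place, so the whole thing never has to sit in memory. This is only a sketch under the same assumptions as above (sorted input, one entry per line); the name dedupe_sorted_file is made up for illustration:

```python
import os
import tempfile


def dedupe_sorted_file(path):
    """Rewrite the file at `path`, dropping lines equal to the line before."""
    # Put the temporary file in the same directory as the original so the
    # final rename does not have to cross filesystems.
    directory = os.path.dirname(os.path.abspath(path))
    previous_line = None
    with open(path) as source, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as target:
        for line in source:
            if line != previous_line:
                target.write(line)
            previous_line = line
        temp_name = target.name
    # Both files are closed here; replace the original with the new copy.
    os.replace(temp_name, path)
```

Calling dedupe_sorted_file("myfile") would then replace the file with its de-duplicated version. As with the version above, a final line that lacks a trailing newline will not compare equal to an otherwise identical line before it.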
--
Steven D'Aprano