[Tutor] Finding duplicate entries in a file

Steven D'Aprano steve at pearwood.info
Sat Mar 20 17:52:14 CET 2010


On Sun, 21 Mar 2010 03:34:01 am Ken G. wrote:
> What is a method I can use to find duplicated entries in a sorted
> numeric file?
>
> I was trying to read a file two lines at a time, but apparently
> I can only read one line at a time.

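def process(line):
    # Placeholder: replace with whatever you need to do to each line.
    print(line.rstrip("\n"))

with open("myfile") as f:
    while True:
        first = f.readline()   # Read one line.
        second = f.readline()  # And a second.
        # readline() returns '' only at the end of the file; a blank
        # line in the middle comes back as '\n', so these tests are safe.
        if first:
            process(first)
        if second:
            process(second)
        else:
            # An empty string means we've reached the end of the file,
            # so we can stop reading.
            break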
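Another way to get two lines at a time (an alternative, not the loop above) is to hand the same file iterator to `itertools.zip_longest` twice, so each pair pulls two consecutive lines. This is a sketch: the helper name `read_pairs` is hypothetical, `io.StringIO` stands in for a real file, and `zip_longest` is the Python 3 name (Python 2.6+ calls it `izip_longest`).

```python
import io
from itertools import zip_longest

def read_pairs(f):
    """Return an iterator of (first, second) line pairs from the open file f.

    Both arguments to zip_longest are the same iterator, so each pair
    consumes two consecutive lines. If the file has an odd number of
    lines, the second item of the last pair is None.
    """
    it = iter(f)
    return zip_longest(it, it)

# Usage with an in-memory file standing in for "myfile" (hypothetical data):
demo = io.StringIO("1\n2\n2\n3\n4\n")
for first, second in read_pairs(demo):
    print(repr(first), repr(second))
```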


Or if the file is small (say, less than a few tens of megabytes) you can 
read it all at once into a list:

lines = open("myfile").readlines()
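Once the lines are in a list, duplicated entries in a sorted file sit next to each other, so pairing each line with the one after it is enough to spot them. A minimal sketch, assuming the file is sorted as Ken's is; `find_duplicates` is a hypothetical name, and the sample list is just the example data from this thread.

```python
def find_duplicates(lines):
    """Return the values that appear more than once in a sorted list of lines.

    Assumes the input is sorted, so equal lines are adjacent. Trailing
    newlines are stripped before comparing, so a last line with no
    newline still matches the identical line before it.
    """
    values = [line.rstrip("\n") for line in lines]
    dups = []
    # zip(values, values[1:]) pairs each value with the one after it.
    for current, following in zip(values, values[1:]):
        # Record each duplicated value once, even if it repeats 3+ times.
        if current == following and (not dups or dups[-1] != current):
            dups.append(current)
    return dups

lines = ["1\n", "2\n", "2\n", "3\n", "4\n", "4\n", "5\n", "6\n", "6\n"]
print(find_duplicates(lines))  # ['2', '4', '6']
```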


> Can the same file be opened and read two times within a program?

You can do this:

text1 = open("myfile").read()
text2 = open("myfile").read()

but why bother? That's just pointlessly wasteful. Better to do this:

text1 = text2 = open("myfile").read()

which is no longer wasteful, but probably still pointless. (Why do I 
need two names for the same text?)
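If you really do want to read the same file twice, you don't need to open it a second time: an open file object can be rewound with its `seek()` method. A small sketch; `read_twice` is a hypothetical helper, and `io.StringIO` stands in for a file so the example runs anywhere (a text file returned by `open()` can be rewound to the start the same way).

```python
import io

def read_twice(f):
    """Read the whole of f, rewind it, and read it again."""
    first_pass = f.read()
    # After read(), the position is at the end of the file; seek(0)
    # moves it back to the start so the next read() sees everything.
    f.seek(0)
    second_pass = f.read()
    return first_pass, second_pass

text1, text2 = read_twice(io.StringIO("1\n2\n2\n"))
print(text1 == text2)  # True
```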


> For example, a file has:
>
> 1
> 2
> 2
> 3
> 4
> 4
> 5
> 6
> 6
>
> The newly revised file should be:
>
> 1
> 2
> 3
> 4
> 5
> 6

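Because the file is sorted, any duplicates sit next to each other, so it
is enough to compare each line with the one before it. Unless the file
is huge, something like this should do: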

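# Untested
# Read everything first, so it is safe to overwrite the same file afterwards.
with open("myfile") as f:
    lines = f.readlines()

with open("myfile", "w") as f:
    previous_line = None
    for line in lines:
        # Compare without the trailing newline, so a last line that has
        # no newline still matches the identical line before it.
        line = line.rstrip("\n")
        if line != previous_line:
            f.write(line + "\n")
            previous_line = line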
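For a file too big to read into memory, one option (a swapped-in technique, not the approach above) is `itertools.groupby`, which collapses each run of equal adjacent items while reading the input one line at a time. Because you can't safely overwrite a file you are still reading, this sketch writes to a separate output; the names `dedupe_sorted` and `myfile.new` are hypothetical, and in-memory `io.StringIO` objects stand in for the real files.

```python
import io
from itertools import groupby

def dedupe_sorted(infile, outfile):
    """Copy infile to outfile, keeping one line from each run of equal lines.

    Reads line by line, so memory use stays small however big the file is.
    Like the version above, it relies on the input being sorted so that
    duplicates are adjacent.
    """
    # The key strips the newline so a final line without one still matches.
    for value, _run in groupby(infile, key=lambda line: line.rstrip("\n")):
        outfile.write(value + "\n")

# With real files this would be, for example:
#     with open("myfile") as src, open("myfile.new", "w") as dst:
#         dedupe_sorted(src, dst)
src = io.StringIO("1\n2\n2\n3\n4\n4\n5\n6\n6")
dst = io.StringIO()
dedupe_sorted(src, dst)
print(dst.getvalue())
```

Once it has finished, `os.replace("myfile.new", "myfile")` (Python 3.3+) swaps the new file into place.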



-- 
Steven D'Aprano

