I am trying to delete duplicates but the job just finishes with an exit code 0
Peter Otten
__peter__ at web.de
Tue Nov 7 08:17:53 EST 2017
tysondogerz at gmail.com wrote:
> I am trying to delete duplicates but the job just finishes with an exit
> code 0 and does not delete any duplicates.
>
> The duplicates for the data always exist in Column F and I am desiring to
> delete the entire row B-I
>
> Any ideas?
>
>
> import openpyxl
> wb1 = openpyxl.load_workbook('C:/dwad/SWWA.xlsx')
> ws1 = wb1.active # keep naming convention consistent
>
> values = []
> for i in range(2,ws1.max_row+1):
> if ws1.cell(row=i,column=1).value in values:
> #pass
> #else:
> values.append(ws1.cell(row=i,column=1).value)
>
> for value in values:
> ws1.append([value])
append() will add even more duplicates to the sheet. If you do not care
about cell styles you can create a new sheet and copy only unique values. A
complete example:
import openpyxl
SOURCE_FILE = "duplicates.xlsx"
DEST_FILE = "unique.xlsx"
HEADER_COUNT = 1
KEY_COLUMNS = [1] # zero-based A=0, B=1, ...
workbook = openpyxl.load_workbook(SOURCE_FILE)
source_sheet = workbook.active
dest_sheet = workbook.create_sheet()
seen = set()
for i, row in enumerate(source_sheet.values):
if i < HEADER_COUNT:
dest_sheet.append(row)
else:
key = tuple(row[i] for i in KEY_COLUMNS)
print("row = %r, key = %r" %(row, key))
if key not in seen:
print("adding row", row)
seen.add(key)
dest_sheet.append(row)
workbook.save(DEST_FILE)
> I have attempted to do this with openpyxl for an excel as well as other
> methods (including csv though this deleted rows excessively).
I find that hard to believe. If anything it should keep more rows as you
compare whole lines, not just the columns you are interested in.
> CSV:
> with open('1.csv','r') as in_file, open('2.csv','w') as out_file:
> seen = set() # set for fast O(1) amortized lookup
> for line in in_file:
> if line not in seen:
> seen.add(line)
> out_file.write(line)
General remark: use the csv module in the standard library rather than
trying to parse the records manually.
More information about the Python-list
mailing list