I have a csv file of orders from a business I run the web shop for, around 30 thousand entries total. I would like to FIND all entries which have a duplicate in the base: same customer, same total due amount and placed on the same day (some customers make repeating orders but over longer timeframe).
I found a help article regarding removing duplicate values, but it doesn’t really apply to my situation, because I want to remove UNIQUE values.
How would I go about that?
Add a helper column concatenating cell contents (in the following example of column A and B, multiple cells or ranges are possible, see TEXTJOIN() online help) with a delimiter that does not occur in data, here the
|
vertical line or pipe symbol:C1:
=TEXTJOIN("|";0;A1:B1)
Copy-paste or pull down formula C1 to the desired range of rows.
For each row count the number of occurrences, assuming data in rows 1:99 here:
D1:
=COUNTIF(C$1:C$99;C1)
Copy-paste or pull down formula D1 to the desired range of rows.
Filter on column D >1, or sort data in columns A:D on column D descending and at the bottom delete rows with value 1 in column D.