I have a csv file of orders from a business I run the web shop for, around 30 thousand entries total. I would like to FIND all entries which have a duplicate in the base: same customer, same total due amount and placed on the same day (some customers make repeating orders but over longer timeframe).
I found a help article regarding removing duplicate values, but it doesn’t really apply to my situation, because I want to remove UNIQUE values.
How would I go about that?
Add a helper column concatenating cell contents (in the following example of column A and B, multiple cells or ranges are possible, see TEXTJOIN() online help) with a delimiter that does not occur in data, here the
|
vertical line or pipe symbol:C1:
=TEXTJOIN("|";0;A1:B1)
Copy-paste or pull down formula C1 to the desired range of rows.
For each row count the number of occurrences, assuming data in rows 1:99 here:
D1:
=COUNTIF(C$1:C$99;C1)
Copy-paste or pull down formula D1 to the desired range of rows.
Filter on column D >1, or sort data in columns A:D on column D descending and at the bottom delete rows with value 1 in column D.
Maybe you could concatenate every relevant field/criteria with “ - “ between each element into a new column . then add a filter on the new column for duplicates?
I do this quite often
what if I sorted the orders by names and then for each one check if the one above and the one below it have the same name, date and amount due using 3 columns of IFs, and then filter out those which meet all three of these criteria by multiplying the outputs of IFs in another column? that should work I think? the only problem is last step filtering may fuck up the existing IF functions
That would work for a 1 dimensional table. If you have many columns, you either mess up the following columns or you get back to the beginning. With lists that works wonderfully
It seems to have worked