
Need To Remove Duplicates From A Nested List Preserving The Order

I have a list like below:

L = [[1,2],[1,3],[1,4],[2,1],[2,5],[3,1],[3,2]]

The output should be:

[[1,2],[1,3],[1,4],[2,5],[3,2]]

Please note that the order of the pairs, and of the elements within each pair, should be preserved.

Solution 1:

The unique_everseen function in the itertools recipes in the docs does exactly what you need:

>>> lst = [[1,2],[1,3],[1,4],[2,1],[2,5],[3,1],[3,2]]
>>> list(unique_everseen(lst, key=frozenset))
[[1, 2], [1, 3], [1, 4], [2, 5], [3, 2]]

The basic idea is that it keeps a set of all values seen so far, and skips any value that's already in the set.

The key function works the same way as in sort, max, etc., as explained in the Sorting HOWTO. You want to make two lists that have the same values in a different order match, so we need to compare the set of each list's values, instead of the lists themselves. (The reason we need frozenset instead of set is that set is mutable, and therefore can't be stored in a set.)
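A quick sketch of the two points above: frozensets of the same values compare equal regardless of order, and unlike plain sets they are hashable, so they can live inside the "seen" set:

```python
# Two lists with the same values in a different order map to equal frozensets:
assert frozenset([1, 2]) == frozenset([2, 1])

# A plain set compares the same way, but it is unhashable,
# so it cannot be stored inside another set:
seen = set()
seen.add(frozenset([1, 2]))   # fine
try:
    seen.add({1, 2})          # raises TypeError: unhashable type: 'set'
except TypeError:
    pass
assert frozenset([2, 1]) in seen
```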


If you had more than 2 elements in your sublists, the question would be ambiguous. If you have, say, [1, 1, 2] and [1, 2, 2], do you want them to be considered duplicates, or not?

  • If yes: Then you're treating them as a set, so use key=frozenset.
  • If no: Then you're treating them as a multiset. The nicest Python implementation of multisets is collections.Counter, but there's no FrozenCounter (and building one just for this purpose is probably overkill). You can simulate one in a few ways:
    • key=lambda sublist: frozenset(Counter(sublist).items())
    • key=lambda sublist: tuple(sorted(Counter(sublist).items())) (the tuple is needed because a plain list isn't hashable)
    • key=lambda sublist: tuple(sorted(sublist))

Since your initial thought was to sort the sublists—which was only unacceptable because you need to end up with the original value, not the sorted value—I think the last of these options is most likely to be the one you'd want, but that's still really just a guess.
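To make the set-vs-multiset distinction concrete, here is a small check (assuming the [1, 1, 2] / [1, 2, 2] example above): the frozenset key collapses the two sublists into one, while each of the multiset keys keeps them distinct.

```python
from collections import Counter

a, b = [1, 1, 2], [1, 2, 2]

# As sets, the two sublists produce the same key (so they'd be duplicates):
assert frozenset(a) == frozenset(b)

# Each multiset-style key tells them apart (so they'd both survive):
key1 = lambda sub: frozenset(Counter(sub).items())
key2 = lambda sub: tuple(sorted(Counter(sub).items()))
key3 = lambda sub: tuple(sorted(sub))
assert key1(a) != key1(b)
assert key2(a) != key2(b)
assert key3(a) != key3(b)
```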


You can just copy and paste the function from the docs into your code:

from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

… or install the third-party library more_itertools and use its unique_everseen from there. Alternatively, the third-party library toolz has an equivalent function named unique.

Solution 2:

This works similarly to removing duplicates from a regular list while preserving order, but we need to deal with two extra issues: the sublists are not hashable, and the order of elements within each sublist is irrelevant for comparison.

We can use frozen sets to solve both these issues at once.

>>> lst = [[1,2],[1,3],[1,4],[2,1],[2,5],[3,1],[3,2]]
>>> seen = set()
>>> result = []
>>>
>>> for x in lst:
...     s = frozenset(x)
...     if s not in seen:
...         result.append(x)
...         seen.add(s)
...
>>> result
[[1, 2], [1, 3], [1, 4], [2, 5], [3, 2]]
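The same loop can be wrapped in a small reusable function (the name dedupe_pairs is just an illustration, not from the original answer):

```python
def dedupe_pairs(lst):
    """Remove order-insensitive duplicate pairs, keeping first occurrences."""
    seen = set()
    result = []
    for x in lst:
        s = frozenset(x)        # order-insensitive, hashable key
        if s not in seen:
            seen.add(s)
            result.append(x)
    return result

print(dedupe_pairs([[1,2],[1,3],[1,4],[2,1],[2,5],[3,1],[3,2]]))
# [[1, 2], [1, 3], [1, 4], [2, 5], [3, 2]]
```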
