Skip to content Skip to sidebar Skip to footer

Finding Gappy Sublists Within A Larger List

Let's say I have a list like this: [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'lawer'], ['she', 'is', 'a', 'great', 'student'], ['i', 'am', 'a', 'teacher'], ['she', '

Solution 1:

If all that you care about is that the words appear in order somehwere in the array, you can use a collections.deque and popleft to iterate through the list, and if the deque is emptied, you have found a valid match:

from collections import deque

deffind_gappy(arr, m):
  dq = deque(m)
  for word in arr:
    if word == dq[0]:
      dq.popleft()
      ifnot dq:
        returnTruereturnFalse

By comparing each word in arr with the first element of dq, we know that when we find a match, it has been found in the correct order, and then we popleft, so we now are comparing with the next element in the deque.

To filter your initial list, you can use a simple list comprehension that filters based on the result of find_gappy:

matches= ['she', 'is', 'student']
x = [i for i in x if find_gappy(i, matches)]

# [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student'], ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]

Solution 2:

You can compare two lists, with a function like this one. The way it works is it loops through your shorter list, and every time it finds the next word in the long list, cuts off the first part of the longer list at that point. If it can't find the word it returns false.

defis_sub_sequence(long_list, short_list):
    for word in short_list:
        if word in long_list:
            i = long_list.index(word)
            long_list = long_list[i+1:]
        else:
            returnFalsereturnTrue

Now you have a function to tell you if the list is the desired type, you can filter out all the lists you need from the 'list of lists' using a list comprehension like the following:

a = [['she', 'is', 'a', 'student'],
 ['she', 'is', 'a', 'lawer'],
 ['she', 'is', 'a', 'great', 'student'],
 ['i', 'am', 'a', 'teacher'],
 ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]


b = ['she', 'is', 'student']

filtered = [x for x in a if is_sub_sequence(x,b)]

The list filtered will include only the lists of the desired type.

Post a Comment for "Finding Gappy Sublists Within A Larger List"