Remove Multiple Items From A Numpy.ndarray Without Numpy.delete
Solution 1:
It may help to understand exactly what np.delete does. In your case
newset = np.delete(dataset, ListifoIndex, axis=0)  # corrected
in essence does:
keep = np.ones(dataset.shape[0], dtype=bool) # array of True matching 1st dim
keep[ListifoIndex] = False
newset = dataset[keep, :]
In other words, it constructs a boolean index of the rows it wants to keep.
If I run
dataset = np.delete(dataset, ListifoIndex, axis=0)
repeatedly in an interactive shell, there isn't any accumulation of intermediate arrays. Temporarily, while delete runs, there will be this keep array and a new copy of dataset, but with the assignment the old copy disappears.
Are you sure it's the delete that's growing memory use, as opposed to the growing training set?
As for speed, you might improve that by maintaining a 'mask' of all deleted rows rather than actually deleting anything. But depending on how ListifoIndex overlaps with previous deletions, updating that mask might be more trouble than it's worth, and it's also likely to be more error-prone.
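A minimal sketch of that mask idea (the array and the batches of indices here are made up for illustration; note the indices must refer to positions in the original array, which is exactly the overlap bookkeeping the answer warns about):

```python
import numpy as np

dataset = np.arange(20).reshape(10, 2)
alive = np.ones(dataset.shape[0], dtype=bool)  # True = row still in play

# Each round, mark rows dead instead of calling np.delete.
for batch in ([1, 3], [5], [0, 7]):   # indices into the ORIGINAL array
    alive[batch] = False              # O(k) per round, no copying

remaining = dataset[alive]            # one boolean-index copy at the end
```

This trades one copy per deletion for a single copy at the end, at the cost of translating indices against the original array rather than the shrinking one.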
Solution 2:
I know this is old, but I ran into the same problem and wanted to share the fix here. You are sort of correct when you say that numpy.delete keeps a copy of the database, but it isn't numpy, it's Python itself.
Say you randomly choose a row from the database to be part of the training set. Instead of taking the row, Python takes a reference to the row and keeps the whole database around for when you next want to use that row. So when you delete the row from the old database, you create a new database from which you can choose another row. That database gets saved as well, because it is referenced by the next row in the training set. 100 iterations later you end up with 100 copies of the database, each having one less row than the last but containing the same data.
The solution I found: instead of appending the row to the training set directly, make a copy with copy.deepcopy and put that copy in the training set. This way Python doesn't need to keep the old database alive for reference purposes.
Bad -
database = [0, 1, 2, 3, 4, 5, 6]
Train = []
for i in range(len(database)):
    Train.append(database[i])
Good -
import copy

Train = []
for i in range(len(database)):
    copy_of_thing = copy.deepcopy(database[i])
    Train.append(copy_of_thing)
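For NumPy arrays specifically, the same leak shows up through views: a row taken by basic indexing is a view whose `.base` attribute keeps the whole array alive, and a plain `.copy()` (no `copy.deepcopy` needed) detaches it. A sketch with a made-up dataset:

```python
import numpy as np

dataset = np.random.rand(1000, 8)

row_view = dataset[3]            # a view: holds a reference to all of dataset
print(row_view.base is dataset)  # True - dataset can't be garbage-collected

row_copy = dataset[3].copy()     # an independent 8-element array
print(row_copy.base is None)     # True - no reference back to dataset
```

Appending `row_copy` to a training set keeps only 8 floats per row, not the whole array.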
Solution 3:
If the order doesn't matter, you can swap the rows to delete with rows at the end of the array:
import numpy as np
n = 1000
a = np.random.rand(n, 8)
a[:, 0] = np.arange(n)  # tag each row with its original index
del_index = np.array([10, 100, 200, 500, 800, 995, 997, 999])
# rows to delete that sit in the "keep" zone (the first n - len(del_index) rows)
del_index2 = del_index[del_index < len(a) - len(del_index)]
# tail rows that are not themselves being deleted - they fill the holes
copy_index = np.arange(len(a) - len(del_index), len(a))
copy_index2 = np.setdiff1d(copy_index, del_index)
# fancy indexing returns copies, so this element-wise swap is safe
a[copy_index2], a[del_index2] = a[del_index2], a[copy_index2]
and then you can use slice to create a new view:
a2 = a[:-len(del_index)]
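The swap trick above can be packaged in a hypothetical helper (the function name and signature are my own, not from the answer), assuming `del_index` holds unique, in-range row indices:

```python
import numpy as np

def delete_rows_by_swap(a, del_index):
    """Remove the given rows by swapping them into the tail.
    Modifies `a` in place and returns a view of the surviving rows.
    Order is NOT preserved."""
    k = len(del_index)
    cutoff = len(a) - k
    # doomed rows that currently sit in the keep zone
    del2 = del_index[del_index < cutoff]
    # tail rows that are not being deleted - they fill the holes
    copy2 = np.setdiff1d(np.arange(cutoff, len(a)), del_index)
    a[copy2], a[del2] = a[del2], a[copy2]
    return a[:cutoff]

b = np.arange(10.0).reshape(5, 2)
kept = delete_rows_by_swap(b, np.array([0, 4]))
```

Because the result is a slice of `b`, no new array is allocated beyond the small index arrays.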
If you want to keep the order, you can use a for loop and slice assignment (note: del_index must be defined and sorted before it is used, and the tail after the last deleted row also needs shifting, which appending len(a) to the pair list handles):
import numpy as np
n = 1000
a = np.random.rand(n, 8)
a[:, 0] = np.arange(n)
del_index = np.array([100, 10, 200, 500, 800, 995, 997, 999])
del_index.sort()
a2 = np.delete(a, del_index, axis=0)  # reference result for checking
# shift each block between consecutive deleted rows left by the number
# of deletions seen so far
for i, (start, end) in enumerate(zip(del_index, np.append(del_index[1:], len(a)))):
    a[start-i:end-1-i] = a[start+1:end]
print(np.all(a[:-len(del_index)] == a2))