How Can I Handle The Code To Avoid Killed?
Solution 1:
On a Linux system, check the output of dmesg. If the process is being killed by the kernel, there will be an explanation in that output. The most probable reason: out of memory.
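If you would rather do that check from a script, here is a minimal sketch. The helper names find_oom_lines and read_dmesg are hypothetical, and the search phrases are an assumption about how the kernel usually words its out-of-memory reports, so adjust them to what your own log shows. On some systems dmesg needs root privileges.

```python
import re
import subprocess

# Assumption: kernel OOM reports usually contain one of these phrases.
OOM_PATTERN = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)


def find_oom_lines(log_text):
    """Return the lines of a kernel log that look like OOM-killer reports."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


def read_dmesg():
    """Run dmesg and return its output as text (may require root)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True)
    return result.stdout
```

Calling find_oom_lines(read_dmesg()) right after the process dies should show whether the kernel killed it, and usually which process it picked.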
Solution 2:
One reason you might hit a memory limit is the call to distance.values() in your auto_select_dc function:
neighbor_percent = sum([1 for value in distance.values() if value < dc]) / num ** 2
In Python 2, distance.values() allocates a list that contains all the values from your dictionary, and the square brackets inside sum() build a second list of ones. If your dictionary holds a lot of data, these can be very big lists. A possible solution is to use distance.iteritems(), which returns an iterator, together with a generator expression (no square brackets). Rather than returning all the items in a list, this lets you iterate over them with much less memory usage.
neighbor_percent = sum(1 for _, value in distance.iteritems() if value < dc) / num ** 2
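For reference, here is a rough Python 3 version of the same idea. On Python 3, iteritems() no longer exists and values() returns a view rather than a list, so the generator expression is the part that matters. The function name neighbor_fraction is hypothetical; the parameters mirror the names used in the line above.

```python
def neighbor_fraction(distance, dc, num):
    """Fraction of the num ** 2 possible pairs whose distance is below dc.

    distance maps (i, j) tuples to distance values.
    """
    # The generator expression yields one value at a time, so no
    # intermediate list is ever built.
    below = sum(1 for value in distance.values() if value < dc)
    return below / num ** 2


# Tiny made-up example: two of the three stored distances are below 1.0.
example = {(1, 2): 0.5, (1, 3): 2.0, (2, 3): 0.1}
fraction = neighbor_fraction(example, dc=1.0, num=3)
```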
Solution 3:
The CutOff function checks every (i, j) pair, with i and j each running from 1 to max_id.
def CutOff(self, distance, max_id, threshold):
    for i in range(1, max_id + 1):
        for j in range(1, max_id + 1):
The sample data file provided in the GitHub link contains a distance value for every ID pair, with IDs from 1 to 2000 (so it has about 2M lines for the 2K IDs).
However, your data seems to be very sparse: it has only 20,000 lines, yet it contains large ID numbers such as 2686 and 13856. The error message 'KeyError: (1, 2)' indicates that there is no distance value for the pair of IDs 1 and 2.
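I can't see the body of your CutOff function, so the following is only a sketch of two ways to cope with missing pairs, not a drop-in replacement. Both function names are hypothetical. The first keeps the dense double loop but uses dict.get() so that a missing pair is skipped instead of raising KeyError; the second visits only the pairs that are actually stored.

```python
def count_below_dense(distance, max_id, threshold):
    """Visit every (i, j) pair, skipping pairs with no stored distance."""
    count = 0
    for i in range(1, max_id + 1):
        for j in range(1, max_id + 1):
            value = distance.get((i, j))  # None instead of KeyError
            if value is not None and value < threshold:
                count += 1
    return count


def count_below_sparse(distance, threshold):
    """Visit only the pairs that exist in the dictionary."""
    return sum(1 for value in distance.values() if value < threshold)
```

With max_id = 13856 the dense loop runs roughly 192 million iterations, while the sparse version touches only the 20,000 stored entries, so the second form is probably the better fit for data like yours.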
Finally, it does not make sense to me that code loading only 20,000 lines of data (probably a few megabytes) would raise an out-of-memory error. I guess your data is much larger, or the OOM error came from another part of your code.
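To sanity-check the "few megabytes" guess, something like the following gives a rough estimate. It assumes CPython and a dictionary keyed by (i, j) integer tuples with float values; the data here is made up, and the helper name approx_dict_bytes is hypothetical. The figure is approximate because sys.getsizeof() does not follow references and small integers are shared between keys.

```python
import sys


def approx_dict_bytes(d):
    """Rough size in bytes of a dict with tuple keys and float values."""
    total = sys.getsizeof(d)
    for key, value in d.items():
        total += sys.getsizeof(key) + sys.getsizeof(value)
        # Count the integers inside each key tuple as well.
        total += sum(sys.getsizeof(part) for part in key)
    return total


# Hypothetical data: 20,000 (i, j) pairs with float distances.
distance = {(i, i + 1): i * 0.5 for i in range(1, 20001)}
size_mb = approx_dict_bytes(distance) / (1024 * 1024)
print("approx. %.1f MB" % size_mb)
```

If the number you get for your real data is of this small order, the kill is more likely caused by something else the program allocates.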