PicklingError When Using Multiprocessing

I am having trouble when using the Pool.map_async() (and also Pool.map()) in the multiprocessing module. I have implemented a parallel-for-loop function that works fine as long as

Solution 1:

The problem here is less the "pickle" error message itself than a conceptual one: multiprocessing starts separate "worker" processes to run your code in order to perform its magic.

It then sends data to and from the different processes by seamlessly serializing and de-serializing it (that is the part that uses pickle).

When part of the data passed back and forth is a function, pickle does not serialize the function's code. It stores only a reference (the module name and the function name) and assumes a function with the same name exists in the receiving process. Since functions are stateless, the worker process just looks up that same function and calls it with the data it has received.
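To make the "only a reference is passed" point concrete, here is a minimal sketch using nothing but the standard pickle module. The function name double is made up for illustration. It shows that a module-level function round-trips by name, while a lambda (which has no importable name) fails to pickle.

```python
import pickle


def double(x):
    """A hypothetical module-level function, so pickle can find it by name."""
    return 2 * x


# Pickling stores the module and function name, not the function body.
payload = pickle.dumps(double)
name_in_payload = b"double" in payload

# Unpickling looks the name up again and returns the very same object.
restored = pickle.loads(payload)

# A lambda has no importable name, so the lookup-by-name scheme fails.
lambda_failed = False
try:
    pickle.dumps(lambda x: 2 * x)
except (pickle.PicklingError, AttributeError):
    lambda_failed = True

if __name__ == "__main__":
    print("name stored in payload:", name_in_payload)
    print("restored is double:", restored is double)
    print("lambda failed to pickle:", lambda_failed)
```

This is the same mechanism the pool relies on when it ships your callable to a worker.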

When your function is a method of an instance, things are different underneath, even though in everyday Python code a method feels much like a function with an "automatic" self variable. Instances (objects) are stateful, so the worker process does not have a copy of the object that owns the method you want to call on the other side.

Workarounds that try to pass your method off as a function to the map_async call won't work either, as multiprocessing passes around only a function reference, not the actual function.

So, you should either (1) change your code so that you pass a function, not a method, to the worker processes, converting whatever state the object keeps into extra parameters of that function; or (2) create a "target" function for the map_async call that reconstructs the needed object on the worker-process side and then calls the method on it. Most straightforward classes in Python are picklable themselves, so you could pass the object that owns the method in the map_async call, and the "target" function would call the appropriate method on the worker side.
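Option (1) could look roughly like the sketch below. The names Scaler, scale_value and parallel_scale are invented for illustration and are not from the question. The idea is that the object's state (here a single factor) travels with each work item as plain data, and the worker only ever sees a module-level function.

```python
from multiprocessing import Pool


def scale_value(packed):
    """Module-level worker function: the former instance state arrives as data."""
    factor, x = packed
    return factor * x


class Scaler:
    """Hypothetical class whose method we would have liked to pass to the pool."""

    def __init__(self, factor):
        self.factor = factor

    def scale_all(self, values, processes=2):
        # Pass the plain function plus the state it needs, not self.some_method.
        work_items = [(self.factor, v) for v in values]
        with Pool(processes) as pool:
            return pool.map(scale_value, work_items)


if __name__ == "__main__":
    print(Scaler(3).scale_all([1, 2, 3]))
```

The cost of this approach is that every piece of state the method used must be listed explicitly, which is fine for a few values and gets tedious for larger objects.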

(2) may sound "difficult" but it is probably just something like this - unless your object's class can't be pickled:

import types

def target(packed):
    # Each work item is (object, method_name, argument), built below
    obj, method_name, arg = packed
    return getattr(obj, method_name)(arg)
(...)
# And add these 3 lines prior to your map_async call:
    # Evaluate function
    if isinstance(func, types.MethodType):
        # Bundle the owning object and the method name with every argument
        arguments = [(func.__self__, func.__name__, arg) for arg in arguments]
        func = target
    result = pool.map_async(func, arguments, chunksize = chunksize)

*disclaimer: I haven't tested this
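For reference, a self-contained variant of the same idea is sketched below. Counter and map_method are hypothetical names, and the sketch assumes the object's class is defined at module level and is picklable. Each work item carries the object, the method name and one argument, so the worker-side target function can look the method up with getattr.

```python
from multiprocessing import Pool


class Counter:
    """Hypothetical picklable class; it must live at module level."""

    def __init__(self, offset):
        self.offset = offset

    def add(self, x):
        return self.offset + x


def target(packed):
    """Runs in the worker: unpack the item and call the named method."""
    obj, method_name, arg = packed
    return getattr(obj, method_name)(arg)


def map_method(pool, method, arguments, chunksize=1):
    """Hypothetical helper: map a bound method over arguments via target."""
    packed = [(method.__self__, method.__name__, arg) for arg in arguments]
    return pool.map_async(target, packed, chunksize=chunksize)


if __name__ == "__main__":
    counter = Counter(10)
    with Pool(2) as pool:
        result = map_method(pool, counter.add, [1, 2, 3])
        print(result.get(timeout=30))
```

Keep in mind that each worker operates on its own unpickled copy of the object, so any attribute changes a method makes in a worker are not visible in the parent process.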
