
Виктор Шутов • 2 years ago

Thank you for the nice explanations!
I tried to execute your code. Serial execution of your program takes 0.115 second(s), but with pool.apply it takes 26.019 second(s) on an Intel Core i5-8300H (4 cores, 8 threads). That is 226 times slower than the serial code. Why?
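A minimal sketch of why this can happen (the `howmany_within_range` worker is reconstructed from the tutorial, and the data size here is made up): `pool.apply` blocks until each call returns, so the list comprehension runs one task at a time and pays inter-process communication overhead on every single row, while `pool.starmap` ships rows to the workers in chunks.

```python
import multiprocessing as mp
import time

def howmany_within_range(row, minimum, maximum):
    """Count how many elements of row fall within [minimum, maximum]."""
    return sum(minimum <= x <= maximum for x in row)

if __name__ == '__main__':
    data = [[1, 5, 9, 3, 7]] * 200  # made-up toy data

    with mp.Pool(mp.cpu_count()) as pool:
        # pool.apply blocks on every call: each row is shipped to a worker,
        # computed, and returned before the next call even starts, so nothing
        # overlaps and every row pays the full IPC round-trip cost.
        t0 = time.time()
        slow = [pool.apply(howmany_within_range, args=(row, 4, 8)) for row in data]
        print('apply:   %.3f s' % (time.time() - t0))

        # pool.starmap sends the rows in chunks and runs them concurrently,
        # amortizing the per-call overhead across many rows.
        t0 = time.time()
        fast = pool.starmap(howmany_within_range, [(row, 4, 8) for row in data])
        print('starmap: %.3f s' % (time.time() - t0))

    assert slow == fast
```

For tiny per-row work like this, the IPC overhead dominates either way, which is why a plain serial loop can still win.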

gwangjin • 4 years ago

Generalized as functions:


###############################################
# pmap (unary functions)
###############################################

import multiprocessing as mp

def pmap(func_unary, lst, cpu_exclude_n=0):
    """Apply func_unary(x) to each x in lst in parallel."""
    with mp.Pool(mp.cpu_count() - cpu_exclude_n) as pool:
        return pool.map(func_unary, lst)

###############################################
# pstarmap (list of function arguments packed in tuples)
###############################################

def pstarmap(func_multi_args, lst_of_args_list, cpu_exclude_n=0):
    """Apply func_multi_args(*args) to each args tuple in lst_of_args_list in parallel."""
    with mp.Pool(mp.cpu_count() - cpu_exclude_n) as pool:
        return pool.starmap(func_multi_args, lst_of_args_list)

I think papply is superfluous, because pstarmap already covers the case of non-unary functions.
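For reference, a runnable usage sketch of these helpers (`square` and `add` are made-up example functions, not from the tutorial):

```python
import multiprocessing as mp

def pmap(func_unary, lst, cpu_exclude_n=0):
    """Apply func_unary(x) to each x in lst in parallel."""
    with mp.Pool(mp.cpu_count() - cpu_exclude_n) as pool:
        return pool.map(func_unary, lst)

def pstarmap(func_multi_args, lst_of_args_list, cpu_exclude_n=0):
    """Apply func_multi_args(*args) to each args tuple in parallel."""
    with mp.Pool(mp.cpu_count() - cpu_exclude_n) as pool:
        return pool.starmap(func_multi_args, lst_of_args_list)

# Workers must live at module top level so child processes can import
# them by name under the 'spawn' start method.
def square(x):
    return x * x

def add(a, b):
    return a + b

if __name__ == '__main__':
    print(pmap(square, [1, 2, 3]))          # [1, 4, 9]
    print(pstarmap(add, [(1, 2), (3, 4)]))  # [3, 7]
```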

disqus_q77Vm0Uw78 • 4 years ago

It is taking a lot of time for me. Without apply or map it takes around 2 seconds on my laptop, but with apply or map it has been running for over 5 minutes. That doesn't make any sense, and I can't see what the problem is; I'm running the same code as yours.

Reza • 2 years ago

if __name__ == '__main__':
    pool = mp.Pool(mp.cpu_count())
    results = [pool.apply(howmany_within_range, args=(row, 4, 8)) for row in data]
    pool.close()

Selva Prabhakaran • 3 years ago

It's likely because of the overhead associated with initializing the parallelization, though 5 minutes seems high. If your non-parallel code takes only 2 seconds to run, don't parallelize.

gwangjin • 4 years ago

Thank you for your tutorial! Very helpful!

I generalized your examples to functions for future use:


###############################################
# map (unary functions), starmap (multi-arg functions),
# apply (multi-arg functions)
###############################################


import multiprocessing as mp


def pmap(func_unary, lst, cpu_exclude_n=0):
    """Apply func_unary(x) to each x in lst in parallel."""
    with mp.Pool(mp.cpu_count() - cpu_exclude_n) as pool:
        result = pool.map(func_unary, lst)
    return result


def pstarmap(func_multi_args, lst_of_args_list, cpu_exclude_n=0):
    """Apply func_multi_args(*args) to each args tuple in parallel."""
    with mp.Pool(mp.cpu_count() - cpu_exclude_n) as pool:
        result = pool.starmap(func_multi_args, lst_of_args_list)  # inserts x as *x
    return result


def papply(func_multi_args, lst_of_args_list, cpu_exclude_n=0):
    """Apply func_multi_args(*args) via pool.apply, one blocking call per tuple."""
    with mp.Pool(mp.cpu_count() - cpu_exclude_n) as pool:
        result = [pool.apply(func_multi_args, args=x) for x in lst_of_args_list]
    return result

However, I have a question about `pool.apply()`: what does it actually parallelize?
The loop `for x in lst_of_args_list` is outside `pool.apply()`, so I would rule out parallelization over `lst_of_args_list` in your example (and in my generalized one). Is this a logical mistake, or did I misunderstand what exactly gets parallelized?
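To make the distinction concrete, here is a small sketch (with a made-up `slow_add` worker): `pool.apply` blocks until its one call returns, so the list comprehension issues the calls strictly one after another, while `pool.apply_async` submits all calls up front and collects the results afterwards.

```python
import multiprocessing as mp
import time

def slow_add(a, b):
    time.sleep(0.5)  # simulate half a second of work
    return a + b

if __name__ == '__main__':
    args = [(1, 2), (3, 4), (5, 6), (7, 8)]

    with mp.Pool(4) as pool:
        # pool.apply blocks until the worker returns, so the list
        # comprehension runs the four calls back to back: roughly 4 x 0.5 s.
        t0 = time.time()
        serial = [pool.apply(slow_add, args=x) for x in args]
        print('apply:       %.2f s' % (time.time() - t0))

        # pool.apply_async returns a handle immediately; all four calls
        # run concurrently, and .get() collects them: roughly 0.5 s total.
        t0 = time.time()
        handles = [pool.apply_async(slow_add, args=x) for x in args]
        concurrent = [h.get() for h in handles]
        print('apply_async: %.2f s' % (time.time() - t0))

    assert serial == concurrent == [3, 7, 11, 15]
```

So with plain `pool.apply`, each individual call does execute in a worker process, but the calls never overlap; only `apply_async` (or `map`/`starmap`) parallelizes across the argument list.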

Selva Prabhakaran • 3 years ago

I'd expect the func_multi_args function to be executed in parallel.

Syntax Error • 4 years ago

Thank you for your post. Unfortunately, I don't find it so useful. It would be very nice if more details and discussion were given.

Meng • 4 years ago

I got an AttributeError when running step 5.1. There appears to be a bug with the multiprocessing feature. Check out https://stackoverflow.com/q... for a workaround.

Kavindu Zoysa • 5 years ago

In part 5.1 [5.1. Parallelizing using Pool.apply()], it takes a comparatively long time to execute `results = [pool.apply(howmany_within_range, args=(row, 4, 8)) for row in data]`, about 23 seconds. Is this the expected behaviour?

Selva Prabhakaran • 3 years ago

Only parallelize when the non-parallel code takes a long time to run. Parallelization saves time when your data is so large that the usual way takes a lot of time, say at least 5 minutes.

郭帅 • 4 years ago

Same here.

Alinafe Matenda • 5 years ago

Not sure why mine failed, probably due to my Python version (3.7), but for "# Parallel processing with Pool.apply_async()" I had to put the howmany_within_range2 function in its own script and import it. Of course it's not returning any rows yet; I'm still troubleshooting.
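That symptom is consistent with the 'spawn' start method (the default on Windows, and on macOS since Python 3.8): child processes look the worker function up by name in an importable module, so a function defined in a notebook cell may not pickle, which is why moving it into its own script helps. The "no rows returned" part often comes from exiting the `with` block before calling `close()` and `join()`, since the context manager terminates the pool on exit. A single-file sketch of the apply_async-with-callback pattern, with data values made up here:

```python
import multiprocessing as mp

# Worker defined at module top level so spawned children can import it by name.
def howmany_within_range2(i, row, minimum=4, maximum=8):
    count = sum(minimum <= x <= maximum for x in row)
    return (i, count)

results = []

def collect_result(result):
    # Callbacks run in the parent process, so appending to a plain list is safe.
    results.append(result)

if __name__ == '__main__':
    data = [[1, 5, 9], [2, 6, 7], [0, 3, 10]]  # made-up toy data
    with mp.Pool(mp.cpu_count()) as pool:
        for i, row in enumerate(data):
            pool.apply_async(howmany_within_range2, args=(i, row),
                             callback=collect_result)
        # close() + join() BEFORE leaving the with-block; otherwise the pool
        # is terminated and pending callbacks never fire ("no rows").
        pool.close()
        pool.join()
    results.sort(key=lambda r: r[0])
    print(results)  # [(0, 1), (1, 2), (2, 0)]
```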