Wednesday, 4 September 2013

python loop in parallel

I'd like to do something like this:
for entry in catalog:
    do some operation
Each entry is not necessarily an equal amount of work, but since there are
thousands of entries, I'm guessing that breaking them into chunks of, say,
a thousand will more or less even things out. So I'd like to run multiple
processes, each handling a chunk of entries, to shorten the overall run time.
I have tried something like the following for 100 of them:
from multiprocessing import Process

# split the 100 entries between two worker processes
p1 = Process(target=myfunc, kwargs=dict(start=0, end=50))
p2 = Process(target=myfunc, kwargs=dict(start=50, end=100))
p1.start()
p2.start()
# wait for both workers to finish
p1.join()
p2.join()
From the logging in the script, the two processes do seem to run at the
same time, but compared to processing all 100 entries serially, the
runtime is only about 20% lower. Is this to be expected? Is there a better
way to break down a large loop operation in Python?
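One common alternative, sketched below, is to let multiprocessing.Pool do the chunking and load balancing instead of splitting ranges by hand: Pool.map hands out batches of entries to idle workers, so unequal per-entry loads even out. Since the original myfunc and catalog aren't shown, process_entry and the range here are placeholders for the real per-entry work.

```python
from multiprocessing import Pool

def process_entry(entry):
    # placeholder for the per-entry work ("do some operation");
    # the real myfunc from the post is not shown
    return entry * entry

def run_parallel(catalog, workers=4, chunksize=10):
    # Pool.map splits the iterable into batches of `chunksize` and
    # hands each batch to the next idle worker, which evens out
    # unequal loads and amortizes inter-process overhead
    with Pool(processes=workers) as pool:
        return pool.map(process_entry, catalog, chunksize=chunksize)

if __name__ == "__main__":
    results = run_parallel(range(100))
    print(results[:5])  # first few squared entries
```

Note that if the speedup is still small, the work per entry may be dominated by I/O or by the cost of shipping data between processes rather than by CPU time, in which case adding processes won't help much.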
