# best practices
## error prevention
### script guard
When writing scripts that use `enjoyn`, be sure to add the guard `if __name__ == "__main__":`.
Without the guard, you may encounter an error like:

```text
An attempt has been made to start a new process...
```
Fix:

```python
from enjoyn import GifAnimator

def animate(items, output_path):
    return GifAnimator(items, output_path=output_path).compute()

if __name__ == "__main__":
    animate([0, 1, 2], "enjoyn.gif")
```
### matplotlib preprocessor

#### closing figures
When using the object-oriented interface, e.g. `ax.plot` instead of `plt.plot`, explicitly close the figure after saving. Without closing the figure, frames may overlap in the output.
Fix 1:

```python
from io import BytesIO

import matplotlib.pyplot as plt

def plot(i):
    buf = BytesIO()
    ax = plt.axes()
    ax.scatter(range(i), range(0, -i, -1))
    plt.savefig(buf)
    plt.close()
    return buf
```
It’s also possible to use the built-in `MatplotlibPreprocessor`, which handles saving and closing the figure.
Fix 2:

```python
import matplotlib.pyplot as plt
from enjoyn import MatplotlibPreprocessor

def plot(i):
    ax = plt.axes()
    im = ax.scatter(range(i), range(0, -i, -1))
    return im

preprocessor = MatplotlibPreprocessor(func=plot)
```
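The resulting preprocessor is then handed to the animator via the `preprocessor` keyword, as in the benchmark script on this page; a sketch:

```python
GifAnimator(..., preprocessor=preprocessor).compute()
```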
#### alongside Client

When using `Client` alongside `matplotlib`, ensure `threads_per_worker=1` and `scheduler="processes"`.
Without those settings, because `matplotlib` is not thread-safe, you may notice “flashing” in the output.

Fix:

```python
from dask.distributed import Client
from enjoyn import GifAnimator

client = Client(threads_per_worker=1)
GifAnimator(...).compute(client=client, scheduler="processes")
```
### PNG inputs

When using `PNG` files as inputs, ensure transparency is disabled. With transparency enabled, the frames may overlap and have unusually thick labels.

Fix:

```python
import matplotlib.pyplot as plt

path = "enjoyn.png"
plt.savefig(path, transparent=False)
```
## runtime improvement

### optimization level
By default, `enjoyn` sets `--optimize=2` when calling `gifsicle`; this reduces the output GIF file size, but increases runtime. To slightly improve runtime, lower the optimization level to 1 or 0, like:
```python
gifsicle_options = (
    "--optimize=1",
    "--loopcount=0",
    "--no-warnings",
    "--no-conserve-memory",
)
GifAnimator(..., gifsicle_options=gifsicle_options)
```
## dask practices

`enjoyn` uses `dask` under the hood, so many of the recommendations documented on the Dask Best Practices page apply directly to `enjoyn`! Take a moment to read through that page; the following sections build on it.
### load balancing
Using the default `partition_size` is often sufficient. However, tuning `partition_size` can yield non-trivial runtime improvements:
| partition_size | split_every | runtime |
|---|---|---|
| 5 | 8 | 60.33 s |
| 10 | 8 | 48.46 s |
| 15 | 8 | 43.29 s |
| 20 | 8 | 37.67 s |
| 25 | 8 | 35.96 s |
| 30 | 8 | 36.23 s |
| 35 | 8 | 37.23 s |
| 40 | 8 | 37.62 s |
| 45 | 8 | 36.81 s |
| 50 | 8 | 36.31 s |
| 55 | 8 | 37.89 s |
| 60 | 8 | 36.14 s |
| 65 | 8 | 34.25 s |
| 70 | 8 | 34.84 s |
| 75 | 8 | 35.43 s |
| 80 | 8 | 37.08 s |
| 85 | 8 | 37.39 s |
| 90 | 8 | 45.47 s |
| 95 | 8 | 41.32 s |
| 100 | 8 | 45.79 s |
Setting `split_every` can help too, but less significantly:
| partition_size | split_every | runtime |
|---|---|---|
| 25 | 5 | 38.32 s |
| 25 | 10 | 38.87 s |
| 25 | 15 | 39.91 s |
| 25 | 20 | 37.53 s |
| 25 | 25 | 37.23 s |
| 25 | 30 | 35.15 s |
| 25 | 35 | 36.11 s |
| 25 | 40 | 33.16 s |
Tested on a MacBook Air (M1, 2020); re-runs may vary slightly.
Code:
```python
from dask.distributed import Client
from enjoyn import GifAnimator, Preprocessor
from enjoyn.example import RandomWalkExample

example = RandomWalkExample(length=1000)
preprocessor = Preprocessor(func=example.plot_image)

data = example.load_data()
items = [data[:i] for i in range(1, len(data))]
output_path = example._temporary_directory / "random_walk.gif"

kwds = dict(preprocessor=preprocessor, items=items, output_path=output_path)
animator = GifAnimator(**kwds)
client = Client(threads_per_worker=1)

def compute(partition_size=None, split_every=None):
    compute_kwds = dict(
        partition_size=partition_size, split_every=split_every, client=client
    )
    print(partition_size, split_every)
    with example.time_run():
        animator.compute(**compute_kwds)

[compute(partition_size=partition_size) for partition_size in range(5, 105, 5)]
[compute(partition_size=25, split_every=split_every) for split_every in range(5, 45, 5)]
```
Note that these partition sizes may not suit other runs; the optimum is highly dependent on the `items` input, what the `preprocessor` is doing, and the number of cores and memory available on the machine! Overall, don’t let the partition size be too small or too large.
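To build intuition, `partition_size` controls how many items each dask task receives; conceptually, it chunks the `items` list like the sketch below (illustrative only, not `enjoyn`’s internal code):

```python
def partition(items, partition_size):
    """Split items into consecutive chunks of at most partition_size each.

    Larger chunks mean fewer tasks (less scheduling overhead), but coarser
    load balancing across workers.
    """
    return [
        items[i:i + partition_size]
        for i in range(0, len(items), partition_size)
    ]

# 999 items (as in the benchmark above) with partition_size=25 -> 40 chunks
chunks = partition(list(range(999)), 25)
```

Too small a partition size and scheduling overhead dominates; too large and a few workers end up doing most of the work while the rest sit idle.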
If you’d like to help contribute benchmarks, please submit a pull request.
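To build intuition for `split_every`: it bounds how many partial results dask combines at each step of its tree reduction. A minimal stdlib sketch of that idea (illustrative only, not `enjoyn`’s or dask’s actual implementation):

```python
def tree_reduce(parts, combine, split_every):
    """Repeatedly combine groups of at most `split_every` partial results
    until a single result remains -- a sketch of the bounded fan-in that
    dask's `split_every` controls."""
    while len(parts) > 1:
        parts = [
            combine(parts[i:i + split_every])
            for i in range(0, len(parts), split_every)
        ]
    return parts[0]

# Combining 10 single-frame "partitions" with a fan-in of 8 takes two rounds.
frames = tree_reduce(
    [[n] for n in range(10)],
    lambda groups: sum(groups, []),  # concatenate the grouped frame lists
    split_every=8,
)
```

A lower `split_every` yields more, smaller combine steps; a higher one yields fewer, larger steps, which is why its effect on runtime is milder than `partition_size`’s.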
### scheduler type
When no `preprocessor` is specified, `enjoyn` defaults to `scheduler="threads"` for faster runtimes, since threads take relatively less time to spin up than processes. Conversely, when a `preprocessor` is specified, `enjoyn` defaults to `scheduler="processes"`, since `matplotlib` is not thread-safe yet is often used within the `preprocessor`.
However, if the `preprocessor` is doing mostly numeric work with `imageio`, `numpy`, `pandas`, `scikit-learn`, `numba`, or other libraries that release the GIL, manually specify `scheduler="threads"` for faster runtimes.
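Overriding the default uses the same `scheduler` keyword to `compute()` shown in the `Client` fix above; a sketch, assuming the preprocessor does GIL-releasing numeric work:

```python
GifAnimator(..., preprocessor=preprocessor).compute(scheduler="threads")
```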