best practices#

error prevention#

script guard#

When writing scripts that use enjoyn, be sure to guard the entry point with `if __name__ == "__main__":`.

Without the guard, you may encounter an error like:

```
An attempt has been made to start a new process...
```

Fix:

```python
from enjoyn import GifAnimator

def animate(items, output_path):
    return GifAnimator(items, output_path=output_path).compute()

if __name__ == "__main__":
    animate([0, 1, 2], "enjoyn.gif")
```

matplotlib preprocessor#

closing figures#

When using matplotlib's object-oriented interface, e.g. `ax.plot` instead of `plt.plot`, explicitly close the figure after saving.

Without closing the figure, you may encounter overlapping frames like:

(image: GIF with frames overlapping on top of one another)

Fix 1:

```python
from io import BytesIO
import matplotlib.pyplot as plt

def plot(i):
    buf = BytesIO()
    ax = plt.axes()
    ax.scatter(range(i), range(0, -i, -1))
    plt.savefig(buf)
    plt.close()
    return buf
```

It’s also possible to use the built-in MatplotlibPreprocessor, which handles both saving and closing the figure.

Fix 2:

```python
import matplotlib.pyplot as plt
from enjoyn import MatplotlibPreprocessor

def plot(i):
    ax = plt.axes()
    im = ax.scatter(range(i), range(0, -i, -1))
    return im

preprocessor = MatplotlibPreprocessor(func=plot)
```
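The resulting preprocessor can then be passed to GifAnimator, mirroring the keyword usage in the benchmark code later on this page. A minimal sketch (the item range and output path here are illustrative):

```python
import matplotlib.pyplot as plt
from enjoyn import GifAnimator, MatplotlibPreprocessor

def plot(i):
    ax = plt.axes()
    return ax.scatter(range(i), range(0, -i, -1))

preprocessor = MatplotlibPreprocessor(func=plot)

if __name__ == "__main__":
    # each item is passed to the preprocessor func to render one frame
    GifAnimator(
        items=list(range(1, 10)),
        output_path="enjoyn.gif",
        preprocessor=preprocessor,
    ).compute()
```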

alongside Client#

When using Client alongside matplotlib, ensure the client is created with threads_per_worker=1 and that compute is called with scheduler='processes'.

Without those settings, because matplotlib is not thread-safe, you may notice “flashing” in the output like:

(image: GIF with frames "flashing" between states)

Fix:

```python
from dask.distributed import Client

client = Client(threads_per_worker=1)
GifAnimator(...).compute(client=client, scheduler="processes")
```

PNG inputs#

When using PNG files as inputs, ensure transparency is disabled.

Without transparency disabled, the frames may overlap and have unusually thick labels like:

(image: GIF with overlapping frames and unusually thick labels)

Fix:

```python
import matplotlib.pyplot as plt

path = "enjoyn.png"
plt.savefig(path, transparent=False)
```

runtime improvement#

optimization level#

By default, enjoyn sets --optimize=2 when calling gifsicle; this reduces the output GIF file size, but it increases runtime.

To slightly improve runtime, lower the optimization level to 1 or 0, like:

```python
gifsicle_options = (
    "--optimize=1",
    "--loopcount=0",
    "--no-warnings",
    "--no-conserve-memory",
)
GifAnimator(..., gifsicle_options=gifsicle_options)
```

dask practices#

enjoyn uses dask under the hood, so many of the practices documented on the Dask Best Practices page apply directly to enjoyn!

Take a moment to read through that page, as the following sections build on it.

load balancing#

Using the default partition_size is often sufficient.

However, tuning partition_size can yield non-trivial runtime improvements:

| partition_size | split_every | runtime (s) |
|---:|---:|---:|
| 5 | 8 | 60.33 |
| 10 | 8 | 48.46 |
| 15 | 8 | 43.29 |
| 20 | 8 | 37.67 |
| 25 | 8 | 35.96 |
| 30 | 8 | 36.23 |
| 35 | 8 | 37.23 |
| 40 | 8 | 37.62 |
| 45 | 8 | 36.81 |
| 50 | 8 | 36.31 |
| 55 | 8 | 37.89 |
| 60 | 8 | 36.14 |
| 65 | 8 | 34.25 |
| 70 | 8 | 34.84 |
| 75 | 8 | 35.43 |
| 80 | 8 | 37.08 |
| 85 | 8 | 37.39 |
| 90 | 8 | 45.47 |
| 95 | 8 | 41.32 |
| 100 | 8 | 45.79 |

Setting split_every can also help, though less significantly:

| partition_size | split_every | runtime (s) |
|---:|---:|---:|
| 25 | 5 | 38.32 |
| 25 | 10 | 38.87 |
| 25 | 15 | 39.91 |
| 25 | 20 | 37.53 |
| 25 | 25 | 37.23 |
| 25 | 30 | 35.15 |
| 25 | 35 | 36.11 |
| 25 | 40 | 33.16 |

Tested on a MacBook Air (M1, 2020); re-runs may vary slightly.

Code:

```python
from dask.distributed import Client
from enjoyn import GifAnimator, Preprocessor
from enjoyn.example import RandomWalkExample

example = RandomWalkExample(length=1000)
preprocessor = Preprocessor(func=example.plot_image)
data = example.load_data()
items = [data[:i] for i in range(1, len(data))]
output_path = example._temporary_directory / "random_walk.gif"
kwds = dict(preprocessor=preprocessor, items=items, output_path=output_path)
animator = GifAnimator(**kwds)
client = Client(threads_per_worker=1)


def compute(partition_size=None, split_every=None):
    compute_kwds = dict(
        partition_size=partition_size, split_every=split_every, client=client
    )
    print(partition_size, split_every)
    with example.time_run():
        animator.compute(**compute_kwds)


for partition_size in range(5, 105, 5):
    compute(partition_size=partition_size)
for split_every in range(5, 45, 5):
    compute(partition_size=25, split_every=split_every)
```

Note that these partition sizes may not suit other runs; the best value depends heavily on the items input, what the preprocessor is doing, and the number of cores and memory available on the machine!

Overall, don’t let the partition size be too small or too large!
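As a rough starting point, a hypothetical heuristic (not part of enjoyn) is to aim for a few partitions per worker, so the load balances without excessive scheduling overhead:

```python
def suggest_partition_size(n_items, n_workers, partitions_per_worker=4):
    # Hypothetical heuristic: split the items so each worker processes a few
    # partitions; floor-divide and clamp to at least 1 item per partition.
    return max(1, n_items // (n_workers * partitions_per_worker))

# e.g. 999 items (a length=1000 random walk) on an 8-core machine:
print(suggest_partition_size(999, 8))  # 31
```

A value in this ballpark (roughly 25-70 in the benchmark above) avoids both extremes.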

If you’d like to help contribute benchmarks, please submit a pull request.

scheduler type#

When no preprocessor is specified, enjoyn defaults to scheduler="threads" for faster runtimes, since threads take relatively less time to spin up than processes.

Conversely, when a preprocessor is specified, enjoyn defaults to scheduler="processes", since matplotlib is not thread-safe yet is often used within the preprocessor.

However, if the preprocessor is doing mostly numeric work with imageio, numpy, pandas, scikit-learn, numba, and other libraries that release the GIL, manually specify scheduler="threads" for faster runtimes.
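For instance, assuming a hypothetical preprocessor func that does numpy-only work (numpy releases the GIL during array operations), the default can be overridden at compute time; `items` below stands in for your actual inputs:

```python
import numpy as np
from enjoyn import GifAnimator, Preprocessor

def to_frame(item):
    # numpy-only work; the GIL is released, so threads run in parallel
    return (np.asarray(item, dtype=float) * 255).astype(np.uint8)

preprocessor = Preprocessor(func=to_frame)
animator = GifAnimator(
    items=items,  # your actual inputs
    output_path="numeric.gif",
    preprocessor=preprocessor,
)
# override the default scheduler="processes" for GIL-releasing work
animator.compute(scheduler="threads")
```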