Tutorial 6: DDP Training in MMGeneration

In this section, we will discuss the DDP (Distributed Data-Parallel) training for generative models, especially for GANs.

Summary of DDP Training Methods

| DDP Model | find_unused_parameters | Static GANs | Dynamic GANs |
| --- | --- | --- | --- |
| MMDDP/PyTorch DDP | False | Error | Error |
| MMDDP/PyTorch DDP | True | Error | Error |
| DDP Wrapper | False | No Bugs | Error |
| DDP Wrapper | True | No Bugs | No Bugs |
| MMDDP/PyTorch DDP + Dynamic Runner | True | No Bugs | No Bugs |

In this table, we summarize the available approaches to DDP training for GANs. MMDDP/PyTorch DDP denotes directly wrapping the whole GAN model (containing the generator, discriminator, and loss modules) with MMDistributedDataParallel. However, with this approach, we cannot train GAN models with the adversarial training schedule. The main reason is that in the train_step function we always need to back-propagate a loss through only part of the model (only the discriminator or only the generator).
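The following sketch (with hypothetical helper names, not MMGeneration code) shows the adversarial train_step pattern that causes the problem: each backward pass produces gradients for only a subset of the parameters that a single DDP wrapper around the whole GAN would manage:

def train_step(gan, data, disc_optimizer, gen_optimizer):
    # Discriminator step (WGAN-style critic loss): only discriminator
    # parameters receive gradients; the generator output is detached.
    fake = gan.generator(data['noise']).detach()
    loss_disc = gan.discriminator(fake).mean() \
        - gan.discriminator(data['real']).mean()
    disc_optimizer.zero_grad()
    loss_disc.backward()  # generator parameters get no gradients here
    disc_optimizer.step()

    # Generator step: gradients flow back through the discriminator into
    # the generator, but only the generator is updated.
    loss_gen = -gan.discriminator(gan.generator(data['noise'])).mean()
    gen_optimizer.zero_grad()
    loss_gen.backward()
    gen_optimizer.step()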

Another way to use DDP is to adopt the DDP Wrapper, which wraps each component of the GAN model with MMDDP separately. This approach is widely used in the current literature, e.g., in MMEditing and StyleGAN2-ADA-PyTorch. Here, an important argument is find_unused_parameters. As shown in the table, users must set this argument to True to train dynamic architectures like PGGAN and StyleGANv1. However, once find_unused_parameters is set to True, the model will rebuild the bucket used for synchronizing gradients and related information after each forward pass. This step helps the backward procedure track which tensors are needed in the current computation graph.
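A minimal sketch of this idea (a simplified, hypothetical helper, not the actual MMDDP implementation): each component becomes its own DDP instance with its own reducer, so a backward pass that touches only one component is legal:

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel


def wrap_components(gan: nn.Module, device_id: int,
                    find_unused_parameters: bool = False) -> nn.Module:
    # Wrap each sub-module separately instead of the whole GAN.
    gan.generator = DistributedDataParallel(
        gan.generator, device_ids=[device_id],
        find_unused_parameters=find_unused_parameters)
    gan.discriminator = DistributedDataParallel(
        gan.discriminator, device_ids=[device_id],
        find_unused_parameters=find_unused_parameters)
    return gan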

In MMGeneration, we design another way for users to adopt DDP training, i.e., MMDDP/PyTorch DDP + Dynamic Runner. Before detailing this new design, we first clarify why users should switch to it. Although DDP Wrapper makes training dynamic GANs possible, it still has some inconveniences and disadvantages:

  • DDP Wrapper prevents users from directly calling functions or accessing attributes of the components in GANs, e.g., the generator and discriminator. After adopting DDP Wrapper, if we want to call a function of the generator, we have to use generator.module.xxx() (see the illustration after this list).

  • DDP Wrapper causes redundant bucket rebuilding. The real reason DDP Wrapper avoids the DDP error is that each component in the GAN model rebuilds its bucket for backward right after calling its forward function. However, as is well known in the GAN literature, there are many cases where we do not need to build a bucket for backward, e.g., building a bucket for the generator when updating the discriminator.
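The following runnable illustration of the first point constructs DDP in a single-process "distributed" setup; note how the wrapped module's attributes are only reachable through .module:

import os
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

# Single-process process group so DDP can be constructed standalone.
os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
os.environ.setdefault('MASTER_PORT', '29500')
dist.init_process_group('gloo', rank=0, world_size=1)

generator = nn.Linear(8, 8)
ddp_generator = DistributedDataParallel(generator)

print(hasattr(ddp_generator, 'weight'))   # False: the attribute is hidden
print(ddp_generator.module.weight.shape)  # torch.Size([8, 8])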

To address these points, we looked for a way to adopt MMDDP directly while still supporting dynamic GAN training. In MMGeneration, DynamicIterBasedRunner achieves this. Importantly, a modification of fewer than 10 lines solves the problem.

MMDDP/PyTorch DDP + Dynamic Runner

The key point of adopting DDP in static/dynamic GAN training is to construct (or check) the bucket used for backward before each backward pass (the discriminator backward and the generator backward), because the parameters that need gradients in these two backward passes come from different parts of the GAN model. Thus, our solution is simply to rebuild the bucket explicitly right before each backward procedure.

In mmgen/core/runners/dynamic_iterbased_runner.py, we obtain the reducer using a private PyTorch API:

if self.is_dynamic_ddp:
    # `reducer` is DDP's internal gradient-bucketing engine; pass it down so
    # that train_step can rebuild buckets before each backward pass.
    kwargs.update(dict(ddp_reducer=self.model.reducer))
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)

The reducer helps us rebuild the bucket for the current backward path; we only need to add the following lines to the train_step function:

if ddp_reducer is not None:
    ddp_reducer.prepare_for_backward(_find_tensors(loss_disc))

A complete use case is:

loss_disc, log_vars_disc = self._get_disc_loss(data_dict_)

# Prepare for backward in DDP. If you do not call this function before
# back-propagation, DDP will not dynamically find the parameters used in
# the current computation graph.
if ddp_reducer is not None:
    ddp_reducer.prepare_for_backward(_find_tensors(loss_disc))

loss_disc.backward()

That is, users should add the reducer preparation between the loss calculation and the loss backward.
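For reference, here is a condensed sketch of a full train_step under this scheme (the loss helpers and optimizer keys are hypothetical, following the pattern above; _find_tensors is PyTorch's private helper):

from torch.nn.parallel.distributed import _find_tensors


def train_step(self, data_batch, optimizer, ddp_reducer=None, **kwargs):
    # --- discriminator update ---
    loss_disc, log_vars_disc = self._get_disc_loss(data_batch)
    if ddp_reducer is not None:
        # Rebuild the bucket for exactly this backward path.
        ddp_reducer.prepare_for_backward(_find_tensors(loss_disc))
    optimizer['discriminator'].zero_grad()
    loss_disc.backward()
    optimizer['discriminator'].step()

    # --- generator update ---
    loss_gen, log_vars_gen = self._get_gen_loss(data_batch)
    if ddp_reducer is not None:
        # Prepare again: the generator backward uses a different subgraph.
        ddp_reducer.prepare_for_backward(_find_tensors(loss_gen))
    optimizer['generator'].zero_grad()
    loss_gen.backward()
    optimizer['generator'].step()

    return dict(log_vars=dict(**log_vars_disc, **log_vars_gen))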

In MMGeneration, this feature is adopted as the default way to train DDP models. In configs, users only need to add the following lines to use the dynamic DDP runner:

# use dynamic runner
runner = dict(
    type='DynamicIterBasedRunner',
    is_dynamic_ddp=True,
    pass_training_status=True)

We have to admit that this implementation relies on a private PyTorch interface, but we will keep maintaining this feature.

DDP Wrapper

Of course, we still support using DDP Wrapper to train your GANs. If you want to switch to DDP Wrapper, you should modify the config file like this:

# use ddp wrapper for faster training
use_ddp_wrapper = True
find_unused_parameters = True  # True for dynamic model, False for static model

runner = dict(
    type='DynamicIterBasedRunner',
    is_dynamic_ddp=False,  # Note that this flag should be False.
    pass_training_status=True)

In the DCGAN config file, we have already provided an example of using DDP Wrapper in MMGeneration.
