User guide¶

Installation¶

Environment¶

To run the experiment you need to install the required dependencies. We highly recommend that you use a virtual environment as provided by conda or pipenv.

Then in your environment run:

$ pip install -r requirements/dev.txt

Sacred setup¶

When running experiments, the hyperparameters, metrics and plots are managed through Sacred and are stored in a mongoDB database. Though you can setup your mongoDB instance however you want, it is most conveniently done through the provided Docker files. This will not only get you started with mongoDB in no time, but will also set up a mongo-express interface to conveniently manage your database and sacredboard to monitor your runs. In order to use them you need to

Install Docker Engine.
Install Docker Compose.
Navigate to the directory with the setup files.

$ cd infractructure/sacred_setup

Edit the .env file. This file is hidden by default, but you can still edit it with any text editor, e.g. by vi .env. Replace all values in angle brackets with meaningful and secure values.
Run docker-compose:
```
docker-compose up -d
```

This will pull the necessary containers from the internet and build them. This may take several minutes. Afterwards mongoDB should be up and running. mongo-express should now be available on port 8081, accessible by the user and password you set in the .env file (ME_CONFIG_BASICAUTH_USERNAME and ME_CONFIG_BASICAUTH_PASSWORD). Sacredboard should be available on port 5000.

The current setup is optimized for a team that collaboratively stores results on a remote server. When running the experiments locally for yourself, you should change the port mapping in the docker-compose.yml file to only map to localhost, such that you do not expose your database to the internet. Simply prefix all port mappings with localhost, e.g. replace:

ports:
  - 5000:5000

by

ports:
  - 127.0.0.1:5000:5000

5. In a final step, you need to tell sacred how to connect to the database. Edit file deep_bottleneck/credentials.py, again replacing all values in angle brackets by the values you actually set in the .env file. Additionally, you have to provide the IP address of the server your database is running on, which is either the address given by your server provider or 127.0.0.1 when running mongo locally.

You are ready to run some exciting experiments!

Importing and exporting from mongoDB¶

The following section is meant to help you migrate your data from one server to another. If you are just starting you can skip this section.

To export data from your mongo container run

$ docker run --rm --link <container_id>:mongo --network <network_id> -v /root/dump:/backup mongo bash -c 'mongodump --out /backup --uri mongodb://<username>:<password>@mongo:27017/?authMechanism=SCRAM-SHA-1'

make sure you you create the output folder, in this case /root/dump beforehand. You also need to look up the id of your current mongo container using docker ls and find the id of the network is running is using docker network ls. Then replace <username> and <password> by the values you originally set in your .env.

To import data again run following the same steps as above.

$ docker run --rm --link <container_id>:mongo --network <network_id> -v /root/dump:/backup mongo bash -c 'mongorestore /backup --uri mongodb://<username>:<password>@mongo:27017/?authMechanism=SCRAM-SHA-1'

How to use the framework¶

Running experiments¶

The idea of the project is based on the concepts presented by Tishby. To reproduce the basic setup of the experiments one can simply start experiment.py.

If all the required packages are installed properly and the program is started, different things should happen.

First the required modules of the framework are imported based on the defined configuration (more about configurations in “Adding new Experiments”).
A neural network is trained using the defined dataset. The progress of this process is also logged in the console.
During the training process the required data is saved in regular time-steps to the local filesystem.
Given the saved data (e.g. the activations) it is possible to compute the mutual information of the different layer and the input/output.
Using this different plots as e.g. the information plane plot are created and saved simultaneously in the filesystem and in the database. The results of the experiments can be looked up either in the deep_bottleneck/plots folder (only the plots of the last runs are saved) or using eval_tools as described below.

Evaluation tools¶

To make the rich results generated by the experiments accessible, we created an evaluation tool. It lets you query experiments based on id, name or other configuration parameters and lets you view the generated plots, metrics and videos conveniently in Jupyter notebooks. To get you started have a look at deep_bottleneck/eval_tools_demo.ipynb.

Adding new experiments (config)¶

Configuration¶

During the exploration of Tishby’s idea already a lot of experiments have been done, but there are still many things one can do using this framework. To define a new experiment a new configuration needs to be added. The existing configurations are saved in the deep_bottleneck/configs folder. To add a new configuration a new JSON file is required. The currently relevant parts of the configuration and their effects are explained in the following table.

calculate_mi_for:
epochs:	Number of epochs the model is trained for. Most of the experiments for the harmonics dataset used 8000 epochs.
batch_size:	Batch size used during the training process. Most dominant batch size in our experiments was 256.
architecture:	Architecture of the trained model. Defined as a list of integers, where every integer defines the number of neurons in one layer. It is important to notify that an additional readout layer is automatically added (with the number of neurons corresponding to the number of classes in the dataset). The basic architecture for the harmonics dataset is [10, 7, 5, 4, 3].
optimizer:	The optimizer used for the training of the neural network. Possible values are “sgd”, or “adam”.
learning_rate:	The learning rate of the optimizer. Default values are 0.0004 for harmonics and 0.001 for mnist.
	The calculate_mi_for parameter defines the dataset that is used for the mutual information computation. It can be done either for the training data (value: “training”), test data (“test”) or the full dataset (“full_dataset”).
activation_fn:	The activation-function used to train the model. The following activation function are implemented: `tanh`, `relu`, `sigmoid`, `softsign`, `softplus`, `leaky_relu`, `hard_sigmoid`, `selu`, `relu6`, `elu` and `linear`.
model:	The parameter which defines the basic model-choice. Currently only different architectures of feed-foreward-networks can be used. So the possible choices right now are `models.feedforward` and `models.feedforward_batchnorm`, the actual architecture is defined by the architecture parameter.
dataset:	The parameter which defines the dataset used for training. Currently implemented datasets are `harmonics`, `mnist`, `fashion_mnist` and `mushroom`.
estimator:	The estimator used for the computation of the mutual information. Because mutual information cannot be computed analytically for more complex networks, it is necessary to estimate it. Possible estimators are `mi_estimator.binning`, `mi_estimator.lower`, `mi_estimator.upper`.
discretization_range:
	The different estimators have a different hyperparameter to add artificial noise to the estimation. This parameter is used as a placeholder for the different hyperparameter. A typical value is 0.07 for `binning` and 0.001 for `upper` and `lower`.
callbacks:	A list of additional callbacks as for example early stopping. Needs to defined as a list of paths to the callbacks, as e.g. `[callbacks.early_stopping_manual]`.
n_runs:	Number of runs the experiment is repeated. The results will be averaged over all runs to compensate for outliers.

Executing multiple experiments¶

Using these parameters one should be able to define experiments as desired. To execute the experiment(s) one could simply start des experiment.py but mainly due to our usage of external hardware resources (Sun grid engine) we had to develop another way to execute experiments. We created two python files: run_experiment.py and run_experiment_local.py, which can run either a single experiment or a group of experiments. For the local execution of experiments with run_experiment_local.py one needs to switch to the deep_bottleneck folder by:

$ cd deep_bottleneck

and then execute experiments by either pointing to a specific JSON file defining the experiment, e.g.:

$ python run_experiments_local.py -d configs/basic.json

or pointing at a directory containing all the experiments one wants to execute, e.g.:

$ python run_experiments_local.py -d configs/mnist

In that case all the JSONs in the folder and in its sub-folders are recursively executed.

Running experiment on the Sun grid engine¶

In case one uses a sun grid engine to execute the experiments it is possible to start run_experiments.py on the engine in the same way with as described above. The experiments will get submitted to the engine using qsub. In that case it is important to make sure that an /output/-folder exists on the directory-level of the experiment.sge file.

Additionally it might be important to run experiments that are repeatable and will return the same results in every run. Because the basic step of the framework is to train a neural network, including some kind of randomness the results of two runs might be different even though they are based on the same configuration. To avoid misconceptions it is possible to set a seed for each experiment, simply by using:

$ python experiment.py with seed=0

(the exact seed is arbitrary, it just needs to be consistent). In case that one of the run_experiment files is used this step is done for you, but even in the other cases some IDEs allow to set script-parameters for normal executions of a specific file, such that it is not required to start the experiment.py out of the command-line.