Deep Representation Learning

Thursday, 9 April 2015

Running time analysis

Measuring time

The main purpose of this project is to speed up the architecture selection process. To make my experiments comparable, I ran all the models on the same GPU with the same optimization method (RMSProp) and the same hyperparameters.
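A minimal sketch of how per-epoch wall-clock time could be measured. The names `time_epochs` and `train_one_epoch` are hypothetical stand-ins, not the code used for these experiments. On a GPU one would also have to make sure that all queued work has finished before reading the clock.

```python
import time


def time_epochs(train_one_epoch, n_epochs):
    """Return the wall-clock duration in seconds of each epoch.

    `train_one_epoch` is a hypothetical zero-argument callable that runs
    one full pass over the training data.
    """
    durations = []
    for _ in range(n_epochs):
        start = time.perf_counter()
        train_one_epoch()
        durations.append(time.perf_counter() - start)
    return durations


def median(values):
    """Median of a non-empty list; less sensitive to a few slow epochs than the mean."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return 0.5 * (ordered[mid - 1] + ordered[mid])
```

Reporting the median over several epochs, rather than a single epoch, is one way to reduce the influence of other processes running on the same machine.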

Results

The model I used for this task is quite small, and the majority of its parameters belong to the fully connected layers. Nevertheless, keeping the weights of the first layers fixed at their random initial values helps to speed up training.

I ran the experiments with 2-5 fixed layers first, and about a week later I ran the reference model and the model with 1 fixed layer. It seems that something changed during that week, because the two later models ran much faster. I concluded that for models this small the running time depends on other processes running on the machine. The table lists the number of trained parameters and the time per epoch for each model.

Fixed layers	Trained parameters	One epoch time, s
reference	486,000	237
1	484,000	204
2	482,000	347
3	470,000	238
4	434,000	218
5	360,000	183

Conclusions

Fixed random features help to speed up the learning
The impact is not so big for small models
Optimizing may be more difficult for models with fixed weights

Wednesday, 8 April 2015

Experiments

Experimental setup

As a reference model I used a comparatively small network:

Layer	Structure	Dimension
1	Convolution	10 filters 3x3
2	Convolution	20 filters 3x3, pooling 2x2
3	Convolution	64 filters 3x3, pooling 2x2
4	Convolution	64 filters 3x3, pooling 2x2
5	Convolution	128 filters 3x3, pooling 2x2
6	Fully connected	256
7	Fully connected	256
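As a rough check of how the parameters are distributed, the weights and biases of each convolutional layer can be counted from the table. This is only a sketch: the number of input channels is not stated in the post, so `INPUT_CHANNELS = 3` (RGB input) is an assumption, and the function name is made up for illustration.

```python
# Reference architecture from the table: (number of filters, kernel size).
CONV_LAYERS = [(10, 3), (20, 3), (64, 3), (64, 3), (128, 3)]
INPUT_CHANNELS = 3  # assumption: RGB input, not stated in the post


def conv_param_counts(layers, in_channels):
    """Weights plus biases for each convolutional layer."""
    counts = []
    for n_filters, k in layers:
        counts.append(n_filters * in_channels * k * k + n_filters)
        in_channels = n_filters  # output maps feed the next layer
    return counts


counts = conv_param_counts(CONV_LAYERS, INPUT_CHANNELS)
print(counts)       # [280, 1820, 11584, 36928, 73856]
print(sum(counts))  # 124468
```

Under these assumptions the counts for layers 2 to 5 are close to the successive drops in the "Trained parameters" column of the running-time table (about 2,000, 12,000, 36,000 and 74,000).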

Then I trained six models: the reference one, one with the first layer fixed, one with the first and second layers fixed, and so forth. The idea is illustrated in the picture (grey means that the weights are fixed).
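The setup can also be sketched in plain Python. The layer names are hypothetical, and a plain gradient step stands in for RMSProp to keep the example short; the point is only that frozen layers keep their random initial weights while the rest are updated.

```python
# Hypothetical names for the seven layers of the reference model.
LAYER_NAMES = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc1", "fc2"]


def trainable_layers(n_fixed):
    """Layers still trained when the first `n_fixed` layers are frozen."""
    return LAYER_NAMES[n_fixed:]


def sgd_step(params, grads, trainable, lr=0.01):
    """Plain gradient step that leaves frozen layers untouched."""
    return {
        name: [w - lr * g for w, g in zip(params[name], grads[name])]
        if name in trainable else list(params[name])
        for name in params
    }


# The six models: the reference (0 fixed layers) up to 5 fixed layers.
for n_fixed in range(6):
    print(n_fixed, trainable_layers(n_fixed))
```

A real implementation would also skip computing gradients for the frozen layers, which is where any speed-up comes from.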

Results

I plotted the training and validation error during training.

We can see that the models with fixed parameters produce reasonable results. It is also interesting that fixing weights sometimes has a regularizing effect (as on the blue, magenta and yellow lines). This is not surprising, because fixing weights decreases the capacity of the model.

Tuesday, 7 April 2015

Work on random features started

Previous work

Several works have mentioned that models with fixed convolutional weights perform not much worse than fully trained models; one example is Jarrett et al., 2009. Later, Saxe et al. investigated this phenomenon from a theoretical point of view and concluded that fine-tuning of the fully connected layers has the biggest effect on the training.

Idea

It is very tempting to use fixed weights for hyperparameter search, since training a model with fixed weights should be easier and faster. I'm going to run several experiments to find the advantages and drawbacks of this approach and to analyze the behaviour of the training procedure in this setup.

Thursday, 26 February 2015

Results for bigger model

The model described in my previous post gave quite nice results after only 46 epochs:

Train error: 0.1104

Validation error: 0.1020

Test error: 0.1072

I didn't use any regularization except early stopping.
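For reference, early stopping can be sketched as follows. This is a generic patience-based variant, not necessarily the rule used here; the function name and the `patience` setting are hypothetical.

```python
def early_stopping_epoch(val_errors, patience=5):
    """Return the index of the epoch whose parameters would be kept.

    Training stops once the validation error has not improved for
    `patience` consecutive epochs; the best epoch seen so far is kept.
    """
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch
```

A larger `patience` tolerates longer plateaus in the validation error at the cost of extra training time.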

Future plans

I would like to add small random rotations to the dataset.

Tuesday, 24 February 2015

The bigger the better

I tried to train a bigger network with the following configuration:

feature_maps:
    - 32
    - 40
    - 50
    - 70
    - 120
conv_sizes:
    - 3
    - 3
    - 3
    - 3
    - 3
pool_sizes:
    - 2
    - 2
    - 2
    - 2
    - 2
mlp_hiddens:
    - 500
    - 500

After only 25 epochs it reaches about 14% misclassification error.

Friday, 20 February 2015

Slight decrease of error

After one more day of training, the network I described in the previous post slightly decreased its error.

Interestingly, it did not start overfitting, although I used no regularization.

So the final result for this architecture: