Thursday, 26 February 2015

Results for the bigger model

The model described in my previous post gave quite nice results after only 46 epochs:

Train error: 0.1104

Validation error:  0.1020

Test error: 0.1072

I didn't use any regularization except early stopping.
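Early stopping here amounts to something like the following sketch; the exact criterion isn't described in the post, so train_epoch, validate, and the patience value are purely illustrative:

def train_with_early_stopping(train_epoch, validate, patience=10):
    """Stop once `patience` epochs pass without a validation improvement."""
    best_error, best_epoch, epoch = float('inf'), 0, 0
    while epoch - best_epoch <= patience:
        train_epoch()           # one pass over the training set
        error = validate()      # current validation error
        if error < best_error:
            best_error, best_epoch = error, epoch
        epoch += 1
    return best_error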

Future plans

I would like to add small random rotations to the dataset.
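One possible way to do this with scipy (the ±10 degree range is just an example, not a tuned value):

import numpy as np
from scipy.ndimage import rotate

def random_rotation(image, max_angle=10.0, rng=np.random):
    """Rotate an H x W x C image in-plane by a small random angle."""
    angle = rng.uniform(-max_angle, max_angle)
    # reshape=False keeps the original image size; borders are zero-filled.
    return rotate(image, angle, axes=(0, 1), reshape=False)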

Tuesday, 24 February 2015

The bigger the better

I tried to train a bigger network with the following configuration:


feature_maps:
    - 32
    - 40
    - 50
    - 70
    - 120
conv_sizes:
    - 3
    - 3
    - 3
    - 3
    - 3
pool_sizes:
    - 2
    - 2
    - 2
    - 2
    - 2
mlp_hiddens:
    - 500
    - 500

After only 25 epochs it already reaches about 14% misclassification error.
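As a quick sanity check on this configuration, the following sketch (not the training code itself) parses the same YAML and tracks the feature-map size after each stage, assuming 'valid' convolutions, non-overlapping pooling, and the 221x221 crops from the 8 February post:

import yaml

config = yaml.safe_load("""
feature_maps: [32, 40, 50, 70, 120]
conv_sizes:   [3, 3, 3, 3, 3]
pool_sizes:   [2, 2, 2, 2, 2]
mlp_hiddens:  [500, 500]
""")

size = 221  # assumed input crop size
for maps, conv, pool in zip(config['feature_maps'],
                            config['conv_sizes'],
                            config['pool_sizes']):
    size = (size - conv + 1) // pool  # valid convolution, then pooling
    print('{0:3d} feature maps of size {1}x{1}'.format(maps, size))
# Prints 109, 53, 25, 11 and finally 4, so the MLP part would see
# 120 * 4 * 4 = 1920 inputs feeding the two 500-unit hidden layers.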

Friday, 20 February 2015

Slight decrease of error

After one more day of training, the network I described in the previous post slightly decreased its error.

Interestingly, it did not enter the overfitting regime, even though I used no regularization.

So the final result for this architecture:

Test error: 0.1824

Validation error: 0.1451

Train error: 0.1724

Cross-entropy during training:

Error rate during training:

Tuesday, 17 February 2015

Hit 80% accuracy!

Influenced by the work of Iulian, Guillaume, and Alexandre, I managed to get the error rate below 20%.

Model

  1. Convolution 4x4, 32 feature maps
  2. Convolution 4x4, 32 feature maps
  3. Convolution 4x4, 64 feature maps
  4. Convolution 4x4, 64 feature maps
  5. Convolution 4x4, 128 feature maps
  6. Fully connected 500 hidden units
  7. Fully connected 500 hidden units
  8. Fully connected 250 hidden units
All the convolution layers were followed by 4x4 pooling.

I hoped that a deeper stack of fully connected layers would give better results.

Training

I decided to use RMSprop; learning was faster than with standard SGD.
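For reference, the RMSprop update rule boils down to the following numpy sketch (the hyperparameter values here are illustrative, not necessarily the ones I used):

import numpy as np

def rmsprop_step(param, grad, mean_square, lr=1e-3, rho=0.9, eps=1e-6):
    """One RMSprop update; mean_square is the running average of grad ** 2."""
    mean_square = rho * mean_square + (1.0 - rho) * grad ** 2
    param = param - lr * grad / (np.sqrt(mean_square) + eps)
    return param, mean_square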

Results

Cross-entropy:

Error rate:

Test error: 0.1992

Validation error: 0.1828

Train error: 0.1694

 

Future work

I'm going to continue training to see whether the model starts to overfit. I used no regularization, and I wonder whether it is necessary for this model.

Sunday, 8 February 2015

First results

It seems that I got some results.

Model

The data was prepared as in vdumoulin's code: the smallest image side was resized to 256 pixels and then a random 221x221 crop was taken. I also normalized the inputs to be at most one, since otherwise the gradients were too big.
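A minimal sketch of that preprocessing (assuming PIL for the resizing and division by 255 for the normalization; both are guesses at the exact details):

import numpy as np
from PIL import Image

def prepare_image(path, short_side=256, crop=221, rng=np.random):
    """Resize the shortest side, take a random crop, scale to [0, 1]."""
    img = Image.open(path).convert('RGB')
    w, h = img.size
    scale = short_side / float(min(w, h))
    img = img.resize((int(round(w * scale)), int(round(h * scale))))
    arr = np.asarray(img, dtype='float32') / 255.0  # assumed normalization
    top = rng.randint(0, arr.shape[0] - crop + 1)
    left = rng.randint(0, arr.shape[1] - crop + 1)
    return arr[top:top + crop, left:left + crop]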

I used a 5-layer network with 3 convolutional layers and 2 fully connected ones. The structure was the following: the first layer is a 7x7 convolution with 25 feature maps, the second a 7x7 convolution with 56 feature maps, and the third a 3x3 convolution with 104 feature maps; each convolutional layer is followed by 3x3 non-overlapping max pooling. The two fully connected layers have 250 hidden units each. I used rectified linear units for all activations.

Training

I used simple mini-batch stochastic gradient descent with a learning rate of 1e-5. I tried to use a newer optimization method (the Adam optimizer), but I ran into a computational issue that I need to investigate later.
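For reference, plain mini-batch SGD amounts to the following updates; this is a sketch in raw Theano rather than the blocks code I actually used, and cost and params stand for the model's loss and its shared parameters:

import theano
import theano.tensor as T

def sgd_updates(cost, params, learning_rate=1e-5):
    """(shared_variable, new_value) pairs for plain gradient descent."""
    grads = T.grad(cost, wrt=params)
    return [(p, p - learning_rate * g) for p, g in zip(params, grads)]

# train_fn = theano.function([x, y], cost,
#                            updates=sgd_updates(cost, params))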

I used a small model because I wanted it to fit on a GTX 580 GPU.

Code

I decided to use a new library for neural networks called blocks. The library lets you easily build complicated Theano models out of 'bricks', either the ones available in blocks or ones you implement yourself. The documentation for the library is available here.
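To give an idea of the style, here is a tiny example along the lines of the blocks tutorial (an arbitrary small MLP, not the model from this post):

import theano.tensor as T
from blocks.bricks import MLP, Rectifier, Softmax
from blocks.initialization import IsotropicGaussian, Constant

x = T.matrix('features')
mlp = MLP(activations=[Rectifier(), Softmax()], dims=[784, 500, 10])
mlp.weights_init = IsotropicGaussian(0.01)
mlp.biases_init = Constant(0)
mlp.initialize()
y_hat = mlp.apply(x)  # a symbolic Theano expression, ready for a cost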

I implemented bricks for convolution, max pooling, and other auxiliary purposes; they are available in my repository and are going to be included in blocks soon.

Intermediate results

I plotted the cost (categorical cross-entropy):
Good news: it goes down, which means that the model learns. We can also look at the misclassification rate:
Currently the error rate is around 40%, which means that the model is able to learn something. I'll continue training and we'll see what the final result is.