It seems that I got some results.

## Model

Data was prepared as in vdumoulin's code: the smaller image side was resized to 256 pixels and then a random 221x221 crop was taken. I also normalized the input data to one, since otherwise the gradients were too large.
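A minimal NumPy sketch of this preprocessing, not the actual code from vdumoulin's repository. The function names are hypothetical, the nearest-neighbour resize is a stand-in for a proper image library, and dividing by 255 is my assumption about what "normalized to one" means for 8-bit images.

```python
import numpy as np

def resize_smaller_side(image, target=256):
    """Nearest-neighbour resize so that the smaller side equals `target`."""
    height, width = image.shape[:2]
    scale = target / min(height, width)
    new_h = max(target, int(round(height * scale)))
    new_w = max(target, int(round(width * scale)))
    # Map every output pixel back to its nearest source pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), height - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), width - 1)
    return image[rows][:, cols]

def random_crop(image, size=221, rng=None):
    """Take a random `size` x `size` window from the image."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    top = rng.integers(0, height - size + 1)
    left = rng.integers(0, width - size + 1)
    return image[top:top + size, left:left + size]

def preprocess(image, rng=None):
    """Resize, crop, and scale pixel values into [0, 1] (assumed normalization)."""
    resized = resize_smaller_side(image, 256)
    crop = random_crop(resized, 221, rng)
    return crop.astype(np.float32) / 255.0
```

For a 300x400 RGB image, `preprocess` returns a 221x221x3 float array with values between 0 and 1.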

I used a 5-layer network with 3 convolutional layers and 2 fully connected layers. The structure was the following: the first layer is a 7x7 convolution with 25 feature maps, the second is a 7x7 convolution with 56 feature maps, and the third is a 3x3 convolution with 104 feature maps. Every convolutional layer was followed by 3x3 non-overlapping max pooling. The two fully connected layers have 250 hidden units each. I used rectified linear units for all activation functions.
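To get a feeling for the sizes involved, here is a small sketch that traces the feature map shapes through the three convolution and pooling stages. It assumes 'valid' convolutions with stride 1 and pooling that drops incomplete border windows; neither detail is stated above, so the exact numbers may differ from the real model.

```python
# (kernel size, number of feature maps) for each convolutional layer
CONV_LAYERS = [(7, 25), (7, 56), (3, 104)]

def conv_out(size, kernel):
    """Output side of a 'valid' convolution with stride 1 (assumption)."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output side of non-overlapping pooling, ignoring the border (assumption)."""
    return size // pool

def feature_map_shapes(input_size=221, pool=3):
    """Return (maps, height, width) after each convolution + pooling stage."""
    size = input_size
    shapes = []
    for kernel, maps in CONV_LAYERS:
        size = pool_out(conv_out(size, kernel), pool)
        shapes.append((maps, size, size))
    return shapes

shapes = feature_map_shapes()
maps, height, width = shapes[-1]
flattened = maps * height * width  # input size of the first fully connected layer
```

Under these assumptions the 221x221 input shrinks to 71x71, then 21x21, then 6x6, so the first fully connected layer receives 104 * 6 * 6 = 3744 features.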

## Training

I used simple mini-batch stochastic gradient descent with a learning rate of 1e-5. I tried to use a newer optimization method (the Adam optimizer), but I ran into a computational issue that I need to investigate later.
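For reference, plain mini-batch SGD boils down to the update below. This is a generic NumPy sketch rather than the training loop I actually ran; `iterate_minibatches` and `sgd_step` are hypothetical helper names, and the gradients are assumed to come from elsewhere (Theano computes them in my case).

```python
import numpy as np

def iterate_minibatches(inputs, targets, batch_size, rng):
    """Yield shuffled (inputs, targets) mini-batches covering the data once."""
    order = rng.permutation(len(inputs))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield inputs[idx], targets[idx]

def sgd_step(params, grads, learning_rate=1e-5):
    """Move every parameter a small step against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]
```

One epoch then consists of calling `sgd_step` once per mini-batch with the gradients of the cost on that batch.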

I used a small model because I wanted to fit it into a GTX580 GPU.

## Code

I decided to use a new library for neural networks called blocks. The library makes it easy to build complicated Theano models out of 'bricks', either the ones available in blocks or ones you implement yourself. The documentation for the library is available here.

I implemented bricks for convolution, max pooling, and other auxiliary purposes. They are available in my repository and are going to be included in blocks soon.

## Intermediate results

I plotted the cost (categorical cross-entropy):

Good news: it goes down, which means that the model learns. We can also look at the misclassification rate:

Currently the error rate is around 40%, which means that the model is able to learn something. I'll continue training and we'll see what the final result is.
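The two quantities tracked above can be written down in a few lines. This is a NumPy sketch for illustration, not the Theano expressions used in training; it assumes `probs` holds one row of predicted class probabilities per example and `labels` holds integer class indices.

```python
import numpy as np

def categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the correct class."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked))

def misclassification_rate(probs, labels):
    """Fraction of examples whose most probable class is wrong."""
    return np.mean(np.argmax(probs, axis=1) != labels)
```

The cross-entropy is what the optimizer minimizes, while the misclassification rate is the number that is easier to interpret, which is why I look at both.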