Very nice and clear tutorial. One thing that might be nice is an addendum that shows how easy it would be to integrate with scikit-learn pipeline/parameter searching since you mention that. It would also be fun to experiment with affine jitter on the face points + piecewise warping the pixels based on the jittered locations (to expand the training data further).
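The affine jitter on the keypoint coordinates could be sketched like this; all coordinates and jitter ranges are made-up illustration values, and the piecewise pixel warp (e.g. with skimage's PiecewiseAffineTransform) is left out:

```python
import numpy as np

# Jitter keypoints with a small random rotation/scale/shift.
# Coordinates and ranges here are illustrative only.
rng = np.random.RandomState(42)
points = np.array([[30.0, 36.0], [66.0, 39.0], [48.0, 60.0]])  # (x, y) pairs

angle = rng.uniform(-0.1, 0.1)           # radians
scale = rng.uniform(0.95, 1.05)
shift = rng.uniform(-2.0, 2.0, size=2)   # pixels
R = scale * np.array([[np.cos(angle), -np.sin(angle)],
                      [np.sin(angle),  np.cos(angle)]])
jittered = points.dot(R.T) + shift

# The jittered points would then drive a piecewise warp of the image pixels.
assert jittered.shape == points.shape
```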
Here's how you could use scikit-learn's GridSearchCV to find a better value for the number of hidden units in net1:
from sklearn.grid_search import GridSearchCV
parameters = {'hidden_num_units': [50, 100, 200]}
gs = GridSearchCV(net1, parameters)
gs.fit(X, y)
print gs.grid_scores_
I, too, am interested in what other data augmentation tricks could be used. For this particular dataset, I haven't tried anything other than what I've described.
nice and clear
Great and amazing post!
Following your tutorial I've switched my network from being purely Theano-based to being based on Lasagne. Due to the cuda-convnet capabilities of Lasagne, I got a performance increase of about 4x!
One question though - is there an efficient way to re-use the learned network? Currently I pickle it (and with all the training data inside, it's quite big). Is there a way to save/load weights?
Thanks again!
What makes you think that the pickled network contains all the training data? It shouldn't.
You're right, it doesn't, yet it's quite big - about 2 gigabytes. Is there a way to use the saved weights instead?
OK, so you've discovered that load_weights_from only really works if the net is fitted afterwards. That's certainly annoying and something I'll try to fix in the NeuralNet class. You might want to create an issue in the nolearn bug tracker for that.
Thanks. I am afraid that currently loading the weights doesn't work at all. Only loading the model from a pickled file.
It works only if you fit afterwards, just like in the "Supervised pre-training" section of the tutorial.
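In the meantime, saving just the weights is easy to do by hand, assuming the parameter values can be pulled out of the net as a list of numpy arrays (which is what lasagne.layers.get_all_param_values returns). The helper names and arrays below are dummies for illustration:

```python
import os
import pickle
import tempfile

import numpy as np

def save_weights(params, path):
    """Dump only the parameter arrays, not the whole net."""
    with open(path, 'wb') as f:
        pickle.dump(params, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_weights(path):
    with open(path, 'rb') as f:
        return pickle.load(f)

# A fake "network": two weight matrices and a bias vector standing in
# for what get_all_param_values would return.
params = [np.ones((9216, 100)), np.zeros(100), np.ones((100, 30))]

fd, path = tempfile.mkstemp(suffix='.pkl')
os.close(fd)
save_weights(params, path)
restored = load_weights(path)
os.remove(path)

assert all(np.array_equal(a, b) for a, b in zip(params, restored))
```

The restored arrays could then be pushed back into a freshly built net with lasagne.layers.set_all_param_values.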
Why structure the model definition as a list of layers and then a list of layer parameters (all at the same scope) as opposed to a list of layers+parameters?
Something like:
NeuralNet(
layers=[
('input', layers.InputLayer(shape=(128, 9216))),
...
]
)
or if you have to for pickling etc:
NeuralNet(
layers=[
('input', layers.InputLayer, {'shape': (128, 9216)}),
...
]
)
I tried to motivate this in the post itself: It may seem a
little odd that we have to specify the parameters like this, but the
upshot is it buys us better compatibility with scikit-learn's pipeline and parameter search
features. See also my answer to Josh Susskind's comment.
That said, I thought about allowing parameters to be specified both ways.
I'm not 100% sure how scikit-learn sets the updated param, but could you get that functionality by overriding `__setattr__`, splitting the name on the underscore, and calling down to the correct layer?
EDIT: In fact scikit might have some of this functionality now: https://github.com/scikit-l...
Then you could do something like:
parameters = {'hidden__num_units': [50, 100, 200]}
as long as you handled set_params on the individual layers.
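The double-underscore routing itself is simple; here is a toy sketch with layer parameters held in plain dicts (the names and layout are hypothetical, not nolearn's actual internals):

```python
# Toy version of scikit-learn's double-underscore parameter routing:
# 'hidden__num_units' is split into a layer name and a parameter name.
layer_params = {'hidden': {'num_units': 100}, 'output': {'num_units': 30}}

def set_params(**params):
    for key, value in params.items():
        layer_name, _, param_name = key.partition('__')
        layer_params[layer_name][param_name] = value

set_params(hidden__num_units=200)
assert layer_params['hidden']['num_units'] == 200
```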
If there's a more clever way to implement this, I'll be happy to review a pull request. Probably way easier to discuss exactly the benefits if we can look at some code.
This is a start of the implementation:
https://gist.github.com/can...
which partially supports something like this:
parameters = {'layers__hidden__num_units': [50, 100, 200]}
gs = GridSearchCV(net1, parameters)
gs.fit(X, y)
print gs.grid_scores_
in the case where:
NeuralNet(
layers=[
...
('hidden', L(layers.DenseLayer, num_units=100)),
...
]
)
Alex,
while I agree that it would be nice to use scikit-learn's "set_params" facility to pass parameters through to layers, in practice that turns out to be a little painful. I think your own comment in the diff suggests that, with this change, all layers would now have to implement "set_params" and "get_params" as well, so that means subclassing every layer in nntools (or adding more magic). I'm not sure the benefits outweigh the costs of additional complexity here.
My experience with scikit-learn's "BaseEstimator" and friends is that it's unfortunately not very straightforward to work with; it's quite strict about how it expects attributes to be passed and set, which led to a couple of surprises already (hence the "_list" and "_get_param_names" workarounds). Maybe there's a more elegant way of doing things, but it's not obvious.
In any case, I think the best way to fix this little wart in the API might be to allow parameters to be passed in a dictionary as the third tuple item, as per your first comment's second example; that would be pretty easy to add.
Daniel,
I agree that having to subclass every layer in nntools to add a set_params is likely not worth the complexity, but I think there is a way around this. I am still suggesting keeping the layer parameters separate from the class so the update occurs outside of the nntools layer.
I have two suggestions along these lines. My first is to introduce a wrapper class, let's call it `L` that just wraps the layer class and the params dict but implements set_params either directly or by subclassing BaseEstimator. This gives either of these syntaxes:
NeuralNet(
layers=[
...
('hidden', L(layers.DenseLayer, num_units=100)),
...
]
)
or
NeuralNet(
layers=[
...
('hidden', L(layers.DenseLayer, params={'num_units': 100})),
...
]
)
The second suggestion is to add a little magic to the `_list` class, perhaps by implementing `__setattr__` and having it map to the underlying dict. This would allow this syntax:
NeuralNet(
layers=[
...
('hidden', layers.DenseLayer, {'num_units': 100}),
...
]
)
The code I originally linked to for modifying `_list` might not even be needed. It's possible that all that needs to be done to make `_list` work is to have it subclass `BaseEstimator` and then to override `get_params`.
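A toy sketch of the proposed `L` wrapper, with a dummy class standing in for the real lasagne DenseLayer (this is only an illustration of the idea, not working nolearn code):

```python
# The proposed `L` wrapper: holds a layer factory plus its keyword
# arguments and exposes scikit-learn style get/set_params.
class L(object):
    def __init__(self, layer_cls, **kwargs):
        self.layer_cls = layer_cls
        self.kwargs = kwargs

    def get_params(self, deep=True):
        return dict(self.kwargs)

    def set_params(self, **params):
        self.kwargs.update(params)
        return self

    def __call__(self, incoming):
        # Instantiate the real layer only when the net is built.
        return self.layer_cls(incoming, **self.kwargs)

class DenseLayer(object):  # dummy stand-in for lasagne's DenseLayer
    def __init__(self, incoming, num_units):
        self.incoming, self.num_units = incoming, num_units

hidden = L(DenseLayer, num_units=100)
hidden.set_params(num_units=200)   # what GridSearchCV would call
layer = hidden(incoming=None)
assert layer.num_units == 200
```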
Thanks Alex and taion. I think I like the third option best, but will have to put a little more thought into this; taion's suggestion looks pretty good, too. Maybe it's best to combine this idea of allowing a parameters dictionary to be passed as the third tuple element with the current way of updating the parameters. That way, people who are bugged by the fact that parameters and layers go into visually different places have a way to fix that.
There's actually another problem with the 'NeuralNet' API that's bugging me more, and it's that you can't currently use a 'MultipleInputsLayer'; only layers with a single input are supported. I need to figure out what's the most convenient way to allow that.
I'll try and write down some of these ideas in the nolearn issue tracker in the next couple of days.
Daniel
I, too, am stuck now with dynamically creating layers and layer parameters. With so many different options, I do not want to use GridSearchCV but more subtle ways of dynamically altering the net as I go and calling fit again. I want to add/remove layers and similarly change their shapes and parameters. Let me know if you are on the way to getting this done in nolearn. Thanks
It is fairly nice that the current API makes it very clear to users how to interact with set_params.
I would consider actually moving things in a different direction - why not move the layer factories out of the layers list and make those top-level parameters as well? For example, it could look like:
NeuralNet(
layers=('input', 'hidden', 'output'),
input_factory=...,
hidden_factory=...,
hidden_num_units=...,
...
)
This puts all the layer configuration in the same place as opposed to splitting it up, and makes it more clear that you need layer factories as opposed to layers.
The python script on GitHub (https://raw.githubuserconte... calls for training.csv whereas the blog calls for training-cleaned.csv.
Thanks, fixed. There's actually some junk in the original training.csv, which is where this 'cleaned' version comes from. See this thread.
My 2 cents after experimenting with nolearn + Lasagne: there is currently a non-transparent, non-intuitive dependency between the batch_size and the input_shape.
Currently the default batch_iterator is BatchIterator(batch_size=128). While 128 is certainly a reasonable value for batch_size, the user must know the default is 128 in order to correctly set the input_shape. Ideally there would be some way for the user to change the batch_size without having to remember to update the input shape. One idea would be some sort of lazily resolved BATCH_SIZE constant that could be used in the input shape. The iterator could then have an additional method "get_batch_size" which is used by the NeuralNet to set the BATCH_SIZE constant.
Thoughts?
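The lazily resolved BATCH_SIZE idea could look roughly like this (all names here are hypothetical, not existing nolearn API):

```python
# A sentinel in the input shape that the net would replace with the
# iterator's batch size when it builds the layers.
BATCH_SIZE = object()

class BatchIterator(object):
    def __init__(self, batch_size=128):
        self.batch_size = batch_size

    def get_batch_size(self):
        return self.batch_size

def resolve_shape(shape, iterator):
    """Swap the BATCH_SIZE sentinel for the iterator's actual batch size."""
    return tuple(iterator.get_batch_size() if s is BATCH_SIZE else s
                 for s in shape)

it = BatchIterator(batch_size=200)
assert resolve_shape((BATCH_SIZE, 9216), it) == (200, 9216)
```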
A similar idea applies to setting the output_num_units when using use_label_encoder=True. Ideally the output units can be calculated from the number of classes: len(self.classes_).
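Deriving the number of output units from the labels is a one-liner; here is a sketch with numpy standing in for what a fitted LabelEncoder would store in classes_:

```python
import numpy as np

# Derive output_num_units from the labels instead of hard-coding it,
# mirroring what use_label_encoder=True could do internally.
y = np.array(['cat', 'dog', 'cat', 'bird'])
classes_ = np.unique(y)            # what a LabelEncoder stores after fit
output_num_units = len(classes_)
assert output_num_units == 3
```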
I think you're right and that this can be irritating. Thanks for adding that issue in the tracker. I hope I can follow up in the coming days.
Here's the relevant issue: https://github.com/dnouri/n...
On a different dataset, I'm trying to modify the FlipBatchIterator to perform a simple affine transformation using skimage, but I am getting uniformly worse results on the validation set. I was wondering why you have a call to the parent class constructor in the transform() function? Aren't Xb, yb created when the iterator is instantiated?
So this line:
Xb, yb = super(FlipBatchIterator, self).transform(Xb, yb)
doesn't actually call the class constructor, but the base class `transform` method (which incidentally doesn't do anything but return the batch unchanged). It's not terribly interesting for your purposes, but this is what it does.
I believe there might be a bug with your own batch iterator; try to compare the image and plot it after you've applied your own transform, and see if it's showing something sensible. If you believe it's a bug on my side, feel free to create an issue in the nolearn issue tracker and add some code that I can try out.
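For reference, the pattern boils down to this; these are minimal stand-ins for nolearn's classes written for illustration, not the real implementation:

```python
import numpy as np

class BatchIterator(object):
    def transform(self, Xb, yb):
        # Base class: return the batch unchanged.
        return Xb, yb

class FlipBatchIterator(BatchIterator):
    def transform(self, Xb, yb):
        # Call the base transform first (a no-op here), then augment.
        Xb, yb = super(FlipBatchIterator, self).transform(Xb, yb)
        Xb = Xb.copy()
        indices = np.arange(len(Xb) // 2)
        # Horizontally flip the selected images (last axis is width).
        Xb[indices] = Xb[indices, :, :, ::-1]
        return Xb, yb

Xb = np.arange(2 * 1 * 2 * 2).reshape(2, 1, 2, 2).astype('float32')
Xb2, _ = FlipBatchIterator().transform(Xb, None)
assert np.array_equal(Xb2[0, 0], Xb[0, 0, :, ::-1])  # first image flipped
assert np.array_equal(Xb2[1], Xb[1])                 # second untouched
```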
Thanks for the response. I dumped some of the transformations to files showing the side by side plot (before and after) and it looks like the transformation is doing what it should. The only exception is that translations occasionally leave vertical or horizontal lines. I'm not sure if that is what is throwing it off.
I'll look at it some more and create a minimal example if I can't figure it out.
Did you figure this out? I'm having the same problems.
Same.
(Let me guess, Kaggle and MNIST say?)
Same problem. I have to do the augmentation offline, but it's terrible.
Put a minimal example that fails into the nolearn issue tracker and I'll take a look: https://github.com/dnouri/n...
That's a great post. Thanks a ton! Is there any reason you did not allow rectangular filters or rectangular images? Also, will you be adding strided pooling soon? Thanks
Can you elaborate on the rectangular filters? Why would rectangular images not be allowed?
As for strided pooling; you may want to add a feature request to the Lasagne issue tracker for that.
dnouri
I just tried rectangular filters on rectangular images and the code did not allow it.
So I used
layers=[
('input', layers.InputLayer),
('conv1', Conv2DLayer),
('pool1', MaxPool2DLayer),
('conv2', Conv2DLayer),
('pool2', MaxPool2DLayer),
('conv3', Conv2DLayer),
('pool3', MaxPool2DLayer),
('conv4', Conv2DLayer),
('pool4', MaxPool2DLayer),
('hidden5', layers.DenseLayer),
('hidden6', layers.DenseLayer),
('output', layers.DenseLayer),
],
input_shape=(128, 3, 135, 240),
conv1_num_filters=32, conv1_filter_size=(12, 21), pool1_ds=(2, 2),
conv2_num_filters=64, conv2_filter_size=(7, 11), pool2_ds=(2, 2),
conv3_num_filters=128, conv3_filter_size=(5, 9), pool3_ds=(2, 2),
conv4_num_filters=256, conv4_filter_size=(3, 6), pool4_ds=(2, 2),
hidden5_num_units=1000, hidden6_num_units=1000,
output_num_units=1, output_nonlinearity=None, ....
The error was on line 46 in cuda_convnet:
if filter_size[0] != filter_size[1]:
raise RuntimeError("Conv2DCCLayer only supports square filters, but filter_size=(%d, %d)" % filter_size)
Then I changed the filters to squares
conv1_num_filters=32, conv1_filter_size=(10, 10), pool1_ds=(2, 2),
conv2_num_filters=64, conv2_filter_size=(7, 7), pool2_ds=(2, 2),
conv3_num_filters=128, conv3_filter_size=(5, 5), pool3_ds=(2, 2),
conv4_num_filters=256, conv4_filter_size=(3, 3), pool4_ds=(2, 2),
hidden5_num_units=1000, hidden6_num_units=1000,
- but now I got another error which said
ValueError: images must be square(dims[1] == dims[2]). Shape (32,126,231,128)
It came from Theano/theano/compile/function_module.py", line 595,
I can provide detailed stack dump if that helps.
Thanks
Ah, you're right. The cuda-convnet-based layer has several restrictions on input and kernel shapes.
Try instead lasagne.layers.Conv2DLayer or Conv2DMMLayer. These should allow you to work with non-square input and kernels.
Thanks - let me check that and I will post back
Using Conv2DLayer I was able to use rectangular filters but I am still unable to use rectangular inputs. I am still getting that second error. I had used rectangular images earlier - with theano - so there must be something strange going on here. Let me know if you have any quick thought.
Regards
You'll also need to exchange the pooling layer.
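With both layers exchanged, rectangular shapes should flow through cleanly. For the filter sizes quoted earlier in the thread (12x21, 7x11, 5x9, 3x6), the 'valid' convolution and 2x2 pooling shape arithmetic on a 135x240 input can be checked by hand:

```python
# Shape arithmetic for 'valid' convolutions and (2, 2) max pooling on a
# rectangular 135x240 input, using the filter sizes from the thread.
def conv_valid(shape, filter_size):
    return tuple(s - f + 1 for s, f in zip(shape, filter_size))

def pool(shape, ds=(2, 2)):
    return tuple(s // d for s, d in zip(shape, ds))

shape = (135, 240)
for filter_size in [(12, 21), (7, 11), (5, 9), (3, 6)]:
    shape = pool(conv_valid(shape, filter_size))

assert shape == (5, 8)  # matches the last MaxPool2DLayer output
```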
That worked. But there is a different error now. I am trying to figure out what it means. If you have any quick insight, please let me know. Note, the only "other" difference between your example and my data set is that my input has 3 channels and I am regressing on one output value.
X.shape == (2050, 97200); X.min == 0.000; X.max == 1.000
y.shape == (2050,); y.min == 1.000; y.max == 6.000
X.shape == (2050, 3, 135, 240); X.min == 0.000; X.max == 1.000
InputLayer (128, 3, 135, 240) produces 97200 outputs
Conv2DLayer (128, 32, 124, 220) produces 872960 outputs
MaxPool2DLayer (128, 32, 62, 110) produces 218240 outputs
Conv2DLayer (128, 64, 56, 100) produces 358400 outputs
MaxPool2DLayer (128, 64, 28, 50) produces 89600 outputs
Conv2DLayer (128, 128, 24, 42) produces 129024 outputs
MaxPool2DLayer (128, 128, 12, 21) produces 32256 outputs
Conv2DLayer (128, 256, 10, 16) produces 40960 outputs
MaxPool2DLayer (128, 256, 5, 8) produces 10240 outputs
DenseLayer (128, 1000) produces 1000 outputs
DenseLayer (128, 1000) produces 1000 outputs
DenseLayer (128, 1) produces 1 outputs
Epoch | Train loss | Valid loss | Train / Val | Valid acc | Dur
File "/home/run2/pythonrepos/Theano/theano/sandbox/cuda/basic_ops.py", line 2351, in perform
x.shape, shp)
ValueError: ('total size of new array must be unchanged', (104, 32, 124, 220), array([4096, 1, 124, 220]))
Apply node that caused the error: GpuReshape{4}(GpuElemwise{Composite{[mul(i0, add(i1, Abs(i1)))]},no_inplace}.0, TensorConstant{[4096 1.. 124 220]})
Inputs types: [CudaNdarrayType(float32, 4D), TensorType(int64, vector)]
Inputs shapes: [(104, 32, 124, 220), (4,)]
Inputs strides: [(872960, 27280, 220, 1), (8,)]
Inputs values: ['not shown', array([4096, 1, 124, 220])]
I believe you may be running into this problem: https://github.com/dnouri/n...
With a newer nolearn, there's now a batch iterator flag called forced_even that should help.
Thanks again - let me look into that. I will post back. Regards
That fixed it, thanks. I am now getting NaN as all the stats in the training run table. I will try to figure out what's going on.
Ok - so there is some issue which I am not able to debug.
I started with a very simple network of 1 hidden layer and output layer (with 1 output). The hidden layer had 100 nodes and the batch size was 128.
This worked - I can see the train/test/val errors (and then reduce)
Then I changed the number of nodes to 500. This worked too.
Then I changed the number of nodes to 1000 and it failed, meaning I could not see the train/test/val errors any more. They were all NaN.
Then I changed the batch size to 200 and now, I could see the errors.
Then I added another hidden layer with 1000 nodes, and again it failed.
So, I am guessing there is some relation between the batch size and the network size. Or is it something to do with the GPU and memory (getting full)? Or is it because I am using the non-CUDA classes? I am using a Tesla K10 8GB GPU. I have about 2000 images of 135x240 - total size around 800 MB.
How can I debug this?
Thanks
More information: a network with 2 conv/pool layers (with 32 and 64 filters of size 12x21 and 7x11) and one hidden layer with 100 nodes failed (gave NaN). Then I changed the number of filters to 16 and 32 and it ran successfully. So it seems like the overall size of the data that needs to be copied to the GPU is causing the issue? But there is no relevant log.
Debanjan,
I think you might be experiencing an issue with exploding gradients. It's a little hard debugging this from here. But it may help to inspect your weights and observe how they change over time; for that you may want to hack the NeuralNet code and print weight statistics after each mini batch.
I would make sure that the layer initializations look good (you can try uniform initializations with an explicit range), and that you're passing input dimensions as the conv layer expects them; maybe it helps to try different activation functions, particularly for the last two hidden layers if this is a regression task.
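A minimal sketch of the kind of per-layer weight statistics worth printing after each update to catch exploding gradients; the uniform init range here is only an example value:

```python
import numpy as np

def weight_stats(W):
    """Simple statistics to log after each mini-batch update."""
    return {'mean': float(W.mean()),
            'std': float(W.std()),
            'max_abs': float(np.abs(W).max()),
            'has_nan': bool(np.isnan(W).any())}

# Example: a layer initialized uniformly with an explicit range.
rng = np.random.RandomState(0)
W = rng.uniform(-0.05, 0.05, size=(100, 100))
stats = weight_stats(W)

# Healthy weights: bounded magnitude, no NaNs. A blow-up shows as
# rapidly growing max_abs followed by has_nan flipping to True.
assert stats['max_abs'] <= 0.05
assert not stats['has_nan']
```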
Thanks dnouri. Yes, it is a regression task, and the behavior is unpredictable - sometimes the same net fails that succeeded earlier. So indeed it can be exploding gradients depending on the weight initialization. I will try to debug this, but I do not know how to fix it even if I find the issue. As for activation, do you mean tanh? Or sigmoid? With softmax? For a regression task it becomes a problem to use them because the error is not easy to design (a 2 is non-linearly better than a 3 if the actual output is 1; it is complex to fit that - makes sense?)
dnouri, I changed the activation of my hidden layer to sigmoid and it worked on a network which was failing earlier. So obviously that controlled the explosion of the gradient. Thanks. I take back the last part of my previous comment; I was talking about the gradient on the last (output) layer, not the hidden layer. As for the weights and range, you mean passing a range to W=init.Uniform(), right? I will keep working on this and soon put my experience with Lasagne on GitHub.
Yes, initializing the weights differently through the W argument is what I meant. Looking forward to hearing how it all went.
That's a great write-up, Daniel.