"Deep learning"

AFAICT, "deep learning" is just a catchy name for recent advances in neural networks, an old subfield of AI?

A deep learning autoencoder does better than PCA and LSA for text classification:

"In a strict sense, this work was obsoleted by a slew of papers from 2011 which showed that you can achieve similar results to this 2006 result with “simple” algorithms, but it’s still true that current deep learning methods are better than the best “simple” feature learning schemes, and this paper was the first example that came to mind." -- [1]

Alternating convolutional and max-pooling layer nets
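
A minimal sketch of what "alternating convolutional and max-pooling layers" means, in plain Python on nested lists. It assumes a single input channel, one kernel per layer, "valid" borders, and a ReLU between convolution and pooling; the function names (conv2d_valid, max_pool2d, conv_pool_stack) are made up for illustration and are not from any library.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation (what deep learning libraries usually call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu_map(fm: Matrix) -> Matrix:
    """Elementwise max(0, v) over a feature map."""
    return [[max(0.0, v) for v in row] for row in fm]


def max_pool2d(fm: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fm) // size, len(fm[0]) // size
    return [
        [
            max(fm[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_pool_stack(image: Matrix, kernels: List[Matrix], pool_size: int = 2) -> Matrix:
    """Alternate conv -> ReLU -> max-pool once per kernel (hypothetical helper)."""
    x = image
    for kernel in kernels:
        x = max_pool2d(relu_map(conv2d_valid(x, kernel)), pool_size)
    return x
```

Each conv/pool pair shrinks the feature map: a 10x10 input with two 3x3 kernels and 2x2 pooling goes 10x10 -> 8x8 -> 4x4 -> 2x2 -> 1x1.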

RNNs and LSTMs
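
For reference, a sketch of a single LSTM time step using the standard gate equations (input, forget, output gates plus a tanh candidate). It is plain Python for readability, not speed; the parameter layout (a dict with keys W_i, W_f, W_o, W_g and b_i, b_f, b_o, b_g, each weight matrix acting on the concatenation [h_prev; x]) is an assumption of this sketch.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def _sigmoid(x: float) -> float:
    # Branch on sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _affine(W: List[Vector], v: Vector, b: Vector) -> Vector:
    """Compute W @ v + b for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM step; returns the new hidden state h and cell state c."""
    z = h_prev + x  # list concatenation: [h_prev; x]
    i = [_sigmoid(a) for a in _affine(params["W_i"], z, params["b_i"])]   # input gate
    f = [_sigmoid(a) for a in _affine(params["W_f"], z, params["b_f"])]   # forget gate
    o = [_sigmoid(a) for a in _affine(params["W_o"], z, params["b_o"])]   # output gate
    g = [math.tanh(a) for a in _affine(params["W_g"], z, params["b_g"])]  # candidate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

With a large positive forget bias and a large negative input bias the cell state is carried through almost unchanged, which is the mechanism that lets an LSTM remember over long spans.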


Transformer arch
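
The core operation of the Transformer is scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V. A minimal sketch in plain Python follows; it assumes a single head, no masking, and no learned projections, so it is only the inner kernel, not a full Transformer layer.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries: List[Vector],
                                 keys: List[Vector],
                                 values: List[Vector]) -> List[Vector]:
    """For each query, return a softmax-weighted average of the value vectors."""
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(d_v)])
    return outputs
```

A query that matches no key in particular (all scores equal) gets the plain mean of the values; a query strongly aligned with one key gets essentially that key's value.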

What works well where

dirtyaura 102 days ago

Really great work on visualizing neurons!

Is anyone working with LSTMs in a production setting? Any tips on what the biggest challenges are?

Jeremy Howard said in his course that, in applied settings, simpler GRUs work much better and have replaced LSTMs. Any comments on this?

agibsonccc 101 days ago

Yes, the bulk of our business is time series. This includes everything from hardware breakdowns to fraud detection. I think Jeremy has some good points, but in general I wouldn't assume that everything is binary. (By this, I mean: read these kinds of terse statements with a bit of nuance.)

Usually as long as you have a high amount of regularization and use truncated backprop through time in training you can learn some fairly useful classification and forecasting problems.

Beyond that, standard neural net tuning applies. E.g., normalize your data, pay attention to your weight initialization, understand what loss function you're using, ...
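
A rough sketch of three of the tips in that comment, in plain Python: z-score normalization of a feature column, Glorot/Xavier uniform weight initialization, and splitting a long series into fixed-length windows as used for truncated backprop through time. The helper names are hypothetical; Glorot is just one common initialization choice, not necessarily what the commenter uses.

```python
import math
import random
from typing import List, Sequence


def standardize(column: Sequence[float]) -> List[float]:
    """Z-score a feature column: zero mean, unit variance (constant columns map to 0)."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero for a constant column
    return [(v - mean) / std for v in column]


def glorot_uniform(fan_in: int, fan_out: int, rng: random.Random) -> List[List[float]]:
    """Glorot/Xavier uniform init: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def tbptt_windows(sequence: Sequence, k: int) -> List[list]:
    """Split a sequence into consecutive windows of at most k steps.

    In truncated BPTT the hidden state is carried across windows,
    but gradients are only propagated within each window.
    """
    return [list(sequence[s:s + k]) for s in range(0, len(sequence), k)]
```

In practice the mean and standard deviation would be computed on the training split only and then reused for validation and production data.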

Compressive sensing


"DeepMind is breaking new ground in a number of directions. For example, 'Decoupled Neural Interfaces using Synthetic Gradients' is simply amazing - they can make training a net async and run individual layers on separate machines by approximating the gradients with a local net. It's the kind of thing that sounds crazy on paper, but they proved it works." [3]

"Common activation functions...are the sigmoid function, σ(x), which squashes numbers into the range (0, 1); the hyperbolic tangent, tanh(x), which squashes numbers into the range (-1, 1); and the rectified linear unit, ReLU(x) = max(0, x)" -- [4]

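The three activation functions from the quote above, written out in plain Python as a quick reference. The sigmoid is split by sign only to avoid floating-point overflow; mathematically it is the usual 1 / (1 + e^(-x)).

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid; squashes x into (0, 1)."""
    # Branch on sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x: float) -> float:
    """Hyperbolic tangent; squashes x into (-1, 1)."""
    return math.tanh(x)


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)
```

Note that tanh(x) = 2 * sigmoid(2x) - 1, so the two squashing functions differ only by a rescaling.
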
relational LSTMs:


MrQuincle 1 day ago

There are a lot of neural networks, each with their own representational capabilities.

+ Reservoir computing (ESN, LSM). Only combines quenched dynamics.

+ Adaptive resonance theory. Addresses catastrophic forgetting and allows someone to learn from one example.

+ Bottleneck networks. Forcing networks to represent things in a compressed sense. Almost like making up their own symbols.

+ Global workspace theory. Winner-take-all mechanisms that allow modules to compete.

+ Polychronization. Izhikevich shows how dynamic representations are possible thanks to delays.

+ Attractor networks. Use of dynamical systems theory to have populations of neurons perform computational tasks.

That neural networks are too fragile is a statement that's a bit too general.


training data may not be needed (at least for image reconstruction), the sign of the gradient may be more important than the magnitude (for backprop), and stochastic gradient descent may not be the best form of 'backprop':
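
One simple way to see the "sign over magnitude" idea is a sign-based update, where each weight moves by a fixed step in the direction opposite to its gradient's sign. The sketch below is only an illustration under that assumption (sign_descent and its arguments are made-up names); it is not necessarily the method used in the work this note refers to.

```python
from typing import Callable, List

Vector = List[float]


def sign(v: float) -> int:
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_descent(grad_fn: Callable[[Vector], Vector],
                 w: Vector, lr: float, steps: int) -> Vector:
    """Gradient descent that uses only the sign of each gradient component."""
    for _ in range(steps):
        g = grad_fn(w)
        # Every coordinate moves by exactly lr (or stays put if its gradient is 0).
        w = [wi - lr * sign(gi) for wi, gi in zip(w, g)]
    return w


def quadratic_grad(w: Vector) -> Vector:
    """Gradient of f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2, a badly scaled toy problem."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]
```

On the toy problem, sign_descent(quadratic_grad, [0.0, 0.0], lr=0.01, steps=500) ends within one step size of the minimum at (3, -1) in both coordinates, even though the two gradient magnitudes differ by a factor of ten.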