notes-cog-ai-machineLearning-neuralNets

intro

"Deep learning"

As far as I can tell, "deep learning" is just a catchy name for recent advances in the old A.I. subfield of neural networks?

a deep learning autoencoder does better than PCA and LSA for text classification:

" http://www.sciencemag.org/content/313/5786/504.abstract, http://www.cs.toronto.edu/~amnih/cifar/talks/salakhut_talk.pdf. In a strict sense, this work was obsoleted by a slew of papers from 2011 which showed that you can achieve similar results to this 2006 result with “simple” algorithms, but it’s still true that current deep learning methods are better than the best “simple” feature learning schemes, and this paper was the first example that came to mind." -- [1]

Biologically plausible backprop alternatives

Alternating convolutional and max-pooling layer nets

RNNs and LSTMs

RHNs

Transformer arch

GANs

Anomaly detection

What works well where

dirtyaura (HN):

Really great work on visualizing neurons!

Is anyone working with LSTMs in a production setting? Any tips on what are the biggest challenges?

Jeremy Howard said in the fast.ai course that in the applied setting, simpler GRUs work much better and have replaced LSTMs. Comments about this?

agibsonccc (HN):

Yes, the bulk of our business is time series. This includes everything from hardware breakdowns to fraud detection. I think Jeremy has some good points, but in general I wouldn't assume that everything is binary. (By this I mean: look at these kinds of terse statements with a bit of nuance.)

Usually, as long as you have a high amount of regularization and use truncated backprop through time in training, you can learn fairly useful classification and forecasting models.

Beyond that, standard neural net tuning applies, e.g.: normalize your data, pay attention to your weight initialization, understand what loss function you're using, etc.
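
A minimal sketch of those two points (regularization via dropout, and truncated backprop through time by detaching the hidden state between chunks so gradients don't flow across chunk boundaries). PyTorch assumed; shapes and hyperparameters are placeholders.

  import torch
  import torch.nn as nn

  # Toy sequence classifier: 2-layer GRU with dropout, trained with truncated BPTT.
  model = nn.GRU(input_size=8, hidden_size=32, num_layers=2, dropout=0.5, batch_first=True)
  head = nn.Linear(32, 1)
  opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)

  x = torch.randn(16, 1000, 8)          # batch of long (already normalized!) time series
  y = torch.randint(0, 2, (16, 1)).float()

  h = None
  for chunk in x.split(100, dim=1):     # truncated BPTT: backprop only inside each 100-step chunk
      out, h = model(chunk, h)
      h = h.detach()                    # cut the graph at the chunk boundary
      loss = nn.functional.binary_cross_entropy_with_logits(head(out[:, -1]), y)
      opt.zero_grad()
      loss.backward()
      opt.step()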

Compressive sensing

history

optimizers

examples

tips

tools

data

misc

"Deep mind is breaking new ground in number of directions. For example, "Decoupled Neural Interfaces using Synthetic Gradients" is simply amazing - they can make training a net async and run individual layers on separate machines by approximating the gradients with a local net. It's the kind of thing that sounds crazy on paper, but they proved it works." [3]

"Common activation functions...are the sigmoid function, σ(x), which squashes numbers into the range (0, 1); the hyperbolic tangent, tanh(x), which squashes numbers into the range (-1, 1), and the rectified linear unit, ReLU?(x)=max(0,x)" -- [4]

relational LSTMs: https://news.ycombinator.com/item?id=14526876

"

MrQuincle (HN):

There are a lot of neural networks, each with their own representational capabilities.

+ Reservoir computing, ESN, LSM, only combines quenched dynamics.

+ Adaptive resonance theory. Addresses catastrophic forgetting and allows someone to learn from one example.

+ Bottleneck networks. Forcing networks to represent things in a compressed sense. Almost like making up their own symbols.

+ Global workspace theory. Winner take all mechanisms that allow modules to compete.

+ Polychronization. Izhikevich shows how dynamic representations are possible thanks to delays.

+ Attractor networks. Use of dynamical system theory to have population of neurons perform computational tasks.

That neural networks are too fragile is a statement that's a bit too general.

reply "

training data may not be needed (at least for image reconstruction), the sign of the gradient may be more important than the magnitude (for backprop), and stochastic gradient descent may not be the best form of 'backprop':

https://crazyoscarchang.github.io/2019/02/16/seven-myths-in-machine-learning-research/ https://news.ycombinator.com/item?id=19249703
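
On the sign-of-the-gradient point: the corresponding update rule (signSGD) steps by the sign of each gradient component and throws away the magnitude. Minimal numpy sketch, not tied to any particular library:

  import numpy as np

  def sign_sgd_step(weights, grads, lr=1e-3):
      # signSGD: fixed-size step in the direction of each gradient component's sign
      return weights - lr * np.sign(grads)

  w = np.random.randn(10)
  g = np.random.randn(10)               # stand-in for a gradient from backprop
  w = sign_sgd_step(w, g)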

https://web.archive.org/web/20201020172042/https://moultano.wordpress.com/2020/10/18/why-deep-learning-works-even-though-it-shouldnt/ discussion: https://news.ycombinator.com/item?id=24835336

https://www.lesswrong.com/posts/JZZENevaLzLLeC3zn/predictive-coding-has-been-unified-with-backpropagation https://news.ycombinator.com/item?id=26697892

---

https://www.bibsonomy.org/user/bshanks/nnet https://www.bibsonomy.org/user/bshanks/neuralnet

---

'Edge of chaos' opens pathway to artificial intelligence discoveries https://phys.org/news/2021-06-edge-chaos-pathway-artificial-intelligence.html More information: Nature Communications (2021). DOI: 10.1038/s41467-021-24260-z

--- backprop https://www.quantamagazine.org/artificial-neural-nets-finally-yield-clues-to-how-brains-learn-20210218/ ---

https://cprimozic.net/blog/neural-network-experiments-and-visualizations/

---

https://github.com/geohot/tinygrad "You like pytorch? You like micrograd? You love tinygrad!"

---

https://www.quantamagazine.org/researchers-glimpse-how-ai-gets-so-good-at-language-processing-20220414/

---

Toward Self-Improving Neural Networks: Schmidhuber team's scalable self-referential weight matrix learns to modify itself https://syncedreview.com/2022/04/19/toward-self-improving-neural-networks-schmidhuber-teams-scalable-self-referential-weight-matrix-learns-to-modify-itself/

---

https://www.google.com/search?q=neuroevolution

neat neural https://www.google.com/search?q=neat+neural NEAT: NeuroEvolution of Augmenting Topologies

hyperneat neural https://www.google.com/search?q=hyperneat+neural

---

open neural nets

OPT-175B and siblings https://arxiv.org/abs/2205.01068

OPT-175B is not really open, though; "Access will be granted to academic researchers; those affiliated with organizations in government, civil society, and academia; and those in industry research laboratories." I think the smaller models are actually open.

---

gpt

https://dugas.ch/artificial_curiosity/GPT_architecture.html https://news.ycombinator.com/item?id=33942597

https://www.unum.cloud/blog/2023-02-20-efficient-multimodality https://news.ycombinator.com/item?id=34970045

a baby gpt https://twitter.com/karpathy/status/1645115622517542913

transformers intro https://arxiv.org/abs/2207.09238
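
For reference, the core operation these intros build up to, scaled dot-product attention, as a minimal numpy sketch (single head, no causal mask, no learned projections):

  import numpy as np

  def attention(Q, K, V):
      # softmax(Q K^T / sqrt(d_k)) V
      d_k = K.shape[-1]
      scores = Q @ K.T / np.sqrt(d_k)
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)
      return weights @ V

  Q = K = V = np.random.randn(5, 64)    # 5 tokens, 64-d embeddings (self-attention)
  out = attention(Q, K, V)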

---

https://cprimozic.net/blog/reverse-engineering-a-small-neural-network/

---

https://jaykmody.com/blog/gpt-from-scratch/ https://news.ycombinator.com/item?id=34726115

---

LoRA: Low-Rank Adaptation of Large Language Models (github.com/microsoft) https://github.com/microsoft/LoRA https://news.ycombinator.com/item?id=35288015
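
The core trick, as a sketch: freeze the pretrained weight and learn a low-rank update B @ A (rank r much smaller than the layer dimensions), so the effective weight is roughly W + (alpha/r) * B @ A. PyTorch assumed; the init and scaling below are my recollection of the paper, so treat the details as approximate.

  import torch
  import torch.nn as nn

  class LoRALinear(nn.Module):
      # Frozen pretrained linear layer plus a trainable low-rank update B @ A.
      def __init__(self, base: nn.Linear, r=8, alpha=16):
          super().__init__()
          self.base = base
          for p in self.base.parameters():
              p.requires_grad = False                  # freeze the original weights
          self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
          self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as a no-op
          self.scale = alpha / r

      def forward(self, x):
          return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

  layer = LoRALinear(nn.Linear(768, 768))
  out = layer(torch.randn(4, 768))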

---

Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality https://vicuna.lmsys.org/

Could you train a ChatGPT-beating model for $85,000 and run it in a browser? https://simonwillison.net/2023/Mar/17/beat-chatgpt-in-a-browser/

---

https://www.marktechpost.com/2023/02/10/a-new-artificial-intelligence-study-shows-how-large-language-models-llms-like-gpt-3-can-learn-a-new-task-from-just-a-few-examples-without-the-need-for-any-new-training-data/

by imitating other ML algorithms

---

RWKV: Reinventing RNNs for the Transformer Era https://arxiv.org/abs/2305.13048

---

https://microsoft.github.io/generative-ai-for-beginners/#/

---