Bayle Shanks's website: notes-cog-ai-machineLearning-machineLearningNotes

Intro

If you're looking for a bit of an unconventional entry point, I recommend the seminal text 'Elements of Information Theory' by T. Cover (skipping chapters like Network Information/Gaussian channel should be fine), paired with David MacKay?'s 'Information Theory, Inference and Learning Algorithms'. Both seem available online:

http://www.cs-114.org/wp-content/uploads/2015/01/Elements_of_Information_Theory_Elements.pdf

http://www.inference.org.uk/itprnn/book.pdf

They cover some fundamentals of what optimal inference looks like, why current methods work, etc (in a very abstract way by understanding Kolmogorov complexity and its theorems and in a more concrete way in MacKay?'s text). Another good theoretical partner could be the 'Learning from data' course, yet a little more applied: (also available for free)

https://work.caltech.edu/telecourse.html

Excellent lecturer/material (to give a glimpse, take lecture 6: 'Theory of Generalization -- how an infinite model can learn from a finite sample').

Afterward I would move to modern developments (deep learning, or whatever interests you), but you'll be well equipped.

Books

Survey

http://blog.ycombinator.com/jeff-deans-lecture-for-yc-ai/?src=hn

CCM

https://en.m.wikipedia.org/wiki/Convergent_cross_mapping

Tree-based methods

https://sadanand-singh.github.io/posts/treebasedmodels/#.WXT8Kli2pUw.hackernews
- https://news.ycombinator.com/item?id=14833999

Topological data analysis

http://www.quora.com/Data-Science/What-are-some-good-resources-to-learn-Topological-Data-Analysis

Features

What works well where, and lists of top algorithms

http://www.quora.com/What-are-the-top-10-data-mining-or-machine-learning-algorithms
http://www.quora.com/Machine-Learning/What-are-some-Machine-Learning-algorithms-that-you-should-always-have-a-strong-understanding-of-and-why
an argument for naive bayes rather than logistic regression: http://papers.nips.cc/paper/2020-on-discriminative-vs-generative-classifiers-a-comparison-of-logistic-regression-and-naive-bayes.pdf
http://hyperparameter.space/blog/when-not-to-use-deep-learning/
https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
these guys used a ConvNet? WITHOUT TRAINING for various image restoration tasks, showing that the ConvNet? structure, not the training, is what's important (they search over various parameters on the ConvNet? to find the parameters that give the closest match the the degraded image): https://dmitryulyanov.github.io/deep_image_prior

Test sets

Tools

https://rapidminer.com/
https://www.knime.com/
nnet tools
- these guys like pytorch better than tensorflow: https://news.ycombinator.com/item?id=21216200 https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry/
- tensorflow
  - https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim
  - https://deepmind.com/blog/open-sourcing-sonnet/
- https://news.ycombinator.com/item?id=14059299
- keras
- theano
- caffe
- https://github.com/open-source-for-science/TensorFlow-Course
http://scikit-learn.org/stable/
http://www.cs.waikato.ac.nz/ml/weka/
http://moa.cms.waikato.ac.nz/
https://www.r-project.org/
https://deepmind.com/blog/open-sourcing-deepmind-lab/
https://universe.openai.com/
https://gym.openai.com/
DAWN
http://dawn.cs.stanford.edu/static/dawn-overview.pdf
https://startupsventurecapital.com/essential-cheat-sheets-for-machine-learning-and-deep-learning-researchers-efb6a8ebd2e5
https://news.ycombinator.com/item?id=15194721
https://news.ycombinator.com/item?id=15195814
SpaCy? (for NLP, rec. over NLTK by [1])
https://github.com/zalandoresearch/flair (NLP)
https://www.blog.google/topics/machine-learning/now-anyone-can-explore-machine-learning-no-coding-required/ (js)
(randomly heard of this, not sure if it's any good) https://github.com/featuretools/featuretools/
https://www.datarobot.com/
https://medium.com/tensorflow/introducing-tensorflow-js-machine-learning-in-javascript-bf3eab376db
https://news.ycombinator.com/item?id=16718973 "here is a sampling of available ML libs in JavaScript?"
https://radiant-rstats.github.io/docs/index.html
https://github.com/Azure/Azure-TDSP-Utilities/blob/master/Da...
https://lobe.ai/
https://deepsense.ai/keras-or-pytorch/
https://github.com/featuretools/featuretools
- https://www.featurelabs.com/blog/deep-feature-synthesis/
https://ai.googleblog.com/2018/09/the-what-if-tool-code-free-probing-of.html?m=1
venturebeat.com/2018/08/27/google-releases-open-source-reinforcement-learning-framework-for-training-ai-models/
https://news.ycombinator.com/item?id=19163724
https://github.com/jakevdp/PythonDataScienceHandbook
Introducing Manifold: Uber’s Framework for Machine Learning Debugging and Interpretation
https://www.kdnuggets.com/2019/01/vazquez-2018-top-7-python-libraries.html
https://metaflow.org
- https://news.ycombinator.com/item?id=21696779
loman
https://huyenchip.com/2020/06/22/mlops.html
https://github.com/nidhaloff/igel
https://dawn.cs.stanford.edu/
jax
- https://news.ycombinator.com/item?id=22590360
https://news.ycombinator.com/item?id=29139884
https://www.assemblyai.com/blog/pytorch-vs-tensorflow-in-2022/
- https://news.ycombinator.com/item?id=29552665
https://p.migdal.pl/which-ml-are-you/
- https://github.com/stared/which-ml-are-you screye 12 hours ago [–]
https://github.com/geohot/tinygrad

I have been using Spacy3 nightly for a while now. This is game changing.

Spacy3 practically covers 90% of NLP use-cases with near SOTA performance. The only reason to not use it would be if you are literally pushing the boundaries of NLP or building something super specialized.

Hugging Face and Spacy (also Pytorch, but duh) are saving millions of dollars in man hours for companies around the world. They've been a revelation.

JPKab 12 hours ago [–]

Everything in the above paragraph sounds like a hyped overstatement. None of it is.

As someone that's worked on some rather intensive NLP implementations, Spacy 3.0 and HuggingFace? both represent the culmination of a technological leap in NLP that started a few years ago with the advent of transfer learning in NLP. The level of accessibility to the masses these libraries offer is game-changing and democratizing.

-- [2]

binarymax 14 hours ago [–]

I have lots of experience with both, and I use both together for different use cases. SpaCy? fills the need of predictable/explainable pattern matching and NER - and is very fast and reasonably accurate on a CPU. Huggingface fills the need for task based prediction when you have a GPU.

danieldk 12 hours ago [–]

Huggingface fills the need for task based prediction when you have a GPU.

With model distillation, you can make models that annotate hundreds of sentences per second on a single CPU with a library like Huggingface Transformers.

For instance, one of my distilled Dutch multi-task syntax models (UD POS, language-specific POS, lemmatization, morphology, dependency parsing) annotates 316 sentences per second with 4 threads on a Ryzen 3700X. This distilled model has virtually no loss in accuracy compared to the finetuned XLM-RoBERTa? base model.

I don't use Huggingface Transformers, but ported some of their implementations to Rust [1], but that should not make a big difference since all the heavy lifting happens in C++ in libtorch anyway.

tl;dr: it is not true that tranformers are only useful for GPU prediction. You can get high CPU prediction speeds with some tricks (distillation, length-based bucketing in batches, using MKL, etc.).

[1] https://github.com/tensordot/syntaxdot/tree/main/syntaxdot-t...

ZeroCool?2u 13 hours ago [–]

SpaCy? and HuggingFace? fulfill practically 99% of all our needs for NLP project at work. Really incredible bodies of work.

Also, my team chat is currently filled with people being extremely stoked about the SpaCy? + FastAPI? support! Really hope FastAPI? replaces Flask sooner rather than later.

 langitbiru 12 hours ago [–]

So with SpaCy? 3.0, HuggingFace?, do we still have a reason to use NLTK? Or they complement each other? Right now, I lost track of the progress in NLP.

gillesjacobs 9 hours ago [–]

NLTK is showing its age. In my information extraction pipelines, the heavy lifting for modelling is done by SpaCy?, AllenNLP?, and Huggingface (and Pytorch or TF ofc).

I only use NLTK since it has some base tools for low-resource languages for which noone has pretrained a transformer model or for specific NLP-related tasks. I still use their agreement metrics module, for instance. But that's about it. Dep parsing, NER, lemmatising and stemming is all better with the above mentioned packages.

Vision tools

Object detection:

https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
- https://news.ycombinator.com/item?id=14571701

Books

Courses

http://www.coursera.org/learn/machine-learning
https://www.coursera.org/learn/neural-networks-deep-learning/
https://matterhorn.dce.harvard.edu/engage/ui/index.html#/2016/01/14328
- http://cs109.github.io/2015/pages/videos.html
http://course.fast.ai/
- "FYI, Google just completed adding all the pre-requisites for this course to their Colab service, which provides a GPU-powered Jupyter Notebook for free" [3]
http://www.fast.ai/2017/07/28/deep-learning-part-two-launch/
https://news.ycombinator.com/item?id=14889408
http://neuralnetworksanddeeplearning.com/
https://developers.google.com/machine-learning/crash-course/
"Reviewing the Keras documentation (https://keras.io) and examples (https://github.com/keras-team/keras/tree/master/examples) are honestly much better teachers of AI/ML than any MOOC, in my opinion."
http://drona.csa.iisc.ernet.in/~chiru/datascience/iisclectures.html
https://bloomberg.github.io/foml/#lectures
https://news.ycombinator.com/item?id=17519591
gpu tutorial: https://medium.com/deep-learning-turkey/google-colab-free-gpu-tutorial-e113627b9f5d
- https://colab.research.google.com/notebooks/welcome.ipynb#recent=true
https://medium.com/technomancy/the-blunt-guide-to-mathematically-rigorous-machine-learning-c53263d45c7b
https://www.deeplearning.ai/
http://www.fast.ai/2018/09/26/ml-launch/
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/index.htm
https://aws.amazon.com/blogs/machine-learning/amazons-own-machine-learning-university-now-available-to-all-developers/
- https://aws.amazon.com/training/learning-paths/machine-learning/data-scientist/
https://youtu.be/-z5lKPHcumo
https://hn.academy/
https://github.com/keras-team/keras/tree/master/examples
https://github.com/machinelearningmindset/machine-learning-course

Contests

Kaggle
numer.ai
- their homepage is kind of spare, more information is at https://medium.com/numerai/encrypted-data-for-efficient-markets-fffbe9743ba8

Case studies / examples

https://github.com/xviniette/FlappyLearning
http://googleresearch.blogspot.com/2015/09/google-voice-search-faster-and-more.html : deep neural networks and RNNs; Connectionist Temporal Classification (CTC)
https://medium.com/jim-fleming/notes-on-the-numerai-ml-competition-14e3d42c19f3
- https://github.com/jimfleming/numerai
http://blog.mailgun.com/machine-learning-for-everyday-tasks/
- https://news.ycombinator.com/item?id=13016708
https://techcrunch.com/2016/12/06/aragos-ai-can-now-beat-some-human-players-at-complex-civ-strategy-games/ (AI's name is HIRO)
http://kevinhughes.ca/blog/tensor-kart
http://mattmurray.net/building-a-music-recommender-with-deep-learning/
- https://news.ycombinator.com/item?id=14910016
https://spandan-madan.github.io/DeepLearningProject/
https://people.xiph.org/~jm/demo/rnnoise/
- https://news.ycombinator.com/item?id=15362982
https://news.ycombinator.com/item?id=15383336
https://www.blog.google/topics/machine-learning/now-anyone-can-explore-machine-learning-no-coding-required/
- https://news.ycombinator.com/item?id=15399132
https://deepmind.com/blog/alphago-zero-learning-scratch/
- " shmageggy 37 minutes ago [-] Looks like the performance improvement comes from two key ingredients: 1) Using Residual networks instead of normal convolutional layers 2) Using a smarter policy training loss that uses the full information from a MCTS at each move. In the previous version, I believe they just ran the policy network to the end of the game and used a very weak {0, 1} reinforcement signal over all of the moves played. Here, it looks like they use each run of MCTS to provide a fully supervised signal over all moves it explores."
PassGAN: A Deep Learning Approach for Password Guessing
computing word vectors via SVD on skipgram PMIs: http://multithreaded.stitchfix.com/blog/2017/10/18/stop-using-word2vec/
- https://news.ycombinator.com/item?id=15502859
https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning/
- https://news.ycombinator.com/item?id=15666169
- tangentially interesting: https://news.ycombinator.com/item?id=15665078
https://medium.com/applied-data-science/how-to-build-your-own-alphazero-ai-using-python-and-keras-7f664945c188
https://github.com/jonathan-laurent/AlphaZero.jl
- https://news.ycombinator.com/item?id=23599278
http://zero.sjeng.org/
https://github.com/glinscott/leela-chess
https://medium.freecodecamp.org/how-you-can-train-an-ai-to-convert-your-design-mockups-into-html-and-css-cc7afd82fed4
https://attardi.org/pytorch-and-coreml
https://modeldepot.io/
https://medium.com/nanonets/how-to-easily-detect-objects-with-deep-learning-on-raspberrypi-225f29635c74
http://zna.do/pusheen
https://blog.insightdatascience.com/how-to-solve-90-of-nlp-problems-a-step-by-step-guide-fda605278e4e
- discussion: https://news.ycombinator.com/item?id=16224346
https://blog.openai.com/openai-five/
http://tools.google.com/seedbank/
https://medium.com/tensorflow/getting-alexa-to-respond-to-sign-language-using-your-webcam-and-tensorflow-js-735ccc1e6d3f
https://github.com/NVIDIA/vid2vid
https://github.com/astorfi/lip-reading-deeplearning
https://spandan-madan.github.io/DeepLearningProject/docs/Deep_Learning_Project-Pytorch.html
https://blog.insightdatascience.com/generating-custom-photo-realistic-faces-using-ai-d170b1b59255
https://code.fb.com/ai-research/wav2letter/
https://victorzhou.com/blog/intro-to-neural-networks/
- https://repl.it/@vzhou842/An-Introduction-to-Neural-Networks
https://ai.googleblog.com/2019/03/an-all-neural-on-device-speech.html
https://www-zdnet-com.cdn.ampproject.org/v/s/www.zdnet.com/google-amp/article/new-machine-learning-algorithm-breaks-text-captchas-easier-than-ever/?amp_js_v=0.1#referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Fwww.zdnet.com%2Farticle%2Fnew-machine-learning-algorithm-breaks-text-captchas-easier-than-ever%2F
https://www.quantamagazine.org/machine-learnings-amazing-ability-to-predict-chaos-20180418/
https://blog.insightdatascience.com/ai-for-3d-generative-design-17503d0b3943
chatbot ('blender' (facebook)): https://arxiv.org/pdf/2004.13637.pdf https://news.ycombinator.com/item?id=23091932
- https://arxiv.org/pdf/1811.00207.pdf (EMPATHETICDIALOGUES)
- https://arxiv.org/pdf/1801.07243.pdf (PERSONA-CHAT)
https://openai.com/blog/image-gpt/
https://nvlabs.github.io/alias-free-gan/
https://arxiv.org/abs/1903.07291 Semantic Image Synthesis with Spatially-Adaptive Normalization ("draw the owl")
https://github.com/natsu90/whoiscomingto.party
https://trekhleb.dev/blog/2021/self-parking-car-evolution/
https://techxplore.com/news/2021-02-deep-technique-rubik-cube-problems.amp
https://keras.io/guides/keras_cv/generate_images_with_stable_diffusion/
https://jalammar.github.io/illustrated-stable-diffusion/
https://www.riffusion.com/about
https://openai.com/blog/vpt/
https://dcai.csail.mit.edu/

Reinforcement learning case studies

SVMs

time series

EDM

Massively Parallel Causal Inference of Whole BrainDynamics at Single Neuron Resolution

NLP techniques

word vectors
- https://news.ycombinator.com/item?id=16501760

Tips

http://technocalifornia.blogspot.com/2014/12/ten-lessons-learned-from-building-real.html
https://www.kdnuggets.com/2018/12/common-mistakes-data-science.html
"((when)) the dataset is imbalanced you can't use an ROC curve, sensitivity, or specificity. You need to use precision and recall and make a PR curve." [4]

Idea

Try and combine all (or many) of the classifiers (not via ensemble methods, but by combining their core concepts to make a new algorithm).

eg combine random forest, neural net, svm, genetic alg, stochastic local search, nearest neighbor

eg deep neural decision forests seem to be an attempt to combine the first two

perhaps some of the combinations would only be a linear combination of the output scores, but it would be better to find 'the key idea(s)' of each one.

Intros to and notes on classic algorithms

Reviews

games

deep learning

http://www.deeplearningbook.org/
http://cs231n.github.io/
http://uvadlc.github.io/
https://www.reddit.com/r/MachineLearning/wiki/index
https://github.com/terryum/awesome-deep-learning-papers
http://course.fast.ai/
RHNs
http://blog.echen.me/2017/05/30/exploring-lstms/
LSTMs vs CNNs
LSTMs vs GRUs
LSTMs: " Neural nets that update over time need a way to know what to keep and what to forget from last time. So let's learn what to keep and forget... by using more neural nets." [5]

KR

Relational reasoning in LSTMs

evolution strategies

http://blog.otoro.net/2017/10/29/visual-evolution-strategies/ -- includes links to implementations of CMA-ES, Simple Genetic Algorithm, PEPG, and OpenAI’s? ES

imitation learning

https://blog.openai.com/robots-that-learn/

reinforcement learning

RL tools

Facebook AI Releases SaLinA: A Flexible and Simple Library for Learning Sequential Agents

reinforcement learning of collaboration

https://blog.openai.com/learning-to-cooperate-compete-and-communicate/?source=hn

datasets

https://deepmind.com/research/open-source/open-source-datasets/
https://opendatacommons.org/licenses/odbl/
http://studyforrest.org
http://blog.otoro.net/2017/11/12/evolving-stable-strategies/ mentions OpenAI? Gym, Roboschool, Pybullet
https://github.com/yandex/YaLM-100B

news

what works well where

linear/logistic regression vs ANNs vs SVMs
in the NetFlix contest
"Neural network models are highly expressive and flexible, and if we are able to find a suitable set of model parameters, we can use neural nets to solve many challenging problems....However, there are many problems where the backpropagation algorithm cannot be used. For example, in reinforcement learning (RL) problems, we can also a train a neural network to make decisions to perform a sequence of actions to accomplish some task in an environment. However, it is not trivial to estimate the gradient of reward signals given to the agent in the future to an action performed by the agent right now, especially if the reward is realised many timesteps in the future. Even if we are able to calculate accurate gradients, there is also the issue of being stuck in a local optimum, which exists many for RL tasks. A whole area within RL is devoted to studying this credit-assignment problem, and great progress has been made in recent years. However, credit assignment is still difficult when the reward signals are sparse. In the real world, rewards can be sparse and noisy. Sometimes we are given just a single reward, like a bonus check at the end of the year, and depending on our employer, it may be difficult to figure out exactly why it is so low. For these problems, rather than rely on a very noisy and possibly meaningless gradient estimate of the future to our policy, we might as well just ignore any gradient information, and attempt to use black-box optimisation techniques such as genetic algorithms (GA) or ES...OpenAI? published a paper called Evolution Strategies as a Scalable Alternative to Reinforcement Learning where they showed that evolution strategies, while being less data efficient than RL, offer many benefits. The ability to abandon gradient calculation allows such algorithms to be evaluated more efficiently. It is also easy to distribute the computation for an ES algorithm to thousands of machines for parallel computation. By running the algorithm from scratch many times, they also showed that policies discovered using ES tend to be more diverse compared to policies discovered by RL algorithms...Although ES might be a way to search for more novel solutions that are difficult for gradient-based methods to find, it still vastly underperforms gradient-based methods on many problems where we can calculate high quality gradients...CMA-ES is my algorithm of choice when the search space is less than a thousand parameters. I found it still usable up to ~ 10K parameters if I’m willing to be patient....I use PEPG if the performance of CMA-ES becomes an issue. I usually use PEPG when the number of model parameters exceed several thousand." [6]
on fitness shaping, an optional add-on to evolutionary learning: "I find fitness shaping to be very useful for RL tasks if the objective function is non-deterministic for a given policy network, which is often the cases on RL environments where maps are randomly generated and various opponents have random policies. It is less useful for optimising for well-behaved functions that are deterministic, and the use of fitness shaping can sometimes slow down the time it takes to find a good solution...." [7]
"One thing I don't see mentioned here is flexibility. You can plug any ugly old thing into an evolutionary algorithm and let it run. Give me a working computational model at breakfast time and I'll have runs going by lunch. Cranky ancient FORTRAN that requires you to write a new input file for every evaluation? No problem. You have to compile the inputs into the model to make it run? Fine. Badly scaled inputs or outputs? EA doesn't care. More than one objective? Great! Population-based search is a natural fit for multiple objective optimization. As long as you're clear on what the decisions and objectives are, and you're able to run the model yourself, layering evolutionary optimization on top is easy." [8]
- "I think the multiobjective-part is very important. In other algorithms, you often have to specify the solution-space before hand. For instance, when optimizing between lightweight and strength, one would have to beforehand say how to weight those two properties. (f = 100s - 5w for instance). This throws away a whole dimension in your search space. For EAs, you can let it roam free, and select the bests tradeoffs from the pareto-set after the algorithm is done." [9]
in collaborative filtering, "SVD can struggle when some users have many more “likes” than others" instead try "multi-step co-occurrence" (see https://www.quora.com/What-is-a-co-occurrence-matrix ) -- [10]
Classifier Technology and the Illusion of Progress
- (my note: in that paper, linear discriminant analysis is given as an example of a simple classifier which is often quite good; also, decision trees with low numbers of leaves, and neural nets with low numbers of hidden nodes, nearest neighbor)
Why do tree-based models still outperform deep learning on tabular data?
- https://news.ycombinator.com/item?id=32333565

natural language processing (NLP)

https://tomassetti.me/guide-natural-language-processing/
"Yeah, when you look at some of the SemEval? contest winners or top 3, many use fairly simple methods combined into a powerful solution (except when LSTM with attention grabs the throne). " [11]
"gensim is one of the best libraries for word vectors and summarization. For parsing and NER, Stanford CoreNLP? works best in my experience. " [12]

explainable machine learning

concept whitening

https://bdtechtalks-com.cdn.ampproject.org/v/s/bdtechtalks.com/2021/01/11/concept-whitening-interpretable-neural-networks/amp/?amp_js_v=a6&amp_gsa=1&usqp=mq331AQFKAGwASA%3D#csi=0&referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Fbdtechtalks.com%2F2021%2F01%2F11%2Fconcept-whitening-interpretable-neural-networks%2F

misc tips

How to avoid machine learning pitfalls: a guide for academic researchers

case studies /examples / instances

how adversarial attacks work
http://blog.otoro.net/2017/11/12/evolving-stable-strategies/ mentions OpenAI? Gym, Roboschool, Pybullet
https://medium.com/@ageitgey/how-to-break-a-captcha-system-in-15-minutes-with-machine-learning-dbebb035a710
https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/
https://paperswithcode.com/sota
- https://news.ycombinator.com/item?id=19054501
graph neural networks: https://deepmind.com/blog/article/traffic-prediction-with-advanced-graph-neural-networks
muzero: https://arxiv.org/pdf/1911.08265.pdf
https://cp4space.hatsya.com/2021/01/08/the-neural-network-of-the-stockfish-chess-engine/
- https://news.ycombinator.com/item?id=25759430
https://github.com/kingoflolz/mesh-transformer-jax
https://www.sciencealert.com/a-new-neural-network-solved-the-hardest-of-maths-problems-a-million-times-faster (reservoir computing for fluid dynamics)
GameTune: Introducing reinforcement learning for optimizing the player lifecycle
Where2Act: From Pixels to Actions for Articulated 3D Objects (affordances)
DALL·E: Creating Images from Text
Visual number sense in untrained deep neural networks
Player of Games "unifies...approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning"
- https://github.com/captn3m0/boardgame-research

Supervised learning general

https://eugeneyan.com/writing/bootstrapping-data-labels/#weak-supervision-get-lots-of-lower-quality-labels-fast

Misc

http://strata.oreilly.com/2014/03/crowdsourcing-feature-discovery.html
How is Compressed Sensing going to change Machine Learning ?
https://en.wikipedia.org/wiki/Granular_computing
Deep Neural Decision Forests; recc. by [13] in the Hacker News discussion of How to trick a neural network into thinking a panda is a vulture; summary from [14]: in order to "to try to enhance the distance to the decision boundary (such as the adversarial examples referenced in the post)...replacing pooling layers in the convolutional neural net with something analogous random forests might be the best option so far. More or less each decision tree will end up pushing values to 0/1 in the non-linear region, which mitigates some of the concern about overly linear systems and it places the decision boundary at a somewhat arbitrary location between classes. In aggregate when these locations are combined the resulting classifier has a larger margin without explicitly needing adversaries. So instead of sampling the image space, you're effectively sampling the classifier space."
https://news.ycombinator.com/item?id=13571972
Interaction Networks for Learning about Objects, Relations and Physics
http://www.mit.edu/~ilkery/
https://github.com/eriklindernoren/ML-From-Scratch
- https://news.ycombinator.com/item?id=13732288
https://github.com/rushter/MLAlgorithms
CS 20SI: Tensorflow for Deep Learning Research
- https://news.ycombinator.com/item?id=13782486
https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0
https://github.com/DanburyAI/SG_DLB_2017
https://eugeneyan.com/writing/first-rule-of-ml/
Machine Learning: The High Interest Credit Card of Technical Debt <-- oft-recommended
https://news.ycombinator.com/item?id=13773127
PCA visualization: http://setosa.io/ev/principal-component-analysis/
- https://news.ycombinator.com/item?id=14405665
model order selection: https://news.ycombinator.com/item?id=14406077
Variational Inference for Machine Learning
- https://news.ycombinator.com/item?id=12868741
https://github.com/blei-lab/edward
http://mc-stan.org/
https://github.com/pymc-devs/pymc3
https://news.ycombinator.com/item?id=12871174
https://news.ycombinator.com/item?id=12871246
The Confusion of Variational Autoencoders
Illuminating search spaces by mapping elites
- some discussion: [15]
http://eplex.cs.ucf.edu/noveltysearch/userspage/
https://blog.piekniewski.info/2016/11/04/predictive-vision-in-a-nutshell/
"Indirect encodings within neuroevolution (focusing now on encoding ANNs in particular) have made it possible to evolve much larger ANNs than with direct encodings like NEAT. One of the most popular such indirect encodings is called compositional pattern-producing networks (CPPNs), which was invented in my lab, the Evolutionary Complexity Research Group at the University of Central Florida, in response to the limitations of NEAT as a direct encoding. A CPPN is basically a way of compressing a pattern with regularities and symmetries into a relatively small set of genes. This idea makes sense because natural brains exhibit numerous regular patterns (i.e., repeated motifs) such as in the receptive fields in the visual cortex. CPPNs can encode similar kinds of connectivity patterns.

When CPPNs are used to generate the connectivity patterns of evolving ANNs, the resulting algorithm, also from my lab, is called HyperNEAT (Hypercube-based NEAT, co-invented with David D’Ambrosio? and Jason Gauci) because under one mathematical interpretation, the CPPN can be conceived as painting the inside of a hypercube that represents the connectivity of an ANN. Through this technique, we began to evolve ANNs with hundreds of thousands to millions of connections. Indirectly encoded ANNs have proven useful, in particular, for evolving robot gaits because their regular connectivity patterns tend to support the regularity of motions involved in walking or running. Researchers like Jeff Clune have helped to highlight the advantages of CPPNs and HyperNEAT? through rigorous studies of their various properties. Other labs also explored different indirect encoding in neuroevolution, such as the compressed networks of Jan Koutník, Giuseppe Cuccu, Jürgen Schmidhuber, and Faustino Gomez."

"Novelty search...This insight shifted how we think about neuroevolution once again, and has led to a whole new research area called “quality diversity” and sometimes “illumination algorithms.” This new class of algorithms, generally derived from novelty search, aims not to find a single optimal solution but rather to illuminate a broad cross-section of all the high-quality variations of what is possible for a task, like all the gaits that can be effective for a quadruped robot. One such algorithm, called MAP-Elites (invented by Jean-Baptiste Mouret and Jeff Clune), landed on the cover of Nature recently (in an article by Antione Cully, Jeff Clune, Danesh Tarapore, and Jean-Baptiste Mouret) for the discovery of just such a large collection of robot gaits, which can be selectively called into action in the event the robot experiences damage...Quality diversity algorithms (the next step beyond novelty search) add the notion of fitness back in, but in a careful way so as not to undermine the delicate push for novelty."
Open-Ended Evolution workshop: http://www.mitpressjournals.org/doi/abs/10.1162/ARTL_a_00210 "the idea of evolving increasingly complex and interesting behaviors without end. Many regard evolution on Earth as open-ended, and the prospect of a similar phenomenon occurring on a computer offers its own unique inspiration."
Survival of the sparsest: robust gene networks are parsimonious

moultano 5 hours ago [-]

A practical issue for Naive Bayes that also infects linear models is bias w.r.t. document length. Typically when you are detecting a rare, relatively compact class such as sports articles (or spam) you will tend to have a strongly negative prior, many positive features, and few negative ones. As a consequence, as the length of your text increases, not only does the variance of your prediction increase, but the mean tends to as well. This leads to all very long documents being classified as positive, regardless of their text. You can observe this by training your model and then classifying /usr/dict/words.

This is the most common mistake I've seen in production use of linear models on document text. Invariably, they'll misfire on any unusually long document.

mooman219 3 hours ago [-]

I agree, there are issues with NB such as the ones you brought up. I don't think document length is the real offender here though. This really boils down to noise and how well you filter and devalue it. Stacking more filters like stemming, stopwords, and high frequency features definitely helps in this case to the point where longer documents can actually improve accuracy. Additionally, tuning your ngram lengths or using variable lengths, choosing between word or character ngrams, and limiting your distribution size all will help depending on what you're trying to categorize.

intune 4 hours ago [-]

Is there some way to normalize the document length?

moultano 4 hours ago [-]

Lots of reasonable hacks.

1. Use only the beginning of the document, as that's probably the most important part anyways, and it's fast.

2. Divide the sum of your feature scores by sqrt(n) to give it constant variance, and hopefully keep it comparable with your prior.

3. Split the doc into reasonably sized chunks, and average their scores rather than adding them.

Houshalter 3 hours ago [-]

You can use term frequency instead of binary features. This is invariant to the size of the document. This is called multinomial naive Bayes: https://en.m.wikipedia.org/wiki/Naive_Bayes_classifier#Multi...

moultano 2 hours ago [-]

This is not invariant to the size of the document (though agreed, generally better). It doesn't solve the problem of having mostly positive features and a negative prior.

Stated more formally, your model is b + wᵀx. Generally, b is < 0, and E[wᵀx] > 0. As the document grows, wᵀx tends to dominate b. You'll have bias with length as long as E[wᵀx]≠0 and there aren't any constraints on w that would force this.

reply "

" There is a temptation to use just the word pair counts, skipping SVD, but it won't yield in the best results. Creating vectors not only compresses data, but also finds general patterns. This compression is super important for less frequent words (otherwise we get a lot of overfitting). See "Why do low dimensional embeddings work better than high-dimensional ones?" from http://www.offconvex.org/2016/02/14/word-embeddings-2/. "

maps and glossaries

meta learning deep learning architectures

https://medium.com/intuitionmachine/machines-that-search-for-deep-learning-architectures-c88ae0afb6c8

standard datasets and tasks and benchmarks and contests

machine learning for math

https://www-livescience-com.cdn.ampproject.org/v/s/www.livescience.com/amp/ramanujan-machine-created.html?amp_js_v=a6&amp_gsa=1&usqp=mq331AQFKAGwASA%3D#aoh=16133612956000&csi=0&referrer=https%3A%2F%2Fwww.google.com&amp_tf=From%20%251%24s&ampshare=https%3A%2F%2Fwww.livescience.com%2Framanujan-machine-created.html

toread for me

andy99 2 days ago [–]

Uncertainty quantification and OOD detection in machine learning. It's on some people's radar, but has the potential to get ML adopted much more widely as people understand what it is actually really good at, and stop giving it things to do that it's bad at.

For a great recent example that get at some of this, see "Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions" - https://arxiv.org/abs/2104.03829

I'm not affiliated with this work but I am building a company in this area (because I'm excited). Company is in my profile.

---

https://arxiv.org/abs/2011.11082 Massively Parallel Causal Inference of Whole Brain Dynamics at Single Neuron Resolution mpEDM Empirical Dynamic Modeling (EDM)

---

http://graphics.stanford.edu/courses/cs468-20-fall/schedule.html non-Euclidean machine learning

http://web.stanford.edu/class/cs224w/ CS224W: Machine Learning with Graphs

http://graphics.stanford.edu/courses/cs233-21-spring/ Geometric and topological data analysis

https://www-users.cs.umn.edu/~saad/PDF/umsi-2009-31.pdf TRACE OPTIMIZATION AND EIGENPROBLEMS IN DIMENSIONREDUCTION METHODS

---

machine scientific discovery

https://scitechdaily.com/artificial-intelligence-discovers-alternative-physics/

Discovering faster matrix multiplication algorithms with reinforcement learning

---

https://github.com/ctgk/PRML Python codes implementing algorithms described in Bishop's book "Pattern Recognition and Machine Learning"

---

https://www.marktechpost.com/2022/12/01/latest-ai-research-finds-a-simple-self-supervised-pruning-metric-that-enables-them-to-discard-20-of-imagenet-without-sacrificing-performance-beating-neural-scaling-laws-via-data-pruning/

---

https://www.google.com/search?q=expert+exponential+learning+weights+ensemble

---

the transformer paper: https://arxiv.org/abs/1706.03762 Attention Is All You Need

---

notes-cog-ai-machineLearning-machineLearningNotes

Intro

Books

Survey

CCM

Tree-based methods

Topological data analysis

Features

What works well where, and lists of top algorithms

Test sets

Tools

Vision tools

Books

Courses

Contests

Case studies / examples

Reinforcement learning case studies

SVMs

time series

EDM

NLP techniques

Tips

Idea

Intros to and notes on classic algorithms

Reviews

games

deep learning

KR

evolution strategies

imitation learning

reinforcement learning

RL tools

reinforcement learning of collaboration

datasets

news

what works well where

natural language processing (NLP)

explainable machine learning

concept whitening

misc tips

case studies /examples / instances

Supervised learning general

Misc

maps and glossaries

meta learning deep learning architectures

standard datasets and tasks and benchmarks and contests

machine learning for math

Links

toread for me