Neural networks mystify me

Apr 15th, 2023

AI

Spent a good chunk of a week off (re)learning the basics of neural networks: forward propagation, gradient descent, loss functions. It’s taking some time, but gradually I feel some level of understanding is taking hold. What continues to amaze me is how simple it all is: you really only need a high-school level understanding of linear algebra and calculus to understand most of what’s going on behind the scenes. Best as I can tell, the innovations of the last few years (in particular the transformer models behind things like ChatGPT) are just refinements on top of these basic concepts.
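To make the "how simple it all is" point a bit more concrete, here is a minimal sketch of those three ideas in plain Python. It is a toy of my own, not taken from any particular course or library: a single "neuron" with one weight and one bias, fit to made-up data from y = 2x + 1. The data, the function names, the learning rate, and the step count are all assumptions chosen for illustration.

```python
# Toy data generated from y = 2x + 1 (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def forward(w, b, x):
    """Forward propagation for one 'neuron': weighted input plus a bias."""
    return w * x + b


def loss(w, b):
    """Loss function: mean squared error between predictions and targets."""
    return sum((forward(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(w, b):
    """Partial derivatives of the loss with respect to w and b (chain rule)."""
    n = len(xs)
    dw = sum(2 * (forward(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (forward(w, b, x) - y) for x, y in zip(xs, ys)) / n
    return dw, db


def train(steps=2000, learning_rate=0.05):
    """Gradient descent: repeatedly nudge w and b against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradients(w, b)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


w, b = train()
print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss(w, b):.2e}")
```

With these settings the weight and bias should settle close to 2 and 1. A real network stacks many of these units with nonlinearities in between and lets a library compute the derivatives, but as far as I can tell the loop is the same shape: predict, measure the error, adjust.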

Neural networks (there really is nothing “neural” about them) are not new: I remember hearing about them as an undergraduate in the early 2000s (and I think they were rather old hat even then). At the time, they were pretty much dismissed as a warmed-over model of behaviorism, unlikely to be useful anywhere except perhaps in a few simplistic applications. Based on what I saw at the time, I agreed and basically bought into the idea that computers are mainly useful as an adjunct to human processes, systems and intuition.

Thus, I find the fact that these systems can produce something even mildly resembling novel or creative outputs (as is the case with things like ChatGPT and Midjourney) surprising - as in, it wasn’t something I saw coming. Yes, much of what has been built using these technologies is overhyped and arguably dangerous. Still, I also don’t want to lose the sense of wonder that this is possible at all. If I was mistaken about this, what else might I be missing?

I feel like the best response at this point is to take a step back, learn as much as I can, and then develop an opinion. I expect this process to take at least a year, probably longer.

In case it’s helpful to others, here’s some literature I’ve been working through on these topics.

Some theoretical but approachable material for understanding the basics:

On the ethical side:

And some articles on how to think about LLMs from a pragmatic perspective as a programmer: