How might LLMs store facts | Chapter 7, Deep Learning
Unpacking the multilayer perceptrons in a transformer, and how they may store facts.
After a bit of a delay, the next chapter on transformers is here. The topic at hand is what happens inside the multilayer perceptron blocks. These are much simpler than the attention layers, so in principle, this could be a 2-minute video.
However, understanding what these blocks are doing, not merely in the sense of “what computations happen?” but at the level of “what, conceptually, do they do to the information flowing through the network?”, is exceedingly challenging. We know that (at least some) factual knowledge tends to live in these blocks, and that seemed to me like a great way to motivate their structure.
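For readers who want the structure in front of them, here is a minimal sketch of a transformer MLP block in NumPy. The dimensions, random weights, and choice of ReLU are illustrative assumptions (GPT-style models typically use GELU, much larger learned matrices, and a hidden layer roughly 4x the model dimension), but the shape of the computation, project up, apply a nonlinearity, project back down, add to the residual stream, is the same.

```python
# A minimal, illustrative sketch of a transformer MLP block.
# Dimensions and random weights are placeholders, not real model values.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_hidden = 4, 16          # hidden layer is typically ~4x wider
W_up = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
b_up = np.zeros(d_hidden)
W_down = rng.standard_normal((d_hidden, d_model)) / np.sqrt(d_hidden)
b_down = np.zeros(d_model)

def mlp_block(x):
    """Apply the MLP to each token vector independently: project up,
    apply a nonlinearity (ReLU here for simplicity; GPT-style models
    often use GELU), project back down, and add to the residual stream."""
    h = np.maximum(0.0, x @ W_up + b_up)   # up-projection + ReLU
    return x + (h @ W_down + b_down)       # down-projection + residual

tokens = rng.standard_normal((3, d_model))  # three token vectors
print(mlp_block(tokens).shape)              # (3, 4): same shape as the input
```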
This topic also proved to be a great opportunity to outline the idea of the superposition hypothesis, which comes near the end of the video and involves one of my new favorite counterintuitive facts about higher dimensions.
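The counterintuitive fact in question is that random unit vectors in a high-dimensional space are very nearly orthogonal to one another, so a space can hold many more "almost independent" directions than its dimension alone would suggest. The quick numerical check below is my own illustration, not taken from the video; the vector counts and dimensions are arbitrary choices.

```python
# Illustration of the high-dimensional fact behind the superposition
# hypothesis: as the dimension grows, even the closest pair among many
# random unit vectors stays close to 90 degrees apart.
import numpy as np

rng = np.random.default_rng(0)

def max_pairwise_cosine(dim, n_vectors=1000):
    """Sample random unit vectors and return the largest |cos(angle)|
    over all distinct pairs (0 would mean exactly orthogonal)."""
    v = rng.standard_normal((n_vectors, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    cos = np.abs(v @ v.T)
    np.fill_diagonal(cos, 0.0)   # ignore each vector's angle with itself
    return cos.max()

for dim in (10, 100, 1000, 10000):
    print(dim, round(max_pairwise_cosine(dim), 3))
# The maximum cosine shrinks as the dimension grows: thousands of
# directions coexist while remaining nearly orthogonal.
```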