LMGTFY v RTFM and AlphaStar

Created: 2020-07-26
Wordcount: 0.4k

RTFM means "read the fucking manual." LMGTFY means "let me google that for you." Both are responses typically given to questions whose content indicates that the querent has not bothered to do a minimal search for the answer.

Both are a little impolite, more apt for mental than actual exclamation. But what interests me is the philosophy behind each of them.

RTFM was first used in the late 70s and early 80s. LMGTFY apparently came into use in 2009.

The vision of knowledge behind RTFM is ordered and causal. You learn about the system you are using. This knowledge lets you understand what you are doing wrong. You then solve the problem.

The vision of knowledge behind LMGTFY is unordered and based on pattern-matching. Google gives us access to everyone else who has encountered a similar problem. One of those problems is almost certainly the same as yours, or nearly the same, and so its solution lets you solve your own.

We can put the difference in terms of reinforcement learning.

Both OpenAI Five and DeepMind's AlphaStar used model-free reinforcement learning to solve problems that had previously seemed to require model-based reinforcement learning.

That is, you would typically think that playing StarCraft II at a high level would involve thinking about what your opponent knows, does not know, and suspects; you would think it would involve predicting what your opponent will do, and forestalling or preparing for it; you would think it would involve modeling your opponent.

The basic algorithm used by AlphaStar does no such thing. It is incapable of modeling an opponent. It anticipates absolutely nothing. It forms no picture of what the opponent knows. Nevertheless, it is able to act as if it could predict what an opponent would do. It can do this because, while AlphaStar was being trained, it competed in a league with hundreds of other agents, trying to beat them all. It learned what to do when faced with a particular agent's intention, without ever learning to decipher that intention.
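To make the contrast concrete, here is a minimal sketch of model-free learning against a league. This is not AlphaStar's actual algorithm: the game (rock-paper-scissors), the league's composition, and the REINFORCE update are all invented for illustration. The point is only that the learner receives rewards and adjusts its policy, without ever representing what any opponent intends.

```python
# A toy sketch of model-free league training, assuming nothing about
# AlphaStar's real architecture. The learner plays rock-paper-scissors
# against fixed "league" opponents and updates via REINFORCE.
import numpy as np

rng = np.random.default_rng(0)

# payoff[my_action, their_action]; actions are rock=0, paper=1, scissors=2
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

# The league: fixed opponent strategies. The learner never sees these
# numbers; it observes only the reward of each game.
league = [np.array([0.8, 0.1, 0.1]),   # mostly plays rock
          np.array([0.1, 0.8, 0.1])]   # mostly plays paper

logits = np.zeros(3)  # the learner's only state: its own policy parameters
lr = 0.05

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(20000):
    opponent = league[rng.integers(len(league))]  # face a random league member
    probs = softmax(logits)
    a = rng.choice(3, p=probs)      # learner acts
    b = rng.choice(3, p=opponent)   # opponent acts; no model of this exists
    reward = PAYOFF[a, b]
    # REINFORCE: gradient of log pi(a) w.r.t. logits is onehot(a) - probs
    grad = -probs
    grad[a] += 1.0
    logits += lr * reward * grad

print(softmax(logits))  # mass concentrates on paper, the best response to this league
```

What comes out is a reflex tuned to this particular league: the learner settles on paper because its opponents mostly throw rock and paper. A new opponent that mostly throws scissors would beat it soundly, which is exactly the brittleness noted below.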

(Even after competing against hundreds of different agents for centuries of training time, one should note, it remained somewhat brittle against new strategies.)

LMGTFY reflects such a model-free vision of learning. You can encounter a problem and immediately tap into the AlphaStar-like centuries of experience accumulated by others. Just as AlphaStar acquires a reflex for solving a problem it does not understand, LMGTFY is a way of reflexively solving a problem that one does not understand, at least much of the time.