HexHowells

Understanding Moravec's Paradox

Aug 17 2025


Moravec's paradox is a little weird in a few ways. First, it's not a paradox, and second, it's widely misinterpreted. At its core, Moravec's paradox is the observation that reasoning takes much less computation than sensorimotor and perception tasks. It's often (incorrectly) described as "tasks that are easy for humans are difficult for machines, and vice versa".

The answer from the human's side is relatively simple to explain. As hypothesised by Moravec, humans have evolved to be good at tasks that benefit survival, such as fine motor control and vision. But this doesn't really explain why machines find certain problems easy or difficult, which is the part of the observation I want to focus on here.

The Key Idea

I believe this observation can be broken down into two components: search space and reward sparsity. In general, regardless of whether a human or a machine is solving it, a problem is more difficult if its search space is large and its reward signals are sparse.

We can take chess as an example: something that is difficult for humans, since we never evolved to play chess or to reason quite in that way. However, machines excel at playing chess. When viewed through the lens of search and rewards, this becomes clearer. Firstly, the average number of moves in a chess game is around 40, and the average branching factor (possible legal moves per state) is around 35. Whilst this is still a large search space, it is relatively small compared to other tasks. Additionally, rewards are quite common, either produced via an evaluation function (not perfect) or by waiting until a terminal state is reached.

Now compare this to a task such as robotics. A bipedal robot has various actuators for each limb (say 3 per limb, many more for hands/fingers), and each actuator can be updated anywhere from 50 to a few thousand times per second. Not only is the action space large, but the environment the robot operates in is complex, and it could take tens of thousands of steps to gain a single reward (say, folding a single item of clothing).
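
To put rough numbers on this (the chess figures are from above; the 12 actuators, 10 settings per actuator, and 20,000 steps per reward are made-up illustrative values), here's a quick back-of-the-envelope comparison:

```python
import math

# Chess (figures from above): ~35 legal moves per position, ~40 moves per game
chess_log10_paths = 40 * math.log10(35)
print(f"chess: ~10^{chess_log10_paths:.0f} possible games, reward after ~40 decisions")

# Robot (hypothetical numbers): 12 actuators, each discretised to 10 settings,
# re-chosen every control step, ~20,000 steps before a single sparse reward
robot_log10_paths = 20_000 * math.log10(10 ** 12)
print(f"robot: ~10^{robot_log10_paths:.0f} possible trajectories, reward after ~20,000 decisions")
```

The gap isn't subtle: roughly 10^62 paths with a reward every few dozen decisions, versus something like 10^240000 trajectories with a reward every few tens of thousands of decisions.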

So how did humans solve this problem? Simple: we solved it via search as well, with an algorithm called evolution and a reward signal called natural selection. It took around 4 billion years' worth of search to get where we are now, but we got there eventually.

Dreaming of Search

Another important aspect of search is the ability to look ahead in time. We are not limited to using search just to learn; we can also use it to inform our actions. In chess, we can use a policy to generate potential moves and simulate them on a board that looks identical to the actual one. We can't quite simulate the opponent, but we can assume they will play a close-to-optimal policy, and if they don't, they are likely going to lose anyway. The key point here is that we can perfectly simulate the board state.
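
Here's a minimal sketch of that kind of look-ahead, using a toy game (Nim: take 1 to 3 counters, whoever takes the last counter wins) rather than chess. The important bit is that the state can be copied and simulated perfectly, and the opponent is assumed to play close to optimally:

```python
def legal_moves(counters):
    # Take 1-3 counters, but never more than remain
    return [t for t in (1, 2, 3) if t <= counters]

def minimax(counters, maximising):
    """Search the full game tree of a toy Nim game.

    The 'board' (number of counters left) is simulated perfectly, and the
    opponent is assumed to pick its best reply. Returns +1 if the maximising
    player can force a win from this state, else -1.
    """
    if counters == 0:
        # Whoever took the last counter already won, so the player to move lost
        return -1 if maximising else 1
    values = [minimax(counters - take, not maximising) for take in legal_moves(counters)]
    return max(values) if maximising else min(values)

def best_move(counters):
    return max(legal_moves(counters), key=lambda t: minimax(counters - t, False))

print(best_move(10))  # taking 2 leaves the opponent in a losing position
```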

Compare this to robotics: before the robot moves its actuators, can we simulate what will happen? We could run a physics simulation, but trying to capture all the details of the current environment is just not going to happen given how complex the world is. This leaves us with two options: running a simulation in embedding space, or not running a simulation at all.
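
A rough sketch of what simulating in embedding space can look like (this is just one common recipe in the style of learned world models, with made-up layer sizes, not a definitive implementation): encode the observation into a latent vector, learn a dynamics network that predicts the next latent from the current latent and an action, and roll plans forward entirely in latent space:

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, LATENT_DIM = 64, 8, 32  # illustrative sizes

# Encoder: maps raw observations into a compact embedding
encoder = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))

# Dynamics model: predicts the *next* embedding from the current embedding and action
dynamics = nn.Sequential(nn.Linear(LATENT_DIM + ACT_DIM, 128), nn.ReLU(),
                         nn.Linear(128, LATENT_DIM))

def imagine(obs, actions):
    """Roll out a candidate plan purely in latent space, never touching the real world."""
    z = encoder(obs)
    trajectory = [z]
    for a in actions:
        z = dynamics(torch.cat([z, a], dim=-1))
        trajectory.append(z)
    return trajectory

# Training signal (sketch): make dynamics(z_t, a_t) match encoder(obs_{t+1})
obs_t, act_t, obs_next = torch.randn(1, OBS_DIM), torch.randn(1, ACT_DIM), torch.randn(1, OBS_DIM)
pred = dynamics(torch.cat([encoder(obs_t), act_t], dim=-1))
loss = nn.functional.mse_loss(pred, encoder(obs_next).detach())
loss.backward()
```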

Humans do this pretty well: we have good models of the world and can even simulate certain situations when we dream. Of course, we gained this ability thanks to evolution; clearly, understanding how actions impact future states is important for survival.

Neural Networks

We can see these principles throughout the field of deep learning. Gradient descent is a search problem, but it's a much easier search problem than that of, say, reinforcement learning (I know RL also uses gradient descent! But the true loss is rarely known). In gradient descent, the input is known and the output is known; all that is required is a mapping between them. It gets more difficult when the model needs to map a single input to various outputs, such as in autoregressive text generation (e.g. "my dog is "; there are multiple correct answers).
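
As a minimal illustration of why this is the easier search problem (toy numbers, fitting y = 3x + 2): every single update gets dense, exact feedback on how wrong the current mapping is.

```python
import numpy as np

# Known inputs and known targets: the mapping y = 3x + 2, plus a little noise
x = np.random.randn(100)
y = 3 * x + 2 + 0.1 * np.random.randn(100)

w, b, lr = 0.0, 0.0, 0.1
for step in range(200):
    pred = w * x + b
    error = pred - y               # dense feedback on every example, every step
    w -= lr * np.mean(error * x)   # gradient of the (half) mean squared error w.r.t. w
    b -= lr * np.mean(error)       # gradient w.r.t. b

print(round(w, 2), round(b, 2))    # converges close to 3 and 2
```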

LLMs have been quite successful over the past few years, and we can understand this success via the same core principles. LLMs have a fixed number of tokens to generate (decreasing the search space) and receive direct feedback on each token generated (whether it was predicted correctly). Once we pre-train an LLM on bulk text, we can then increase the search space and reward sparsity via fine-tuning. Take RLHF: the network now has a larger search space, since each predicted token is actually used as context, and the model doesn't know if the text it's producing is good until it has finished and been evaluated. But since we've greatly pruned our search tree with pre-training, the effective search space is actually quite small (the LLM already knows roughly how to write good text, instead of outputting random tokens). This observation also aligns with Yann LeCun's cake.
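
A toy way to see the difference in reward sparsity (made-up shapes and values, not an actual training setup): next-token prediction yields one error signal per token, whereas RLHF-style fine-tuning yields a single scalar for the whole completion.

```python
import numpy as np

VOCAB, SEQ_LEN = 1000, 50

# Pre-training: every position has a known target token, so we get SEQ_LEN signals
logits = np.random.randn(SEQ_LEN, VOCAB)
targets = np.random.randint(0, VOCAB, size=SEQ_LEN)
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
per_token_loss = -log_probs[np.arange(SEQ_LEN), targets]
print("pre-training feedback signals per sequence:", per_token_loss.shape[0])  # 50

# RLHF-style fine-tuning: generate the whole completion, then receive one scalar
# reward for it (e.g. from a reward model judging the finished text)
sequence_reward = 0.7
print("fine-tuning feedback signals per completion:", 1, "(reward =", sequence_reward, ")")
```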

Reinforcement learning is also a search problem: given an environment and a set of possible actions, reach a certain goal that maximises some reward. However, there is no directly obvious path to that goal; search spaces can be large and rewards sparse, as most actions don't yield a reward. As seen in the field, RL tasks that have both of these properties struggle to converge at all without help (pre-training, simulators, etc.). Obviously RL can be applied to simple environments such as Atari, but even this could use improvement.
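
To make the sparsity point concrete, here's a toy random-walk agent on a 1D chain where the only reward sits at the far end (the sizes are arbitrary); almost every action returns zero, so an unguided search essentially never sees a learning signal:

```python
import random

CHAIN_LENGTH = 500     # reward only at the final state
EPISODE_STEPS = 10_000

def run_episode():
    position, rewarded_steps = 0, 0
    for _ in range(EPISODE_STEPS):
        position = max(0, position + random.choice([-1, 1]))  # random policy
        if position == CHAIN_LENGTH:
            rewarded_steps += 1
            position = 0                                       # reset after reaching the goal
    return rewarded_steps

hits = sum(run_episode() for _ in range(20))
print(f"rewarded steps: {hits} out of {20 * EPISODE_STEPS}")   # almost always 0
```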

Modelling Hard Problems

Given all this, we can derive something quite useful from Moravec's Paradox, beyond the common misinterpretation of "what humans find easy, machines will likely find hard". Instead, we can gauge how difficult a task will be for a neural network by looking at how large its search space is and how sparse its reward signals are.

As such, we can predict that certain unsolved tasks will be quite easy to solve once we have enough data to train on. Examples include various biological tasks, which are currently limited mostly by the amount and generality of available training data. Whereas other tasks, such as robotics (cleaning a house, washing dishes, fixing a car, etc.), or even long-horizon reasoning tasks (coding an entire software library from scratch, beating a long 3D game, etc.), will prove to be difficult.

Currently it seems the only available solutions to these hard problems are either: run a search algorithm for a long time (as with evolution), or figure out an instrumental goal that has a smaller search space and denser rewards, then tackle the more complex task downstream with something like RL (as with LLMs).