Walker
A planar bipedal walker. Three variants share the body and dynamics; they differ only in the target locomotion speed baked into the reward — a stationary stand, a walking gait, and a running gait.
WalkerStand

| Property | Value |
|---|---|
| Canonical ID | mjx/walker_stand-v0 |
| Action space | Box(-1.0, 1.0, (6,), float32) |
| Observation space | Box(-inf, inf, (24,), float32) |
| Episode length | 1000 |
| Config | {"ctrl_dt": 0.025, "sim_dt": 0.0025, "naconmax": 50_000, "njmax": 100} |
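
The timing fields in the config imply how many physics steps run per action and how long an episode lasts in simulated time. The arithmetic below is a small sketch, assuming `ctrl_dt` is the control period and `sim_dt` the physics timestep, both in seconds. The names `substeps` and `episode_seconds` are illustrative and not part of the environment's API.

```python
# Values copied from the Config and Episode length rows above.
ctrl_dt = 0.025        # assumed meaning: seconds between policy actions
sim_dt = 0.0025        # assumed meaning: seconds per physics step
episode_length = 1000  # control steps per episode

# Physics steps executed for each action (rounded to absorb float error).
substeps = round(ctrl_dt / sim_dt)

# Simulated time covered by one full episode.
episode_seconds = episode_length * ctrl_dt

print(substeps, episode_seconds)
```

Under these assumptions each action spans 10 physics steps and a full episode covers 25 seconds of simulated time.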
Description
The walker has to stand upright and stationary, keeping its torso above a minimum standing height. Without a forward-velocity term, the agent isn't pushed to move — it just has to keep the body tall and aligned vertically.
Rewards
Uses a dense reward that combines a standing-height tolerance with torso uprightness, weighted 3:1 in favour of standing height.
The two terms each capture a separate concern:
- `standing`: the smooth `tolerance` indicator on torso height. It is `1.0` once the torso clears `STAND_HEIGHT` and decays smoothly as the torso sinks toward `STAND_HEIGHT / 2`.
- `upright`: the dot product of the torso's up-axis with world-z, normalised to `[0, 1]`. It is `1.0` when upright, `0.5` on its side, and `0.0` upside-down.
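
The two terms and their 3:1 weighting can be sketched in plain Python. This is an illustrative sketch, not the upstream implementation: `lower_bound_tolerance` is a hypothetical helper that assumes a Gaussian falloff reaching `0.1` at a distance of `STAND_HEIGHT / 2` below the threshold, and `STAND_HEIGHT = 1.2` is an assumed value that this page does not state.

```python
import math

STAND_HEIGHT = 1.2  # assumed value; this page names the constant but not its value


def lower_bound_tolerance(x, lower, margin, value_at_margin=0.1):
    """Hypothetical one-sided tolerance: 1.0 at or above `lower`, Gaussian decay below."""
    if x >= lower:
        return 1.0
    # Choose the scale so the value equals `value_at_margin` at distance `margin`.
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    distance = (lower - x) / margin
    return math.exp(-0.5 * (distance * scale) ** 2)


def stand_reward(torso_height, torso_up_z):
    """Combine the two terms with the 3:1 weighting described above."""
    standing = lower_bound_tolerance(torso_height, STAND_HEIGHT, STAND_HEIGHT / 2)
    upright = (1.0 + torso_up_z) / 2.0  # map the dot product from [-1, 1] to [0, 1]
    return (3.0 * standing + upright) / 4.0


print(stand_reward(1.3, 1.0))  # tall and upright
print(stand_reward(1.3, 0.0))  # tall, torso on its side
print(stand_reward(0.6, 1.0))  # upright, torso at half the standing height
```

Dividing by 4 keeps the combined reward in `[0, 1]`, so a tall torso that is lying sideways still earns most of the reward while a collapsed one does not.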
Starting state
The state consists of joint orientations and positions followed by joint velocities. The body is initialised in a default posture with zero velocity.
Termination
Episode ends when step >= max_steps (default 1000). No early termination on falling.
Usage
Instantiate this variant by its canonical ID, `mjx/walker_stand-v0`.
Reference
Upstream: mujoco_playground/_src/dm_control_suite/walker.py.
WalkerWalk

| Property | Value |
|---|---|
| Canonical ID | mjx/walker_walk-v0 |
| Action space | Box(-1.0, 1.0, (6,), float32) |
| Observation space | Box(-inf, inf, (24,), float32) |
| Episode length | 1000 |
| Config | {"ctrl_dt": 0.025, "sim_dt": 0.0025, "naconmax": 50_000, "njmax": 100} |
Description
The same body as WalkerStand, now walking forward at a target horizontal speed. The agent has to keep the torso tall and roughly vertical while moving — speed alone isn't enough if the walker collapses, and a stable stand isn't enough if it doesn't make progress.
Rewards
Uses a dense reward that multiplies WalkerStand's stand reward by a forward-velocity term.
Three components combined so neither tall-but-still nor fast-but-fallen is enough:
- `standing`: `tolerance` on torso height. It is `1.0` once the torso clears `STAND_HEIGHT` and decays smoothly as the torso sinks.
- `upright`: torso vertical alignment, normalised to `[0, 1]`.
- `move_reward`: a linear ramp in forward speed that rises from `0.0` at standstill through `0.5` at half the target speed to `1.0` at `WALK_SPEED` or above. It is then rescaled into `[0.17, 1.0]` via `(5 * move_reward + 1) / 6` so the stand reward stays the dominant factor.
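
The multiplication of the stand reward by the rescaled speed term can be sketched as follows. This is a minimal sketch under stated assumptions: the linear ramp is modelled as `speed / target` clamped to `[0, 1]`, which matches the anchor points listed above (`0.5` at half the target speed, `1.0` at the target). `WALK_SPEED = 1.0` is an assumed value, and the function names are hypothetical.

```python
WALK_SPEED = 1.0  # assumed value; not stated on this page


def linear_move_reward(forward_speed, target_speed):
    """Assumed ramp: 0.0 at rest, 0.5 at half the target, 1.0 at or above it."""
    return min(max(forward_speed / target_speed, 0.0), 1.0)


def locomotion_reward(stand_reward, forward_speed, target_speed=WALK_SPEED):
    """Scale the stand reward by the rescaled forward-speed term."""
    move_reward = linear_move_reward(forward_speed, target_speed)
    # (5 * m + 1) / 6 maps m in [0, 1] onto [1/6, 1].
    return stand_reward * (5.0 * move_reward + 1.0) / 6.0


# With a perfect stand reward, sweep the forward speed.
for speed in (0.0, 0.5, 1.0, 2.0):
    print(speed, round(locomotion_reward(1.0, speed), 3))
```

Because the two factors are multiplied, a fallen walker scores near zero however fast it moves, while a perfectly upright but motionless walker keeps only one sixth of its stand reward.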
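Starting state
Same as WalkerStand: joint orientations and positions followed by joint velocities, with the body initialised in a default posture with zero velocity.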
Termination
Episode ends when step >= max_steps (default 1000). No early termination on falling.
Usage
Instantiate this variant by its canonical ID, `mjx/walker_walk-v0`.
Reference
Upstream: mujoco_playground/_src/dm_control_suite/walker.py.
WalkerRun

| Property | Value |
|---|---|
| Canonical ID | mjx/walker_run-v0 |
| Action space | Box(-1.0, 1.0, (6,), float32) |
| Observation space | Box(-inf, inf, (24,), float32) |
| Episode length | 1000 |
| Config | {"ctrl_dt": 0.025, "sim_dt": 0.0025, "naconmax": 50_000, "njmax": 100} |
Description
The same body as WalkerStand, now running forward at a higher target horizontal speed than WalkerWalk. The faster cadence forces a more dynamic gait that briefly leaves the ground, which is qualitatively harder than walking despite the identical body — most policies that work for the walking variant don't transfer cleanly to running.
Rewards
Uses the same dense reward shape as WalkerWalk, with RUN_SPEED replacing WALK_SPEED.
Same three components as WalkerWalk, just at a higher target speed:
- `standing`: `tolerance` on torso height. It is `1.0` once the torso clears `STAND_HEIGHT` and decays smoothly as the torso sinks.
- `upright`: torso vertical alignment, normalised to `[0, 1]`.
- `move_reward`: a linear ramp in forward speed that rises from `0.0` at standstill through `0.5` at half the target speed to `1.0` at `RUN_SPEED` or above. It is then rescaled into `[0.17, 1.0]` via `(5 * move_reward + 1) / 6` so the stand reward stays the dominant factor.
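
To see what the higher target changes, the sketch below evaluates the rescaled speed term for one forward speed under both targets. The values `WALK_SPEED = 1.0` and `RUN_SPEED = 8.0` are assumptions that this page does not state, the ramp is again modelled as `speed / target` clamped to `[0, 1]`, and `speed_multiplier` is a hypothetical name.

```python
WALK_SPEED = 1.0  # assumed value; not stated on this page
RUN_SPEED = 8.0   # assumed value; not stated on this page


def speed_multiplier(forward_speed, target_speed):
    """Rescaled speed term (5 * m + 1) / 6 with an assumed clamped linear ramp m."""
    move_reward = min(max(forward_speed / target_speed, 0.0), 1.0)
    return (5.0 * move_reward + 1.0) / 6.0


pace = 1.0  # a forward speed that exactly meets the assumed walking target
print(speed_multiplier(pace, WALK_SPEED))  # multiplier under the walking target
print(speed_multiplier(pace, RUN_SPEED))   # multiplier under the running target
```

Under these assumed targets, a pace that saturates the walking reward earns only a small fraction of the running reward, which is consistent with walking policies not transferring cleanly to the running variant.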
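Starting state
Same as WalkerStand: joint orientations and positions followed by joint velocities, with the body initialised in a default posture with zero velocity.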
Termination
Episode ends when step >= max_steps (default 1000). No early termination on falling.
Usage
Instantiate this variant by its canonical ID, `mjx/walker_run-v0`.
Reference
Upstream: mujoco_playground/_src/dm_control_suite/walker.py.