Humanoid
A 3D 21-DoF humanoid with full upper- and lower-body actuation. Three variants share the body and dynamics; they differ only in the target locomotion speed baked into the reward — a stationary stand, a walking gait, and a running gait.
HumanoidStand

| Property | Value |
|---|---|
| Canonical ID | mjx/humanoid_stand-v0 |
| Action space | Box(-1.0, 1.0, (21,), float32) |
| Observation space | Box(-inf, inf, (67,), float32) |
| Episode length | 1000 |
| Config | {"ctrl_dt": 0.025, "sim_dt": 0.005, "naconmax": 200_000, "njmax": 250} |
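The timing values in the Config row can be sanity-checked with a little arithmetic. The sketch below assumes (this is not stated in the table) that one control step is made up of `ctrl_dt / sim_dt` physics substeps; the variable names are illustrative only.

```python
# Values copied from the property table above.
config = {"ctrl_dt": 0.025, "sim_dt": 0.005}
episode_length = 1000

# Assumption: each control step advances the physics by ctrl_dt / sim_dt substeps.
n_substeps = round(config["ctrl_dt"] / config["sim_dt"])

# Simulated time covered by one full-length episode, in seconds.
episode_seconds = episode_length * config["ctrl_dt"]

print(n_substeps)       # 5
print(episode_seconds)  # 25.0
```

Under that assumption the policy acts at 40 Hz while the physics integrates at 200 Hz, and a full episode spans 25 simulated seconds.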
Description
The humanoid has to stand upright and stationary, keeping its head above a minimum standing height. With 21 actuators and full body articulation, the challenge isn't holding a pose — it's coordinating the full kinematic chain to stay balanced while keeping horizontal velocity at zero.
Rewards
Uses a dense reward that multiplies three terms: a stand reward (head height combined with torso uprightness), a "don't move" reward, and a small-control penalty.
Each component captures a separate concern:
- `standing`: `tolerance` on head height; `1.0` once the head clears `STAND_HEIGHT`, decaying smoothly as it sinks.
- `upright`: `tolerance` on torso vertical alignment; `1.0` once `torso_upright >= 0.9`, ramping linearly to `0.0` as the torso tilts toward horizontal.
- `dont_move`: penalises horizontal centre-of-mass velocity; `1.0` near zero, decaying away from rest.
- `small_control`: quadratic action penalty rescaled into `[0.8, 1.0]`, so it lightly modulates the primary terms rather than dominating.
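To make the multiplicative structure concrete, here is a minimal pure-Python sketch of the four components described above. It is not the upstream implementation: `gaussian_falloff` is a hypothetical stand-in for the `tolerance` helper, and every numeric constant except the `0.9` upright threshold and the `[0.8, 1.0]` range is an assumption chosen for illustration.

```python
import math

# Illustrative assumptions, not values read from upstream.
STAND_HEIGHT = 1.4     # hypothetical head-height threshold
HEIGHT_MARGIN = 0.35   # hypothetical width of the smooth falloff below it
VELOCITY_MARGIN = 2.0  # hypothetical speed at which dont_move drops to 0.1


def gaussian_falloff(distance, margin, value_at_margin=0.1):
    """Return 1.0 at distance 0 and value_at_margin at distance == margin."""
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (distance / margin * scale) ** 2)


def stand_reward(head_height, torso_upright, com_velocity_xy, action):
    # standing: 1.0 once the head clears STAND_HEIGHT, smooth decay below it.
    shortfall = max(0.0, STAND_HEIGHT - head_height)
    standing = gaussian_falloff(shortfall, HEIGHT_MARGIN)

    # upright: 1.0 at torso_upright >= 0.9, linear ramp down to 0.0 at 0.0.
    upright = min(1.0, max(0.0, torso_upright / 0.9))

    # dont_move: mean falloff over the horizontal velocity components.
    falloffs = [gaussian_falloff(abs(v), VELOCITY_MARGIN) for v in com_velocity_xy]
    dont_move = sum(falloffs) / len(falloffs)

    # small_control: quadratic penalty per actuator, averaged, mapped to [0.8, 1.0].
    quadratic = sum(max(0.0, 1.0 - a * a) for a in action) / len(action)
    small_control = (4.0 + quadratic) / 5.0

    return standing * upright * dont_move * small_control
```

Because the terms are multiplied, any one of them dropping to zero (for example a horizontal torso) zeroes the whole reward, while a fully saturated action vector only scales it by `0.8`.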
Starting state
A 67-dim observation (joint positions, joint velocities, end-effector positions, sensor readings), with the humanoid initialised in a default standing posture.
Termination
Episode ends when step >= max_steps (default 1000). No early termination on falling.
Usage
Instantiate the environment through its canonical ID, `mjx/humanoid_stand-v0`.
Reference
Upstream: mujoco_playground/_src/dm_control_suite/humanoid.py.
HumanoidWalk

| Property | Value |
|---|---|
| Canonical ID | mjx/humanoid_walk-v0 |
| Action space | Box(-1.0, 1.0, (21,), float32) |
| Observation space | Box(-inf, inf, (67,), float32) |
| Episode length | 1000 |
| Config | {"ctrl_dt": 0.025, "sim_dt": 0.005, "naconmax": 200_000, "njmax": 250} |
Description
The same body and dynamics as HumanoidStand, now walking forward at a target horizontal speed. Standing stably is no longer enough — the agent has to coordinate the full kinematic chain into a forward gait while still keeping the head above the standing height and the torso roughly vertical.
Rewards
Uses the same dense reward shape as HumanoidStand, with the `dont_move` term replaced by a `move_reward`.
Each component captures a separate concern:
- `standing`: `tolerance` on head height; `1.0` once the head clears `STAND_HEIGHT`, decaying smoothly as it sinks.
- `upright`: `tolerance` on torso vertical alignment; `1.0` once `torso_upright >= 0.9`, ramping linearly down as the torso tilts.
- `move_reward`: linear ramp from `0.0` (at zero speed) to `1.0` (at `WALK_SPEED` or above), then rescaled into `[0.17, 1.0]` so the gradient survives even slow movement.
- `small_control`: quadratic action penalty rescaled into `[0.8, 1.0]`, lightly modulating the primary terms rather than dominating.
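The `move_reward` term described above can be sketched in a few lines. This is an illustration rather than the upstream code: the value of `WALK_SPEED` is a placeholder, and the affine map `(5 * ramp + 1) / 6` is an assumption chosen because it lands on the stated `[0.17, 1.0]` range (its lower end is `1/6`).

```python
WALK_SPEED = 1.0  # hypothetical target speed; the real constant lives upstream


def move_reward(speed, target_speed=WALK_SPEED):
    # Linear ramp: 0.0 at rest, 1.0 at or above the target speed.
    ramp = min(1.0, max(0.0, speed / target_speed))
    # Assumed rescale of [0, 1] onto [1/6, 1], so the term never reaches zero.
    return (5.0 * ramp + 1.0) / 6.0
```

Keeping the floor above zero matters because the overall reward is a product: a stationary but upright humanoid still receives a non-zero signal to improve on.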
Starting state
Same as HumanoidStand: a 67-dim observation, with the humanoid initialised in a default standing posture.
Termination
Episode ends when step >= max_steps (default 1000). No early termination on falling.
Usage
Instantiate the environment through its canonical ID, `mjx/humanoid_walk-v0`.
Reference
Upstream: mujoco_playground/_src/dm_control_suite/humanoid.py.
HumanoidRun

| Property | Value |
|---|---|
| Canonical ID | mjx/humanoid_run-v0 |
| Action space | Box(-1.0, 1.0, (21,), float32) |
| Observation space | Box(-inf, inf, (67,), float32) |
| Episode length | 1000 |
| Config | {"ctrl_dt": 0.025, "sim_dt": 0.005, "naconmax": 200_000, "njmax": 250} |
Description
The same body and dynamics as HumanoidStand, now running forward. Only the target speed in the move term changes from WALK_SPEED to RUN_SPEED, but the consequence is qualitatively different — at running speeds the humanoid has to commit to a more aggressive gait that briefly leaves the ground, which is noticeably harder than the walking variant despite the identical body.
Rewards
Uses the same dense reward shape as HumanoidWalk, with `RUN_SPEED` replacing `WALK_SPEED`.
Each component captures a separate concern:
- `standing`: `tolerance` on head height; `1.0` once the head clears `STAND_HEIGHT`, decaying smoothly as it sinks.
- `upright`: `tolerance` on torso vertical alignment; `1.0` once `torso_upright >= 0.9`, ramping linearly down as the torso tilts.
- `move_reward`: linear ramp from `0.0` (at zero speed) to `1.0` (at `RUN_SPEED` or above), rescaled into `[0.17, 1.0]` so the gradient survives even slow movement.
- `small_control`: quadratic action penalty rescaled into `[0.8, 1.0]`, lightly modulating the primary terms rather than dominating.
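To see why changing only the target speed makes the task harder, it helps to score the same centre-of-mass speed against both targets. The comparison below is a sketch under assumptions: the two speed constants are placeholders rather than the upstream values, and the `(5 * ramp + 1) / 6` rescale is assumed from the stated `[0.17, 1.0]` range.

```python
WALK_SPEED = 1.0   # hypothetical walking target
RUN_SPEED = 10.0   # hypothetical running target


def move_reward(speed, target_speed):
    # Linear ramp to 1.0 at the target speed, then assumed rescale onto [1/6, 1].
    ramp = min(1.0, max(0.0, speed / target_speed))
    return (5.0 * ramp + 1.0) / 6.0


# Score a range of speeds under both targets.
comparison = {
    speed: (move_reward(speed, WALK_SPEED), move_reward(speed, RUN_SPEED))
    for speed in (0.0, 1.0, 5.0, 10.0)
}

for speed, (walk_term, run_term) in comparison.items():
    print(f"speed={speed:5.1f}  walk={walk_term:.3f}  run={run_term:.3f}")
```

With these placeholder targets, a gait that already saturates the walking term earns only a small fraction of the running term, so a policy cannot reuse a walking gait and has to reach a much higher speed before the reward saturates.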
Starting state
Same as HumanoidStand: a 67-dim observation, with the humanoid initialised in a default standing posture.
Termination
Episode ends when step >= max_steps (default 1000). No early termination on falling.
Usage
Instantiate the environment through its canonical ID, `mjx/humanoid_run-v0`.
Reference
Upstream: mujoco_playground/_src/dm_control_suite/humanoid.py.