Pendulum
A single-link pendulum hanging from a fixed pivot.
PendulumSwingup

| Property | Value |
|---|---|
| Canonical ID | mjx/pendulum_swingup-v0 |
| Action space | Box(-1.0, 1.0, (1,), float32) |
| Observation space | Box(-inf, inf, (3,), float32) |
| Episode length | 1000 |
| Config | {"ctrl_dt": 0.02, "sim_dt": 0.01, "naconmax": 0, "njmax": 0} |
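As a quick sanity check on the timing values in the table, the sketch below works out how many physics steps fit into one control step and how much simulated time one episode covers. It assumes that each control step runs ctrl_dt / sim_dt physics steps, which is the usual convention in MuJoCo Playground but is not stated on this page; the variable names are illustrative.

```python
# Values copied from the property table above.
ctrl_dt = 0.02         # seconds between agent actions
sim_dt = 0.01          # seconds per physics step
episode_length = 1000  # control steps per episode

# Assumption: the environment runs round(ctrl_dt / sim_dt) physics steps per action.
n_substeps = int(round(ctrl_dt / sim_dt))

# Simulated time covered by one full episode, in seconds.
episode_seconds = episode_length * ctrl_dt

print(n_substeps)
print(episode_seconds)
```

Under that assumption, each action is held for two physics steps and a full episode spans roughly 20 seconds of simulated time.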
Description
The pendulum starts hanging straight down. The agent applies torque at the pivot to swing it up and then balance it in the upright position. The available torque is too small to lift the pendulum in a single push: energy has to build up over several swings before the pole crosses the top, which is what makes the task interesting despite the simple body.
Rewards
Uses a dense reward: a tolerance indicator applied to the pole's vertical alignment.
pole_vertical is the z-component of the pole's frame z-axis (i.e. cos(angle_from_upright)), and COSINE_BOUND marks the lower edge of the upright tolerance band. tolerance is DM Control's smooth indicator. It returns:
- 1.0 while pole_vertical sits inside (COSINE_BOUND, 1.0) (pole near upright).
- A smooth decay below COSINE_BOUND as the pole tilts away from vertical.
- 0.0 when the pole points straight down.
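The reward shape described above can be sketched in plain Python. This is not the upstream implementation: sketch_tolerance is a hypothetical helper, the 8-degree band used for COSINE_BOUND is an assumed value that this page does not state, and the linear decay is chosen only for simplicity (DM Control's tolerance supports several sigmoid shapes).

```python
import math

# Assumption: an 8-degree upright band; the actual value is not given on this page.
COSINE_BOUND = math.cos(math.radians(8.0))


def sketch_tolerance(pole_vertical: float, lower: float = COSINE_BOUND) -> float:
    """Return 1.0 inside [lower, 1.0], decaying linearly to 0.0 at pole_vertical = -1."""
    if pole_vertical >= lower:
        return 1.0
    # Distance from the band edge to the straight-down position.
    margin = lower - (-1.0)
    return max(0.0, 1.0 - (lower - pole_vertical) / margin)


upright = sketch_tolerance(1.0)      # inside the band
horizontal = sketch_tolerance(0.0)   # partway through the decay
hanging = sketch_tolerance(-1.0)     # straight down
```

With these assumptions the reward is 1.0 at upright, strictly between 0 and 1 when the pole is horizontal, and 0.0 when it hangs straight down, matching the three cases listed above.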
Starting state
The initial observation is (cos(angle), sin(angle), angular velocity), with the pole near hanging-down and a small randomisation.
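The observation layout can be illustrated with a short sketch. The helper name make_observation is hypothetical, and the angle convention (measured from upright, so straight down is pi) is an assumption made here to stay consistent with the reward description; the environment's own convention may differ.

```python
import math


def make_observation(angle: float, angular_velocity: float) -> tuple[float, float, float]:
    """Pack the pendulum state into the 3-vector (cos, sin, angular velocity)."""
    return (math.cos(angle), math.sin(angle), angular_velocity)


# Assumption: angle is measured from upright, so hanging straight down is pi.
obs_down = make_observation(math.pi, 0.0)

# A slightly perturbed start, standing in for the small randomisation.
obs_start = make_observation(math.pi + 0.01, 0.0)
```

Encoding the angle as a (cos, sin) pair keeps the observation continuous across the wrap-around at plus or minus pi, which a raw angle would not.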
Termination
The episode ends when step >= max_steps (default 1000). There is no early termination.
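The termination rule amounts to a pure time-limit check, which can be sketched as follows. The helper is_done is hypothetical, and whether the step counter is incremented before or after the check is an assumption.

```python
def is_done(step: int, max_steps: int = 1000) -> bool:
    """Time-limit check only; the pole's state never ends the episode."""
    return step >= max_steps


steps_taken = 0
while not is_done(steps_taken):
    steps_taken += 1  # stand-in for one environment step
```

Because nothing else ends the episode, every rollout in this sketch has the same length, which keeps batched rollouts the same shape.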
Usage
Reference
Upstream: mujoco_playground/_src/dm_control_suite/pendulum.py.