Acrobot¶

This family covers a two-link underactuated pendulum mounted on a fixed pivot. The agent applies torque to the second joint only, while the first joint swings passively.

All Acrobot variants share the same body, dynamics, and termination rule; they differ only in the reward function.

AcrobotSwingup¶


Property	Value
Canonical ID	`mjx/acrobot_swingup-v0`
Action space	`Box(-1.0, 1.0, (1,), float32)`
Observation space	`Box(-inf, inf, (6,), float32)`
Episode length	1000
Config	`{"ctrl_dt": 0.01, "sim_dt": 0.01, "naconmax": 0, "njmax": 0}`
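The timing fields in the config determine how much simulated time one episode covers. The sketch below works through that arithmetic. It assumes that the number of physics substeps per control step is `ctrl_dt / sim_dt`, which is a common convention but is not confirmed by this page; the variable names are illustrative only.

```python
# Values copied from the property table above.
config = {"ctrl_dt": 0.01, "sim_dt": 0.01}
episode_length = 1000  # control steps per episode

# Assumption: each control step advances physics by ctrl_dt / sim_dt substeps.
substeps_per_control = round(config["ctrl_dt"] / config["sim_dt"])

# Simulated time covered by one full episode, in seconds.
episode_seconds = episode_length * config["ctrl_dt"]

print(substeps_per_control, episode_seconds)
```

Under that assumption, each action is held for a single physics step and a full episode spans roughly ten seconds of simulated time.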

Description¶

The pendulum starts at rest, hanging straight down. The agent must swing the tip up to a target position. Only the second joint is actuated, so the tip can't be lifted directly; momentum has to be built up through the underactuated dynamics.

Rewards¶

The reward is dense: a smooth tolerance indicator applied to the tip-to-target distance:

Python
reward = tolerance(
    distance(tip, target),
    bounds=(0.0, big_target_radius),
    margin=big_target_radius,
)

tolerance is DM Control's smooth indicator. It returns:

1.0 while the input sits inside bounds.
A smooth decay that reaches value_at_margin when the input is margin beyond the nearest bound.
A value that keeps shrinking toward 0.0 past that (with the default Gaussian sigmoid).
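The behaviour listed above can be sketched in plain Python. This is a minimal scalar re-implementation that assumes the Gaussian sigmoid and a `value_at_margin` of 0.1, which are DM Control's defaults as far as I know; it is not the code the environment runs. The name `tolerance_sketch` and the radius `0.2` are hypothetical and chosen only for illustration.

```python
import math


def tolerance_sketch(x, bounds=(0.0, 0.0), margin=0.0, value_at_margin=0.1):
    """Scalar sketch of a Gaussian-shaped tolerance indicator (assumed form)."""
    lower, upper = bounds
    if lower <= x <= upper:
        return 1.0  # inside the bounds: full reward
    if margin == 0.0:
        return 0.0  # no margin: hard indicator
    # Distance outside the nearest bound, in units of the margin.
    d = (lower - x if x < lower else x - upper) / margin
    # Scale chosen so the Gaussian equals value_at_margin at d == 1.
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)


radius = 0.2  # hypothetical stand-in for big_target_radius
inside = tolerance_sketch(0.1, bounds=(0.0, radius), margin=radius)
at_margin = tolerance_sketch(2 * radius, bounds=(0.0, radius), margin=radius)
far_away = tolerance_sketch(1.0, bounds=(0.0, radius), margin=radius)
print(inside, at_margin, far_away)
```

With `bounds=(0.0, radius)` and `margin=radius`, as in the reward above, the value is 1.0 while the tip is within one radius of the target, falls to `value_at_margin` at two radii, and is close to zero beyond that.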

Starting state¶

`obs = [-0.5403 0.2908 -0.8415 -0.9568 0. 0. ]`

(Four orientation components for the two links, followed by the two joint angular velocities; both joints are at rest.)
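As a quick sanity check on that layout, the sketch below splits the quoted observation into its orientation and velocity parts. The pairing of entries (0, 2) and (1, 3) as the two components of each link's orientation is an assumption inferred from the numbers, since only that pairing gives unit-length vectors; it is not stated by the source.

```python
import math

# Starting observation quoted above.
obs = [-0.5403, 0.2908, -0.8415, -0.9568, 0.0, 0.0]

# Assumption: entries 0-3 are orientation components, entries 4-5 are velocities.
orientations, velocities = obs[:4], obs[4:]

# Assumption: entries (0, 2) describe link 1 and entries (1, 3) describe link 2.
link_norms = [math.hypot(orientations[i], orientations[i + 2]) for i in range(2)]

# Both angular velocities are zero in the starting state.
at_rest = all(v == 0.0 for v in velocities)

print(link_norms, at_rest)
```

Both norms come out at approximately one, which is what a pair of sine and cosine style components would give for each link.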

Termination¶

The episode ends when step >= max_steps (default 1000). There is no early termination.

Usage¶

Python
import envrax
env = envrax.make("mjx/acrobot_swingup-v0")

Reference¶

Upstream: mujoco_playground/_src/dm_control_suite/acrobot.py.

AcrobotSwingupSparse¶


Property	Value
Canonical ID	`mjx/acrobot_swingup_sparse-v0`
Action space	`Box(-1.0, 1.0, (1,), float32)`
Observation space	`Box(-inf, inf, (6,), float32)`
Episode length	1000
Config	`{"ctrl_dt": 0.01, "sim_dt": 0.01, "naconmax": 0, "njmax": 0}`