
Swimmer

A multi-link planar swimmer in a viscous fluid. Mujorax ships the canonical six-link variant.

SwimmerSwimmer6

Property           Value
Canonical ID       mjx/swimmer_swimmer6-v0
Action space       Box(-1.0, 1.0, (5,), float32)
Observation space  Box(-inf, inf, (25,), float32)
Episode length     1000 steps
Config             {"ctrl_dt": 0.03, "sim_dt": 0.003, "naconmax": 0, "njmax": 40}
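
The timing values in the config can be related with a little arithmetic. The sketch below assumes that each control step runs `ctrl_dt / sim_dt` physics substeps, which is the usual convention but is not stated on this page; the variable names are illustrative only.

```python
# Values copied from the property table above.
config = {"ctrl_dt": 0.03, "sim_dt": 0.003, "naconmax": 0, "njmax": 40}
episode_length = 1000  # control steps per episode

# Assumption: one control step spans ctrl_dt / sim_dt physics substeps.
n_substeps = round(config["ctrl_dt"] / config["sim_dt"])

# Simulated time covered by a full episode, in seconds.
episode_seconds = episode_length * config["ctrl_dt"]

print(f"{n_substeps} substeps per action, about {episode_seconds:.1f} s per episode")
```

Under that assumption, each action is held for ten physics steps and a full episode covers roughly thirty seconds of simulated time.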

Description

Six articulated links connected by five actuated joints. The swimmer must move its head toward a randomised target via undulatory motion through the surrounding fluid. Direct propulsion isn't possible: all travel comes from coordinated bending across the body, so a policy has to learn a whole-body gait even though the top-level objective is simple.

Rewards

The reward is dense: a tolerance term over the nose-to-target distance, using the long_tail sigmoid so that the reward falls off gently at large distances:

Python
reward = tolerance(
    distance(nose, target),
    bounds=(0, target_size),
    margin=5 * target_size,
    sigmoid="long_tail",
)

The long_tail sigmoid changes how tolerance decays. The indicator returns:

  • 1.0 once the nose is within target_size of the target.
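  • A long, gentle decay from 1.0 over the next 5 * target_size of distance, reaching tolerance's value at the margin there (0.1, assuming the upstream dm_control default for value_at_margin).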
  • A small but non-zero value well past the margin (the long tail), so distant policies still get a usable gradient.
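
The three regimes above can be reproduced with a small stand-alone function. This is a minimal sketch, not the env's actual code: `long_tail_tolerance` is a hypothetical name, and the `value_at_margin=0.1` default is assumed from the dm_control `tolerance` helper rather than taken from this page.

```python
import math


def long_tail_tolerance(distance, target_size, margin_factor=5.0, value_at_margin=0.1):
    """Sketch of tolerance(..., sigmoid="long_tail") with bounds=(0, target_size)."""
    if distance <= target_size:
        # Inside the bounds: full reward.
        return 1.0
    margin = margin_factor * target_size
    # Distance past the upper bound, normalised so that 1.0 is exactly at the margin.
    d = (distance - target_size) / margin
    # Scale chosen so the sigmoid equals value_at_margin when d == 1.
    scale = math.sqrt(1.0 / value_at_margin - 1.0)
    return 1.0 / ((d * scale) ** 2 + 1.0)


# Illustrative target radius; the real target_size comes from the env.
target_size = 0.1
for dist in (0.05, 0.35, 0.6, 3.0):
    print(dist, long_tail_tolerance(dist, target_size))
```

Because the denominator grows only quadratically, the value never reaches zero, which is the long tail the last bullet describes.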

Starting state

obs = [ 0.527   0.6077 -0.6753 -0.6334 -0.7211 -1.4088 -0.3333  0.
        0.      0.      0.      0.      0.      0.      0.      0.
        0.      0.      0.      0.      0.      0.      0.      0.
        0.    ]

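(25 entries in total. Assuming the upstream dm_control layout, the first five are joint angles, the next two are the nose-to-target offset, and the remaining 18 are per-link velocities. The body is initialised in a curl with zero velocity.)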
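
One way to read the 25 entries is to slice the vector by component. The split below is an assumption based on the upstream dm_control swimmer task (joint angles, then a 2-D nose-to-target offset, then three velocity components per link); it is not confirmed by this page, and the variable names are illustrative.

```python
import math

# Starting observation copied from above.
obs = [0.527, 0.6077, -0.6753, -0.6334, -0.7211, -1.4088, -0.3333] + [0.0] * 18

n_links = 6
n_joints = n_links - 1  # five joints between six links

# Assumed layout: joints (5) | to_target (2) | body velocities (3 per link = 18).
joints = obs[:n_joints]
to_target = obs[n_joints:n_joints + 2]
body_velocities = obs[n_joints + 2:]

# Under this reading, the norm of to_target is the initial nose-to-target distance.
initial_distance = math.hypot(*to_target)

print(len(joints), len(to_target), len(body_velocities))
```

The component sizes sum to 25, matching the observation space, and every entry in the velocity slice is zero at reset.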

Termination

Episode ends when step >= max_steps (default 1000). No early termination.

Usage

Python
import envrax
env = envrax.make("mjx/swimmer_swimmer6-v0")

Reference

Upstream: mujoco_playground/_src/dm_control_suite/swimmer.py.