TL;DR: Edit driving scenes with natural language and generate predictive simulations, from static road objects to dynamic pedestrians and vehicles.

Overview

SIMSplat is a framework for editing driving scenes with natural language prompts. Its key highlights are:
(1) Motion-aware Gaussian Querying: temporal-language alignment enables precise querying of Gaussian objects based on their motion and location features.
(2) Flexible Object Editing: SIMSplat is the first simulator to support editing dynamic pedestrians for generating safety-critical cases. In addition to inserting new objects, users can modify existing objects by changing their speed or direction, stopping them, removing them, or replacing them.
(3) Multi-agent Path Refinement: SIMSplat predicts and adapts surrounding agents' behaviors in response to edits, ensuring realistic and coherent scenarios.
(4) Interactive Editing: multi-turn conversations let users iteratively refine and control scenes.


Qualitative sample

Motion-aware 4D Gaussian Querying

Our temporal alignment enhances querying of dynamic road objects by incorporating motion and location language features. Unlike existing methods such as LangSplat and 4DLangSplat, our approach captures agent behaviors (e.g., turning left or right, standing still, walking away, moving laterally) as well as relative locations (e.g., on the left or right side of the ego vehicle). This enables SIMSplat to determine where to insert new objects, where to generate a path, and which existing objects to modify.
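To make the querying idea concrete, here is a minimal sketch that ranks objects by comparing a prompt against per-object appearance, motion, and location features. The toy 2-D vectors, the `query_objects` helper, and the equal-weight averaging of similarities are all illustrative assumptions, not SIMSplat's actual implementation; in a real system the embeddings would come from a vision-language model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def query_objects(prompt_feats, objects, top_k=1):
    """Rank objects by mean cosine similarity over the prompt's feature types.

    prompt_feats: dict mapping a feature type ("appearance", "motion",
                  "location") to a toy embedding of that part of the prompt.
    objects:      dict mapping object id -> dict with the same feature types.
    """
    scores = {}
    for obj_id, feats in objects.items():
        sims = [cosine(vec, feats[kind]) for kind, vec in prompt_feats.items()]
        scores[obj_id] = sum(sims) / len(sims)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical toy embeddings: both cars look alike, so only the motion
# and location features can tell them apart.
objects = {
    "car_turning_left": {
        "appearance": [1.0, 0.0], "motion": [0.0, 1.0], "location": [1.0, 0.0],
    },
    "car_going_straight": {
        "appearance": [1.0, 0.0], "motion": [1.0, 0.0], "location": [0.0, 1.0],
    },
}
prompt = {"appearance": [1.0, 0.0], "motion": [0.1, 0.9], "location": [0.9, 0.1]}
best = query_objects(prompt, objects)
```

In this toy setup an appearance-only query would tie between the two cars; adding motion and location terms is what breaks the tie.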

Qualitative sample

Predictive Scene Editing

SIMSplat enables intuitive editing of driving scenes through natural language prompts, supporting detailed modifications of existing objects, including pedestrians. Moreover, the multi-agent path refinement module predicts and applies the future trajectories of all road agents, ensuring that edited scenarios remain realistic and coherent.
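As a rough illustration of what refining surrounding agents can mean, the sketch below uses a simple rule: an agent advances along its original path only when its next waypoint stays a safe distance from the edited agent, and otherwise waits. This rule, the `refine_agents` name, and the `safe_dist` parameter are assumptions made for illustration; they are a stand-in for SIMSplat's refinement module, not a description of it.

```python
import math

def refine_agents(edited_path, other_paths, safe_dist=3.0):
    """Delay agents that would come too close to the edited agent.

    edited_path: list of (x, y) positions of the edited agent, one per step.
    other_paths: dict of agent id -> list of (x, y) sampled at the same steps.
    Returns a dict of agent id -> refined list of (x, y).
    """
    refined = {}
    for agent_id, path in other_paths.items():
        idx = 0                      # index of the waypoint currently occupied
        new_path = [path[0]]
        for step in range(1, len(path)):
            nxt = path[min(idx + 1, len(path) - 1)]
            ex, ey = edited_path[step]
            # Advance only if the next waypoint keeps a safe gap.
            if math.hypot(nxt[0] - ex, nxt[1] - ey) >= safe_dist:
                idx = min(idx + 1, len(path) - 1)
            new_path.append(path[idx])
        refined[agent_id] = new_path
    return refined

# An inserted pedestrian crosses the road at x = 10 while a car drives along y = 0.
pedestrian = [(10, -4), (10, -2), (10, 0), (10, 2), (10, 4)]
others = {"car": [(0, 0), (4, 0), (8, 0), (12, 0), (16, 0)]}
refined = refine_agents(pedestrian, others)
```

Here the car holds at (4, 0) for two steps while the pedestrian is in its lane, then resumes along its original path.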

Add a New Pedestrian

You can not only remove or adjust the speed of existing pedestrians, but also add new ones using real human assets obtained from other scenes. This enables the generation of safety-critical scenarios involving vulnerable road users.

"Add a pedestrian with pink jacket crossing the street from left to right"

GT vs. Edited: Front Left View, Front View, Front Right View

"Add a jaywalking pedestrian from (x1,y1,z1) to (x2,y2,z2)"

GT vs. Edited: Front Left View, Front View, Front Right View
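A prompt that names explicit start and end coordinates can be turned into a path by sampling waypoints between the two points. The sketch below assumes a straight line and a constant walking speed; `straight_path`, `speed`, and `dt` are hypothetical names, and the way SIMSplat actually generates pedestrian paths is not specified here.

```python
import math

def straight_path(start, end, speed, dt=0.5):
    """Sample (t, x, y, z) waypoints from start to end at a constant speed.

    start, end: (x, y, z) positions in meters.
    speed:      speed in m/s.
    dt:         maximum time spacing between waypoints, in seconds.
    """
    length = math.dist(start, end)
    if length == 0.0 or speed <= 0.0:
        return [(0.0, *start)]       # nothing to traverse: stay in place
    duration = length / speed
    steps = max(1, math.ceil(duration / dt))
    path = []
    for i in range(steps + 1):
        w = i / steps                # fraction of the path covered so far
        point = tuple(s + w * (e - s) for s, e in zip(start, end))
        path.append((w * duration, *point))
    return path

# A pedestrian crossing a 6 m wide street at 1.5 m/s.
path = straight_path((0.0, 0.0, 0.0), (0.0, 6.0, 0.0), speed=1.5)
```

The crossing takes 4 s, so with `dt=0.5` the path contains nine waypoints from the start position to the end position.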

"Add a pedestrian on the wheelchair crossing the crosswalk next to the pedestrian standing on the left"

→ The black convertible on the left slows down to let the pedestrian cross safely.

GT vs. Edited: Front Left View, Front View, Front Right View

Modify Path of Object

You can modify the paths of objects, including vehicles and pedestrians, by making them turn left or right, accelerate or decelerate, stop, or continue straight. This accepts various action parameters, such as speed, direction, start time, relative distance, and start/end positions.
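The sketch below shows one way two of these action parameters, speed and start time, could be applied to an existing trajectory: the object keeps its original route but moves along it at a scaled rate once the edit takes effect. `PathEdit`, `speed_scale`, and `apply_path_edit` are hypothetical names chosen for illustration, and direction changes are not covered.

```python
from dataclasses import dataclass

@dataclass
class PathEdit:
    """Hypothetical action parameters for editing one object's path."""
    speed_scale: float = 1.0   # 0.0 stops the object, 0.5 halves its speed
    start_time: float = 0.0    # time (s) at which the edit takes effect

def interpolate(trajectory, t):
    """Linearly interpolate (x, y) at time t, clamping to the endpoints.

    trajectory: list of (t, x, y) tuples with strictly increasing t.
    """
    if t <= trajectory[0][0]:
        return trajectory[0][1:]
    for (t0, x0, y0), (t1, x1, y1) in zip(trajectory, trajectory[1:]):
        if t <= t1:
            w = (t - t0) / (t1 - t0)
            return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))
    return trajectory[-1][1:]

def apply_path_edit(trajectory, edit):
    """Re-time a trajectory so the object follows the same route at a new speed."""
    edited = []
    for t, _, _ in trajectory:
        if t <= edit.start_time:
            tau = t                  # before the edit: unchanged
        else:
            # After the edit: progress along the original path at a scaled rate.
            tau = edit.start_time + edit.speed_scale * (t - edit.start_time)
        x, y = interpolate(trajectory, tau)
        edited.append((t, x, y))
    return edited

# A car driving at 5 m/s along the x-axis.
car = [(0.0, 0.0, 0.0), (1.0, 5.0, 0.0), (2.0, 10.0, 0.0), (3.0, 15.0, 0.0)]
stopped = apply_path_edit(car, PathEdit(speed_scale=0.0, start_time=1.0))
slowed = apply_path_edit(car, PathEdit(speed_scale=0.5, start_time=1.0))
```

With `speed_scale=0.0` the car stays at x = 5 m from t = 1 s onward; with `speed_scale=0.5` it reaches only x = 10 m by t = 3 s.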

"Make a black car turning at the intersection to go straight"

→ The black car slows down and stops to let a pedestrian cross safely.

GT vs. Edited: Front Left View, Front View, Front Right View

"Make a vehicle moving forward from the opposite lane toward ego to slow down and stop"

→ The following grey car adjusts its path to avoid the stopped vehicle.

GT vs. Edited: Front Left View, Front View, Front Right View

Add a Static Object

You can insert a variety of static objects, such as traffic lights, road barriers, trash containers, construction workers, bulldozers, or pedestrians in wheelchairs, and observe how they influence the overall traffic scene.

"Add a bulldozer 5m behind the black car crossing the street"

→ The red classic car stops in front of the bulldozer to avoid a collision.

GT vs. Edited: Front Left View, Front View, Front Right View
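A relative placement such as "5m behind the black car" reduces to simple geometry once the reference object's position and heading are known. The sketch below assumes a 2-D ground plane with the heading measured counter-clockwise from the +x axis; `place_relative` and its direction keywords are hypothetical, and how SIMSplat resolves such phrases internally is not described here.

```python
import math

def place_relative(ref_xy, ref_heading_rad, distance, direction="behind"):
    """Return the point `distance` meters from ref_xy in the given direction.

    Directions are relative to the reference object's heading, which is
    measured counter-clockwise from the +x axis (an assumed convention).
    """
    offsets = {
        "front": 0.0,
        "left": math.pi / 2,
        "behind": math.pi,
        "right": -math.pi / 2,
    }
    angle = ref_heading_rad + offsets[direction]
    return (
        ref_xy[0] + distance * math.cos(angle),
        ref_xy[1] + distance * math.sin(angle),
    )

# A car at (20, 3) heading along +x; the new object goes 5 m behind it.
car_xy, car_heading = (20.0, 3.0), 0.0
bulldozer_xy = place_relative(car_xy, car_heading, 5.0, "behind")
```

For this example the result is approximately (15, 3), up to floating-point rounding.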

Other Modifications

Beyond these examples, you can insert new vehicles with designated behaviors, add following vehicles, replace existing ones, or remove specific objects. SIMSplat also allows control over asset size, rotation, offset, and other parameters, enabling seamless insertion and iterative adjustment through multi-turn interactions.

Pipeline

Our framework consists of four main stages. First, we train a scene-graph-based 4D Gaussian Splatting (4DGS) model to reconstruct the scene. Next, we perform Language-Gaussian Alignment by embedding appearance, motion, and location features into the Gaussians, enabling direct open-vocabulary querying of road objects. With this language-augmented scene, the LLM agent interprets user prompts to edit the environment, such as adding, removing, or modifying objects. The edits are then refined through a multi-agent path refinement module, which verifies the edited object and adjusts surrounding agents so that they respond naturally to the changes. Scene editing can be further controlled through multi-turn conversations after visualizing the rendered results. Finally, a diffusion-based inpainting model polishes the modified regions to ensure that the rendered outputs appear seamless and realistic.
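One way to picture the editing stage is as a sequence of structured commands applied to the reconstructed scene, with each conversational turn adding more commands. The sketch below is a toy model under that assumption: the `Scene` and `SceneObject` classes, the command dictionaries, and the `apply_edit` function are all hypothetical and do not reflect SIMSplat's actual data structures or the output format of its LLM agent.

```python
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    category: str
    trajectory: list  # list of (t, x, y) samples

@dataclass
class Scene:
    objects: dict = field(default_factory=dict)  # object id -> SceneObject

def apply_edit(scene, command):
    """Apply one structured edit command (a plain dict) to the scene in place."""
    op = command["op"]
    if op == "add":
        scene.objects[command["id"]] = SceneObject(
            command["category"], command["trajectory"]
        )
    elif op == "remove":
        del scene.objects[command["id"]]
    elif op == "modify":
        scene.objects[command["id"]].trajectory = command["trajectory"]
    else:
        raise ValueError(f"unsupported op: {op}")
    return scene

# Hypothetical commands from a multi-turn session: add a pedestrian,
# slow it down in a follow-up turn, then add and remove a traffic cone.
scene = Scene()
turns = [
    {"op": "add", "id": "ped_1", "category": "pedestrian",
     "trajectory": [(0.0, 10.0, -4.0), (4.0, 10.0, 4.0)]},
    {"op": "modify", "id": "ped_1",
     "trajectory": [(0.0, 10.0, -4.0), (8.0, 10.0, 4.0)]},
    {"op": "add", "id": "cone_1", "category": "traffic_cone",
     "trajectory": [(0.0, 12.0, 1.0)]},
    {"op": "remove", "id": "cone_1"},
]
for command in turns:
    apply_edit(scene, command)
```

Keeping edits as plain data like this makes each turn easy to inspect or revert before the scene is re-rendered.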

Methodology