TL;DR: Edit driving scenes with natural language and generate predictive simulations, from static road objects to dynamic pedestrians and vehicles.
Overview
SIMSplat is a framework for editing driving scenes with natural language prompts. Its key highlights are: (1) Motion-aware Gaussian Querying: temporal-language alignment enables precise querying of Gaussian objects based on their motion and location features. (2) Flexible Object Editing: the first simulator to support editing dynamic pedestrians for generating safety-critical cases. In addition to inserting new objects, existing objects can be sped up or slowed down, redirected, stopped, removed, or replaced. (3) Multi-agent Path Refinement: predicts and adapts surrounding agents' behaviors in response to edits, ensuring realistic and coherent scenarios. (4) Interactive Editing: supports multi-turn conversations, enabling users to iteratively refine and control scenes.

Motion-aware 4D Gaussian Querying
Our temporal alignment enhances querying of dynamic road objects by incorporating motion and location language features. Unlike existing methods such as LangSplat and 4DLangSplat, our approach effectively captures agent behaviors (e.g., turning left or right, standing still, walking away, moving laterally) as well as relative locations (e.g., on the left or right side of the ego vehicle). This enables SIMSplat to determine where to insert new objects, where to generate a path, and which existing objects to modify.
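To make the idea of querying by appearance, motion, and location concrete, here is a minimal sketch that ranks objects by a weighted sum of cosine similarities over three feature types. It is an illustration only, not SIMSplat's implementation: the function names, the dictionary layout, the equal weights, and the toy 2-D vectors (standing in for real language embeddings) are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def query_objects(objects, query, weights=(1.0, 1.0, 1.0), top_k=1):
    """Rank objects by weighted similarity over appearance, motion, and location.

    Feature types missing from the query are simply skipped (hypothetical design).
    """
    keys = ("appearance", "motion", "location")
    scored = []
    for obj in objects:
        score = sum(
            w * cosine(obj[k], query[k])
            for w, k in zip(weights, keys)
            if k in query
        )
        scored.append((score, obj["id"]))
    scored.sort(reverse=True)
    return [obj_id for _, obj_id in scored[:top_k]]

# Toy 2-D features; a real system would use learned language embeddings.
objects = [
    {"id": "car_left_turn", "appearance": [1.0, 0.0], "motion": [0.0, 1.0], "location": [1.0, 0.0]},
    {"id": "car_straight", "appearance": [1.0, 0.0], "motion": [1.0, 0.0], "location": [1.0, 0.0]},
    {"id": "ped_standing", "appearance": [0.0, 1.0], "motion": [0.5, 0.5], "location": [0.0, 1.0]},
]

# Query: "car-like appearance" plus "turning" motion, no location constraint.
query = {"appearance": [1.0, 0.0], "motion": [0.0, 1.0]}
best = query_objects(objects, query)
```

Appearance alone cannot separate the two cars in this toy example; the motion term is what singles out the turning one, which is the intuition behind adding motion and location features to the query.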

Predictive Scene Editing
SIMSplat enables intuitive editing of driving scenes through natural language prompts, supporting detailed modifications of existing objects, including pedestrians. Moreover, the multi-agent path refinement module predicts and applies the future trajectories of all road agents, ensuring that edited scenarios remain realistic and coherent.
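As a rough illustration of what "keeping edited scenarios coherent" involves, the sketch below detects the first frame at which two trajectories come too close and makes one agent wait before that point. This is a hand-written rule used as a stand-in, not the multi-agent path refinement module itself; the function names, the 2 m safety radius, and the hold-in-place strategy are all assumptions.

```python
import math

def first_conflict(traj_a, traj_b, safety_radius=2.0):
    """Index of the first frame where the agents are closer than safety_radius, else None."""
    for i, (pa, pb) in enumerate(zip(traj_a, traj_b)):
        if math.dist(pa, pb) < safety_radius:
            return i
    return None

def yield_before(traj, conflict_idx, margin_frames=5):
    """Freeze the agent at an earlier pose so it waits instead of entering the conflict."""
    hold = max(0, conflict_idx - margin_frames)
    return traj[:hold] + [traj[hold]] * (len(traj) - hold)

# A car driving along +x at 1 m per frame, and an inserted pedestrian crossing at x = 10.
car = [(float(i), 0.0) for i in range(20)]
pedestrian = [(10.0, -5.0 + 0.5 * i) for i in range(20)]

idx = first_conflict(car, pedestrian)
refined_car = yield_before(car, idx) if idx is not None else car
```

A learned predictor would also resume motion once the path is clear and would account for several agents at once; this sketch only shows the detect-then-adjust structure.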
Add a New Pedestrian
You can not only remove or adjust the speed of existing pedestrians, but also add new ones using real human assets obtained from other scenes. This enables the generation of safety-critical scenarios involving vulnerable road users.
"Add a pedestrian with pink jacket crossing the street from left to right"






"Add a jaywalking pedestrian from (x1,y1,z1) to (x2,y2,z2)"






"Add a pedestrian on the wheelchair crossing the crosswalk next to the pedestrian standing on the left"
→ The black convertible on the left slows down to allow the pedestrian to cross safely.
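A coordinate-based prompt such as the jaywalking example above ultimately needs a sequence of per-frame positions. One simple way to produce them, sketched here under the assumption of a straight line and constant walking speed, is linear interpolation between the start and end points. The function name, the default speed of 1.4 m/s, and the 10 fps frame rate are illustrative choices, not values from SIMSplat.

```python
import math

def straight_line_path(start, end, speed_mps=1.4, fps=10):
    """Return per-frame (x, y, z) waypoints from start to end at constant speed."""
    dist = math.dist(start, end)
    if dist == 0:
        return [tuple(start)]
    # Number of frame intervals needed to cover the distance at the given speed.
    n_steps = max(1, math.ceil(dist / speed_mps * fps))
    return [
        tuple(s + (e - s) * i / n_steps for s, e in zip(start, end))
        for i in range(n_steps + 1)
    ]

# A pedestrian crossing an 8 m wide road at 2 m/s, sampled at 10 frames per second.
path = straight_line_path((0.0, -4.0, 0.0), (0.0, 4.0, 0.0), speed_mps=2.0, fps=10)
```

The resulting waypoints are only an initial path; in the workflow described on this page, such a path would still be checked against the other agents in the scene.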






Modify Path of Object
You can modify the paths of objects, including vehicles and pedestrians, by making them turn left or right, accelerate or decelerate, stop, or continue straight. Path edits accept a range of action parameters, such as speed, direction, start time, relative distance, and start/end positions.
"Make a black car turning at the intersection to go straight"
→ The black car slows down and stops to allow a pedestrian to cross safely.






"Make a vehicle moving forward from the opposite lane toward ego to slow down and stop"
→ The following grey car adjusts its path to avoid the stopped vehicle.
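To show how action parameters like start time and target speed could drive a "slow down and stop" edit, here is a minimal sketch that rewrites a per-frame speed profile with constant braking. The `SpeedEdit` dataclass, its field names, and the default deceleration are hypothetical and chosen only for illustration; acceleration edits are deliberately left out.

```python
from dataclasses import dataclass

@dataclass
class SpeedEdit:
    start_time: float          # seconds from the start of the clip
    target_speed: float = 0.0  # m/s; 0 means a full stop
    decel: float = 2.0         # m/s^2, assumed braking rate

def apply_speed_edit(speeds, dt, edit):
    """Return a new speed profile that brakes toward target_speed from start_time on."""
    out = []
    current = None
    for i, v in enumerate(speeds):
        if i * dt < edit.start_time:
            current = v                      # before the edit: keep the original speed
        else:
            if current is None:
                current = v                  # edit starts at the very first frame
            if current > edit.target_speed:  # brake, but never below the target
                current = max(edit.target_speed, current - edit.decel * dt)
        out.append(current)
    return out

def integrate(speeds, dt):
    """Cumulative distance travelled at each frame (forward Euler)."""
    dist = [0.0]
    for v in speeds[:-1]:
        dist.append(dist[-1] + v * dt)
    return dist

original = [10.0] * 8  # constant 10 m/s, one sample per second
edited = apply_speed_edit(original, dt=1.0, edit=SpeedEdit(start_time=2.0, decel=4.0))
travelled = integrate(edited, dt=1.0)
```

Editing the speed profile and re-integrating keeps the object on its original path geometry while changing only when it reaches each point, which is one straightforward reading of a "slow down and stop" request.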






Add a Static Object
You can insert a variety of static objects, such as traffic lights, road barriers, trash containers, construction workers, bulldozers, or pedestrians in wheelchairs, and observe how they influence the overall traffic scene.
"Add a bulldozer 5m behind the black car crossing the street"
→ The red classic car stops in front of the bulldozer to avoid a collision.
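A relative placement such as "5m behind the black car" can be resolved from the reference object's position and heading. The sketch below assumes a 2-D ground plane and interprets "behind" as the direction opposite to the reference heading; both the function name and that interpretation are assumptions made for illustration.

```python
import math

def place_behind(ref_xy, ref_heading_rad, distance_m):
    """Position distance_m behind a reference object, opposite to its heading."""
    x, y = ref_xy
    return (
        x - distance_m * math.cos(ref_heading_rad),
        y - distance_m * math.sin(ref_heading_rad),
    )

# Reference car at (10, 2) heading along +x: the new object lands at x = 5.
bulldozer_xy = place_behind((10.0, 2.0), 0.0, 5.0)
```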






Other Modifications
Beyond these examples, you can insert new vehicles with designated behaviors, add following vehicles, replace existing ones, or remove specific objects. SIMSplat also allows control over asset size, rotation, offset, and other parameters, enabling seamless insertion and iterative adjustment through multi-turn interactions.
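Controls such as size, rotation, and offset amount to a similarity transform applied to the inserted asset. As a minimal sketch, assuming the asset is represented by its 3-D point centers and rotated only about the vertical (z) axis, the adjustment could look like this. The function and parameter names are hypothetical, and a full Gaussian asset would also need its per-Gaussian covariances scaled and rotated, which is omitted here.

```python
import math

def transform_asset(points, scale=1.0, yaw_rad=0.0, offset=(0.0, 0.0, 0.0)):
    """Scale, rotate about the z axis, then translate a list of (x, y, z) points."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    out = []
    for x, y, z in points:
        x, y, z = x * scale, y * scale, z * scale  # uniform resize
        xr = c * x - s * y                         # yaw rotation in the ground plane
        yr = s * x + c * y
        out.append((xr + offset[0], yr + offset[1], z + offset[2]))
    return out

# Double the size, turn 90 degrees counterclockwise, and shift by (1, 1, 0).
moved = transform_asset([(1.0, 0.0, 0.0)], scale=2.0, yaw_rad=math.pi / 2, offset=(1.0, 1.0, 0.0))
```

In a multi-turn session, a follow-up request like "make it slightly larger" would then only change one parameter and reapply the same transform.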
Pipeline
Our framework consists of four main stages. First, we train a scene-graph-based 4D Gaussian Splatting (4DGS) model to reconstruct the scene. Next, we perform Language-Gaussian Alignment by embedding appearance, motion, and location features into the Gaussians, enabling direct open-vocabulary querying of road objects. With this language-augmented scene, the LLM agent interprets user prompts to edit the environment, such as adding, removing, or modifying objects. The edits are then refined through a multi-agent path refinement module, which verifies the edited object and adjusts surrounding agents so that they respond naturally to the changes. Scene editing can be further controlled through multi-turn conversations after visualizing the rendered results. Finally, a diffusion-based inpainting model polishes the modified regions to ensure that the rendered outputs appear seamless and realistic.
