We have witnessed great progress in physics-based simulation and neural video generation for animating fluids. However, the two types of methods have different drawbacks. Physics-based simulation methods are built upon manually designed environments with specific materials, motion trajectories, and textures, and are thus only capable of animating particular fluids in synthetic scenarios. Neural video generation methods, on the other hand, usually encode and warp the entire scene as a whole, and are generally unaware of complex content such as transparency, collisions, and thin structures, all of which frequently appear in real-world scenes. In this work, we tackle the problem of animating real-world fluids from a single still image.
The key to our system is a Surface-based Layered Representation (SLR) derived from video decomposition, in which the scene is decoupled into a surface liquid layer and an impervious background layer, with corresponding transparencies characterizing how the two layers are composited. An animated video can then be produced by warping only the surface liquid layer according to the estimated fluid motion and recombining it with the background.
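As a rough illustration of the recombination step, assume the transparencies act as a per-pixel alpha map α for the liquid layer L over the static background B, so that each frame is I_t = α_t·L_t + (1 − α_t)·B, where only L and α are moved by the estimated motion. The sketch below uses nearest-neighbour backward warping for brevity; the function names `warp_backward` and `composite_frame` and the (dx, dy) flow convention are hypothetical, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch: standard alpha compositing of a warped liquid layer
# over a static background. Not the authors' code.

def warp_backward(layer: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a layer of shape (H, W, C) by a per-pixel flow (H, W, 2) = (dx, dy).

    Output pixel (y, x) samples the source at (y - dy, x - dx), rounded to the
    nearest pixel and clamped to the image border.
    """
    h, w = layer.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs - flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys - flow[..., 1]).astype(int), 0, h - 1)
    return layer[src_y, src_x]

def composite_frame(liquid, alpha, background, flow):
    """Warp only the liquid layer and its alpha (H, W, 1), then blend over the background."""
    liquid_t = warp_backward(liquid, flow)
    alpha_t = warp_backward(alpha, flow)
    return alpha_t * liquid_t + (1.0 - alpha_t) * background
```

Because the background is never warped, rigid or thin structures behind a transparent stream stay fixed while the water appears to move over them, which is the behaviour the layered representation is meant to capture.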
In addition, we introduce Surface-only Fluid Simulation (SFS), a 2.5D fluid simulation, as a replacement for motion estimation. Specifically, we represent the surface liquid layer with a triangular mesh built from the output of a monocular depth estimator, and simulate its motion within a physics-based framework inspired by the classic hybrid Lagrangian-Eulerian method, together with a learnable network that adapts to complex image textures. We demonstrate the effectiveness of the proposed system through comparisons with existing methods on both standard objective metrics and subjective ranking scores. Extensive experiments show not only that our method is competitive on common fluid scenes, but also that it is more robust and plausible in complex scenes with transparent fluids.
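The abstract does not spell out the SFS equations. To illustrate the hybrid Lagrangian-Eulerian idea it draws on, the sketch below implements one textbook PIC/FLIP step on a regular 2D grid in image space: particles (Lagrangian) carry velocity and scatter it to grid nodes (Eulerian) with bilinear weights, the grid velocities are updated, and the particles gather back a FLIP/PIC blend before being advected. This is a swapped-in, simplified technique: it uses a regular grid rather than the paper's depth-based triangular mesh, replaces the pressure and boundary solve with a constant external force, and omits the learnable network. All names, `dt`, and `flip_ratio` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: one textbook PIC/FLIP step on a regular 2D grid.
# Not the authors' SFS (no triangular mesh, no pressure solve, no learned network).

def bilinear_weights(pos, shape):
    """Neighbouring node indices and bilinear weights for particles at pos (N, 2) = (x, y)."""
    h, w = shape
    x = np.clip(pos[:, 0], 0, w - 1 - 1e-6)
    y = np.clip(pos[:, 1], 0, h - 1 - 1e-6)
    i0, j0 = np.floor(x).astype(int), np.floor(y).astype(int)
    fx, fy = x - i0, y - j0
    idx = [(j0, i0), (j0, i0 + 1), (j0 + 1, i0), (j0 + 1, i0 + 1)]
    wts = [(1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy]
    return idx, wts

def particles_to_grid(pos, vel, shape):
    """Scatter particle momentum and mass to grid nodes, then normalise to velocity."""
    h, w = shape
    mom, mass = np.zeros((h, w, 2)), np.zeros((h, w))
    idx, wts = bilinear_weights(pos, shape)
    for (jj, ii), wt in zip(idx, wts):
        np.add.at(mass, (jj, ii), wt)
        np.add.at(mom, (jj, ii), wt[:, None] * vel)
    grid_vel = np.where(mass[..., None] > 0, mom / np.maximum(mass[..., None], 1e-12), 0.0)
    return grid_vel, mass

def grid_to_particles(pos, field, shape):
    """Gather a grid field (H, W, 2) back to particles by bilinear interpolation."""
    idx, wts = bilinear_weights(pos, shape)
    out = np.zeros((pos.shape[0], field.shape[-1]))
    for (jj, ii), wt in zip(idx, wts):
        out += wt[:, None] * field[jj, ii]
    return out

def flip_step(pos, vel, shape, force, dt=0.1, flip_ratio=0.95):
    """Advance particles one step: P2G, grid update, G2P (FLIP/PIC blend), advection."""
    grid_old, mass = particles_to_grid(pos, vel, shape)
    # Eulerian update: only a constant external force on occupied nodes, standing in
    # for the pressure/boundary solve a real simulator would perform.
    grid_new = grid_old + dt * force * (mass[..., None] > 0)
    v_pic = grid_to_particles(pos, grid_new, shape)
    v_flip = vel + grid_to_particles(pos, grid_new - grid_old, shape)
    new_vel = flip_ratio * v_flip + (1 - flip_ratio) * v_pic
    # Lagrangian advection, clamped to the grid domain.
    h, w = shape
    new_pos = pos + dt * new_vel
    new_pos[:, 0] = np.clip(new_pos[:, 0], 0, w - 1)
    new_pos[:, 1] = np.clip(new_pos[:, 1], 0, h - 1)
    return new_pos, new_vel
```

The blend reflects the usual trade-off in hybrid methods: the FLIP term transfers only the grid's velocity change and so preserves fine particle detail, while the PIC term resamples the full grid velocity and damps noise at the cost of extra smoothing.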
Moreover, because the proposed surface-based layered representation and surface-only fluid simulation naturally disentangle the scene, interactive editing, such as adding objects to a river, can be achieved easily with realistic results.