CubeSats provide a wealth of high-frequency observations at meter-scale spatial resolution. However, most current methods of inferring water depth from satellite data consider only a single image. This approach is sensitive to the radiometric quality of the data acquired at that particular instant, which can be degraded by confounding factors such as sun glint or atmospheric effects. Moreover, using single images in isolation fails to exploit recent improvements in the frequency of satellite image acquisition. This study leverages the dense image time series from the SuperDove constellation via an ensembling framework that improves empirical (regression-based) bathymetry retrieval. Unlike previous studies that only ensembled the original spectral data, we introduce a neural network (NN)-based method that instead ensembles the water depths derived from multitemporal imagery, provided the data are acquired under steady flow conditions. We refer to this new approach as NN-depth ensembling. First, every image is treated individually to derive multitemporal depth estimates. Then, we use another NN regressor to ensemble these per-image water depths. This step automatically weights the contribution of the bathymetric estimate from each time instance to the final bathymetry product. Unlike methods that ensemble spectral data, NN-depth ensembling limits the propagation of uncertainties in the spectral data (e.g., noise due to sun glint) to the final bathymetric product. The proposed NN-depth ensembling is applied to multitemporal SuperDove imagery of reaches of the American, Potomac, and Colorado rivers with depths of up to 10 m and evaluated against in situ measurements. The proposed method provided more accurate and robust bathymetry retrieval than single-image analyses and other ensembling approaches.
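The two-stage idea described above can be illustrated with a minimal sketch. It is not the study's implementation: the synthetic reflectance model, the attenuation coefficients, the noise levels, the network size, and the helper `fit_mlp` are all assumptions made for illustration, and a small hand-written NumPy network stands in for whatever NN regressor the authors used. Stage 1 fits one regressor per image (spectral features to depth); stage 2 fits a second regressor that maps the stack of per-image depth estimates to a single depth.

```python
import numpy as np

def fit_mlp(X, y, hidden=8, epochs=2000, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh MLP by full-batch gradient descent.

    Hypothetical helper for this sketch. Inputs and targets are
    standardized internally; returns a predict(Xq) function.
    """
    rng = np.random.default_rng(seed)
    x_mu, x_sd = X.mean(axis=0), X.std(axis=0) + 1e-9
    y_mu, y_sd = y.mean(), y.std() + 1e-9
    Xn, yn = (X - x_mu) / x_sd, (y - y_mu) / y_sd
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    n = len(yn)
    for _ in range(epochs):
        H = np.tanh(Xn @ W1 + b1)              # hidden activations
        err = H @ W2 + b2 - yn                 # residual in standardized units
        dH = np.outer(err, W2) * (1.0 - H**2)  # backprop through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (Xn.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)

    def predict(Xq):
        Hq = np.tanh(((Xq - x_mu) / x_sd) @ W1 + b1)
        return (Hq @ W2 + b2) * y_sd + y_mu

    return predict

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# Synthetic stand-in for in situ depths (m) and a train/test split.
rng = np.random.default_rng(42)
n_pts = 400
depth = rng.uniform(0.2, 10.0, n_pts)
train = np.arange(n_pts) < 300
test = ~train

# Assumed toy reflectance model: three bands decaying with depth.
# Each image gets its own noise level; the fourth mimics a glint-affected date.
atten = np.array([0.15, 0.35, 0.60])
noise_levels = [0.02, 0.05, 0.03, 0.30, 0.04]
images = [
    np.exp(-np.outer(depth, atten)) + rng.normal(0.0, s, (n_pts, 3))
    for s in noise_levels
]

# Stage 1: one regressor per image, spectral features -> depth.
D = np.column_stack([
    fit_mlp(X[train], depth[train], seed=t)(X) for t, X in enumerate(images)
])

# Stage 2: a second regressor ensembles the per-image depth estimates.
ensemble = fit_mlp(D[train], depth[train], seed=99)
depth_ens = ensemble(D[test])

for t in range(D.shape[1]):
    print(f"image {t}: RMSE = {rmse(D[test, t], depth[test]):.2f} m")
print(f"ensemble: RMSE = {rmse(depth_ens, depth[test]):.2f} m")
```

In this sketch the second stage sees only depth estimates, never the raw spectra, which is the property the approach relies on to keep spectral noise from a single date out of the final product. For simplicity, stage 2 is trained on the same points as stage 1; a held-out or cross-validated split may be preferable in practice.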