Leveraging multimodal generative AI for enhancing digital well-being and addressing the economy of attention in public health
Abstract
Amid rising concerns over digital well-being and the socio-cognitive impact of the attention economy, exploring technologies that foster healthier human–digital interactions is critical. Advances in multimodal generative AI offer a transformative opportunity to reshape public health strategies. By developing AI systems that integrate visual, linguistic, and structured data, researchers can create personalized interventions, adaptive content moderation, and context-aware feedback to mitigate digital overstimulation and information overload. However, current methods often suffer from shallow cross-modal integration, reliance on modality-specific encoders, and poor semantic and topological alignment, which limit their generalizability. To address these issues, we propose ManiGen, a geometry-aware multimodal generative framework that unifies manifold-aligned latent spaces with a prompt-driven synthesis strategy. ManiGen’s non-Euclidean generative backbone respects the geometric structure of each modality, enabling missing-modality reconstruction and cross-domain translation. Its GeoPrompt-driven synthesis generalizes prompt learning to semantic anchors in Riemannian latent spaces, supporting adaptive generation from sparse inputs with intent-aware control. Extensive evaluations across public health datasets demonstrate ManiGen’s superior performance in semantic alignment, multi-input inference, and personalized, low-supervision generation, offering a principled path for integrating generative AI into digital well-being initiatives.