Imaging modalities such as ultrasound (US), photoacoustics (PA), MRI, and CT can be described systematically by mathematical forward models derived from the underlying physics. In clinical practice, basic reconstruction methods for the associated 3D/4D inverse problems are often limited in robustness, accuracy, and efficiency. Recently, deep learning methods based on convolutional neural networks (CNNs) have reached human-level performance in many low-level computer vision tasks in real time. Until now, however, deep learning has not been connected to reconstruction methods: it remains an uncertain black box, lacking the reliable and stabilizing a-priori knowledge about the underlying physics and the regularity of solutions that reconstruction inherently requires. Here, we combine models and theory of deep learning for low-level imaging, e.g. CNN-based denoising and classification, with reconstruction models known from inverse problems. This systematic bridge will serve as a unifying algorithmic framework in our consortium for cross-fertilization across modalities, with the aim of robust, multimodal, task-driven image reconstruction as the software technology for the imaging machinery of the future.
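One common way to bridge learned denoising and model-based reconstruction, in the spirit described above, is a plug-and-play iteration: a gradient step on the data-fidelity term of the forward model, followed by a denoiser that would in practice be a trained CNN. The following 1-D deblurring toy is a minimal sketch, not the consortium's method; the forward operator, the moving-average filter standing in for a CNN denoiser, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128

# Ground-truth signal: piecewise constant (a toy "image").
x_true = np.zeros(n)
x_true[30:60] = 1.0
x_true[80:100] = -0.5

# Forward model A: circular Gaussian blur, applied in the Fourier domain.
t = np.arange(n)
h = np.exp(-0.5 * ((t - n // 2) / 2.0) ** 2)
h /= h.sum()
H = np.fft.fft(np.roll(h, -n // 2))  # centered kernel -> real spectrum

def A(x):
    return np.real(np.fft.ifft(np.fft.fft(x) * H))

def A_adj(x):
    return np.real(np.fft.ifft(np.fft.fft(x) * np.conj(H)))

# Noisy measurements y = A x_true + noise.
y = A(x_true) + 0.01 * rng.standard_normal(n)

def denoise(x):
    # Stand-in for a trained CNN denoiser: a circular moving average.
    return sum(np.roll(x, s) for s in range(-2, 3)) / 5.0

# Plug-and-play proximal gradient: data-fidelity step, then denoiser.
tau = 1.0  # step size; valid since ||A|| <= 1 for this normalized kernel
x = np.zeros(n)
for _ in range(100):
    x = denoise(x - tau * A_adj(A(x) - y))

print("data residual:", np.linalg.norm(A(x) - y))
```

Swapping the hand-made `denoise` for a CNN trained on the modality at hand is exactly where the a-priori knowledge of the physics (in `A`) and the learned regularity of solutions (in the denoiser) meet in one algorithm.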