Breeding resilient crop varieties demands tools that account for genotype-by-environment (G×E) interactions. Multi-environment genomic prediction methods address this by integrating genomic and environmental data to estimate the breeding values of genotypes in specific environments. Previously, we presented an interpretable multi-stream deep learning approach that captured G×E in the apple REFPOP data. In this study, we aimed to test the strengths and limitations of our modelling strategy alongside classical approaches on a different, larger dataset: the Genomes-to-Fields initiative maize (Zea mays L.) dataset from 2014 to 2022. This dataset encompasses more than 70,000 phenotypic measurements of yield and flowering-related traits for more than 4,000 maize hybrids in around 225 environments. Because the complexity and nature of G×E interactions vary across traits, we used both grid search and Bayesian optimization to identify optimal hyperparameter (HP) combinations (such as network architecture, batch size and learning rate) for each trait. These HP combinations yielded overall predictive abilities in the validation set ranging from 0.5 to 0.8 across traits. Furthermore, we present a new data-splitting strategy that enables validation of model performance across different real-life prediction scenarios. Our results demonstrate the suitability and flexibility of deep learning for leveraging vast amounts of data in multi-environment genomic prediction. We believe that, in the long run, deep learning approaches can accelerate the development of cultivars adapted to future environments and promote sustainable agriculture through improved variety selection and more efficient resource use.
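
To make the idea of scenario-based validation concrete, the following minimal Python sketch groups records by whether their hybrid and environment were held out from training, then scores each group with the Pearson correlation between observed and predicted values, a common definition of predictive ability in genomic prediction. It is not the data-splitting strategy presented in this study: the scenario labels, the function names (`assign_scenario`, `pearson`, `predictive_ability_by_scenario`) and the record layout are assumptions chosen for illustration.

```python
from collections import defaultdict
from math import sqrt


def assign_scenario(hybrid, env, held_out_hybrids, held_out_envs):
    """Label a record by whether its hybrid and environment were held out.

    The four labels are hypothetical names for this sketch.
    """
    new_hybrid = hybrid in held_out_hybrids
    new_env = env in held_out_envs
    if new_hybrid and new_env:
        return "new_hybrid_new_env"
    if new_hybrid:
        return "new_hybrid_known_env"
    if new_env:
        return "known_hybrid_new_env"
    return "train"


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(x)
    if n < 2 or n != len(y):
        return float("nan")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def predictive_ability_by_scenario(records, held_out_hybrids, held_out_envs):
    """Correlate observed and predicted values within each validation scenario.

    `records` is an iterable of (hybrid, env, observed, predicted) tuples;
    this layout is an assumption of the sketch.
    """
    observed = defaultdict(list)
    predicted = defaultdict(list)
    for hybrid, env, obs, pred in records:
        scenario = assign_scenario(hybrid, env, held_out_hybrids, held_out_envs)
        if scenario == "train":
            continue  # training records are not scored
        observed[scenario].append(obs)
        predicted[scenario].append(pred)
    return {s: pearson(observed[s], predicted[s]) for s in observed}
```

Reporting the correlation separately per scenario, rather than pooling all validation records, keeps performance on unseen hybrids distinct from performance in unseen environments.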