Abstract. A key challenge for biological oceanography is relating the physiological mechanisms controlling phytoplankton growth to the spatial distribution of those phytoplankton. Physiological mechanisms are often isolated by varying one driver of growth, such as nutrients or light, in a controlled laboratory setting, producing what we call “intrinsic relationships”. We contrast these with the “apparent relationships” that emerge in the environment in climatological data. Although previous studies have found that machine learning (ML) can find apparent relationships, there has yet to be a systematic study examining when and why these apparent relationships diverge from the underlying intrinsic relationships found in the lab, and how and why this may depend on the method applied. Here we conduct a proof-of-concept study with three scenarios in which biomass is by construction a function of time-averaged phytoplankton growth rate. In the first scenario, the inputs and outputs of the intrinsic and apparent relationships vary over the same monthly timescales. In the second, the intrinsic relationships relate averages of drivers that vary on hourly timescales to biomass, but the apparent relationships are sought between monthly averages of these inputs and monthly-averaged output. In the third scenario we apply ML to the output of an actual Earth system model (ESM). Our results demonstrated that when intrinsic and apparent relationships operate on the same spatial and temporal scales, neural network ensembles (NNEs) were able to extract the intrinsic relationships when only provided information about the apparent relationships, while random forests (RFs) diverged from the true response because of colimitation and their inability to extrapolate. When intrinsic and apparent relationships operated on different timescales (as little separation as hourly versus daily), NNEs fed with apparent relationships in time-averaged data produced responses with the right shape but underestimated the biomass.
This was because, when the intrinsic relationship was nonlinear, the response to a time-averaged input differed systematically from the time-averaged response. Although the limitations found by NNEs were overestimated, the NNEs were able to produce more realistic shapes of the actual relationships than multiple linear regression. Additionally, NNEs were able to model the interactions between predictors and their effects on biomass, allowing for a qualitative assessment of the colimitation patterns and the nutrient causing the most limitation. Future research may be able to use this type of analysis for observational datasets and other ESMs to identify apparent relationships between biogeochemical variables (rather than spatiotemporal distributions only) and identify interactions and colimitations without having to perform (or at least performing fewer) growth experiments in a lab. From our study, it appears that ML can extract useful information from ESM output and could likely do so for observational datasets as well.
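The time-averaging effect described above can be illustrated with a minimal sketch. It is not the model configuration used in the study: the saturating light response, its half-saturation constant, and the idealized half-sine diel light cycle are all assumptions chosen purely for illustration. The sketch compares the growth rate evaluated at the daily-mean light with the daily mean of the hourly growth rates.

```python
import math

def growth(light, half_sat=50.0, mu_max=1.0):
    """Hypothetical saturating (Michaelis-Menten-type) light response."""
    return mu_max * light / (half_sat + light)

def hourly_light(hour, peak=400.0):
    """Idealized diel cycle (assumed): half-sine from 06:00 to 18:00, dark otherwise."""
    return peak * max(0.0, math.sin(math.pi * (hour - 6.0) / 12.0))

# One day of hourly light values
light = [hourly_light(h) for h in range(24)]

# Response to the time-averaged input (what averaged data would suggest)
mean_light = sum(light) / len(light)
response_to_mean = growth(mean_light)

# Time-averaged response (what the hourly intrinsic relationship produces)
mean_of_response = sum(growth(x) for x in light) / len(light)

print(f"growth(mean light)  = {response_to_mean:.3f}")
print(f"mean(growth(light)) = {mean_of_response:.3f}")
```

Because the assumed response is concave, Jensen's inequality guarantees that the mean of the hourly responses cannot exceed the response to the mean input, so a relationship fitted between averaged inputs and averaged outputs will generally not coincide with the intrinsic curve.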