Human biology generates signal across many measurement types at once: molecular, physiological, behavioral, environmental, and demographic. These modalities are not independent; they reflect overlapping and interacting aspects of the same underlying biology. A model that sees only one modality is working with a partial view. A model that integrates multiple modalities has the opportunity to represent health state more completely.
That opportunity comes with demands. Each modality has its own noise characteristics, collection biases, and missing-data patterns. Integrating them naively can amplify biases or create spurious associations that look compelling in one cohort and fail in another. Multimodal foundation models for health need architectures that handle missing modalities gracefully, validation that confirms integration adds information rather than noise, and documentation that states which modalities were present in training and which populations they came from.
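As a rough illustration of what handling missing modalities gracefully can mean in practice, the sketch below fuses whichever per-modality embeddings are available and returns an explicit record of which ones were used. It is a minimal example under stated assumptions, not a description of any particular model: the function name `fuse_modalities`, the plain-list embeddings, and the choice of a simple mean as the fusion rule are all hypothetical and chosen for readability.

```python
from typing import Dict, List, Optional, Tuple

Embedding = List[float]


def fuse_modalities(
    embeddings: Dict[str, Optional[Embedding]],
    expected: List[str],
) -> Tuple[Embedding, Dict[str, bool]]:
    """Average the embeddings that are present and report which were used.

    Hypothetical helper: mean fusion stands in for whatever fusion rule
    a real model would learn.
    """
    # Record presence explicitly so downstream code can see what was missing.
    present = {name: embeddings.get(name) is not None for name in expected}
    available = [embeddings[name] for name in expected if present[name]]

    # Fail loudly instead of silently returning a default health state.
    if not available:
        raise ValueError("no modalities available for this sample")

    dim = len(available[0])
    if any(len(vec) != dim for vec in available):
        raise ValueError("all embeddings must share one dimension")

    # Element-wise mean over the available modalities only.
    fused = [sum(vec[i] for vec in available) / len(available) for i in range(dim)]
    return fused, present


# Example: the behavioral modality is missing for this sample.
fused, present = fuse_modalities(
    {"molecular": [1.0, 3.0], "physiological": [3.0, 5.0], "behavioral": None},
    expected=["molecular", "physiological", "behavioral"],
)
print(fused)    # [2.0, 4.0]
print(present)  # {'molecular': True, 'physiological': True, 'behavioral': False}
```

Returning the presence record alongside the fused vector is one way to keep degradation visible: a downstream consumer can condition on it, log it, or refuse to predict when too few modalities were observed.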
Foundation models as shared infrastructure
A foundation model in health is most valuable not as a standalone product but as a shared starting point that other researchers can fine-tune, adapt, and build on. That ambition makes the base model's quality, documentation, and limitations especially important. A poorly characterized foundation model propagates its problems into every downstream application.
For Cytognosis, multimodal integration is central to the health-state coordinate system. We are working toward models that can represent the same biological state across different measurement contexts and that handle missing or lower-quality modalities without degrading silently. Progress is gradual, and each step will be documented in our open notebook.
Open notebook
Our multimodal architecture and integration approaches are under active development. This page is part of our open notebook and will be updated as methods mature and validation results accumulate.