Without sensory data there cannot be actual cognitive ability, though there may be potential for it. The data doesn't have to be visual; bear in mind we have five senses. When vision is impaired, hearing often becomes more sensitive to compensate. And, theoretically, someone with the use of only a single sense might still use its data to actualize their cognition, though it would take far more effort and leave large gaps in capability. In the same way, technically, preprocessed vision* is the primary "sense" of LLMs.
* Preprocessed since the data actually consists of 1D streams of characters, not 2D colour points (as with vision models).
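The 1D-versus-2D distinction in the footnote can be made concrete with a toy sketch. Nothing here is a real tokenizer or image format; character codes stand in for token IDs, and a tiny nested list stands in for a pixel grid:

```python
# Toy illustration of the footnote's point: the same "scene" as an
# LLM receives it versus how a vision model receives it.

# 1D stream of characters: an LLM's input is a flat sequence of
# symbols (character codes here, standing in for token IDs).
text = "red dot"
stream_1d = [ord(c) for c in text]

# 2D grid of colour points: a vision model's input is a height x width
# array of pixel values (a tiny 3x3 greyscale "image" here).
image_2d = [
    [0,   0, 0],
    [0, 255, 0],  # a single bright "dot" in the centre
    [0,   0, 0],
]

# The stream has one axis (sequence length); the image has two
# (height and width) plus, in real models, a colour channel axis.
print(len(stream_1d))
print(len(image_2d), len(image_2d[0]))
```

The point of the contrast: the LLM's "sense" delivers spatial and visual information only after it has been flattened into a symbol sequence, whereas a vision model gets the 2D structure directly.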