I understand that the information is different or more complex. But as long as this information follows some predefined rules, it lends itself to optimization algorithms such as root finding and gradient descent.
For example, in an MLP a common activation function is the sigmoid, whose values fall in the range (0, 1). The function is differentiable, and therefore you can run backpropagation to train the MLP.
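To make the "differentiable" part concrete, here is a minimal sketch of the sigmoid and its derivative in plain Python. The function names are my own, and the branch on the sign of `x` is only there to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid; mathematically the output lies strictly between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite the formula so exp() never receives a large positive argument.
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_derivative(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x)). Backpropagation relies on this."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

The point is that the derivative has a closed form, which is exactly what a human "activation function" lacks.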
Similarly, in a human-MLP the activation function is a human who is capable of understanding (input) and producing (output) multiple kinds of complex information. Now let's say the humans produce a payload/output, and the algorithm passes this information on to the humans in the next layer based on the weights between the humans.
To give you a real example of how it can be used, let's use a modified version of the game of charades. Imagine there's a 2x2 layered human-MLP, i.e. 2 humans in layer 1 and 2 humans in layer 2. There are no teams, and there's one moderator who whispers the word to draw to the humans in the first layer. It is the job of the humans in the second layer to guess correctly.
* The humans in the first layer each draw the image of the word with their own artistic style.
* The algorithm decides depending on the weights and some randomness which image goes to which human in layer 2.
* The humans in the second layer receive these images anonymously and they make a guess.
* The algorithm decides what the final output is after weighting the outputs from the humans in layer 2.
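The two algorithmic steps above (routing and final weighting) could be sketched roughly like this. Everything here is hypothetical: the names `route_drawings` and `weighted_vote`, the choice to sample a drawing in proportion to the weights, and the choice to pick the final answer by a weighted vote are my assumptions, not a worked-out design.

```python
import random
from collections import defaultdict


def route_drawings(weights, rng):
    """Pick, for each layer-2 human j, which layer-1 drawing they receive.

    weights[i][j] is the (non-negative) weight from layer-1 human i to
    layer-2 human j. Assumption: drawing i is sampled with probability
    proportional to weights[i][j], which covers "weights and some randomness".
    """
    n_layer1 = len(weights)
    n_layer2 = len(weights[0])
    routing = []
    for j in range(n_layer2):
        column = [weights[i][j] for i in range(n_layer1)]
        source = rng.choices(range(n_layer1), weights=column, k=1)[0]
        routing.append(source)
    return routing


def weighted_vote(guesses, output_weights):
    """Return the guess with the largest total output weight (assumed rule)."""
    totals = defaultdict(float)
    for guess, weight in zip(guesses, output_weights):
        totals[guess] += weight
    return max(totals, key=totals.get)
```

For the 2x2 game, `route_drawings([[0.7, 0.2], [0.3, 0.8]], random.Random())` returns a list of two indices saying whose drawing each layer-2 human sees, and `weighted_vote(["cat", "dog"], [0.3, 0.7])` picks the final output from their guesses.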
- How does backpropagation take place here? I don't know. To begin with, we can use Word2Vec to determine the magnitude of the error between the guess and the right answer.
- How do you find the derivative of the human's activation with respect to the weights? I don't know. But I'm sure you can come up with interesting new strategies for it.
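On the first question, one way to turn "magnitude of error" into a number is the cosine distance between the embedding of the guess and the embedding of the right answer. The sketch below assumes that choice of distance, and it uses tiny made-up vectors in `TOY_EMBEDDINGS` as a stand-in; in practice the vectors would come from a trained Word2Vec model.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction, larger means less similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors for illustration only.
# Real word embeddings have hundreds of dimensions and are learned from text.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "airplane": [0.0, 0.1, 0.9],
}


def guess_error(guess, answer, embeddings=TOY_EMBEDDINGS):
    """Error of a guess, measured as embedding distance to the right answer."""
    return cosine_distance(embeddings[guess], embeddings[answer])
```

With these toy vectors, guessing "kitten" when the answer is "cat" gives a smaller error than guessing "airplane", which is the kind of graded signal a training loop would need. It does not answer the second question about derivatives.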