I think the 3rd paragraph is inacceptable.
"Good" or "Evil" are things that can only be seen by Humans. They are neither naturally "there" nor intrinsical.
Same with "value" and "harm". Your Model only works when people feel similar to you about whats "harm" and whats "good", thats why it won't be able to scale up.