I'm not sure -- in the physical space, a lot of the constraints are social, but they emerged in the context of physical constraints. You could, for example, walk into a restaurant and start haranguing the people there, but generally people don't. Without even raising the possibility of the police being called, there's a whole set of inputs the would-be haranguer can see and respond to -- the facial expressions and other body language of disapproval from the others in the restaurant, the sudden change in noise patterns in the room as all the private conversations cease and everyone shifts their attention to the disruption, etc. Such social signals are lacking or extremely muted in non-physical settings. Is there a way to bring similar social signals to the online world? Maybe some equivalent will emerge as we grow accustomed to being online (though, given how poorly people drive despite cars being a thing for several generations, I'm skeptical of our ability to fully adapt our social systems to some kinds of technology).
Or to take another example of constraints -- if someone in a bar spreads a false rumor, that misinformation can quickly spread to all the patrons in the bar, but its reach beyond that bar will be slow. By that time, more factual information might also be circulating, and the damage of the false information blunted. Online, by contrast, misinformation spreads so much faster than factual information that it is often nearly impossible to counter.
I don't know exactly what the solution is here, but I feel that public spaces need to have more speedbumps. In the same way that people are jerks when they drive and the answer is often "less driving, and slower," I suspect that the answer to bad online social spaces is "less online, and slower," but I'm not sure what that looks like.