An elementary topos is a cartesian closed category, with finite limits and a subojbect classifier.
Cartesian closedness means that for objects A and B there is an exponential object A^B of functions from B to A. Cartesian closedness is the right intuition on functions being first class citizens and is at the center of the equivalence CCC-lambda calculus-functional programming. Limits are bread and butter categorical stuff, and the pesky subobject classifier is sort of a pain of what Category Theory understands as classifying things.
In Set, monos into X (inyections into X, subsets of X), determine the characteristic function of the subset, subset of say, U. The characteristic function is U->Bool={True, False}. Summing up, the subobject classifier in Set is Bool and provides the correspondence of functions U->X and X->Bool. In an arbitrary elementary topos, the subobject classifier would be an Ω such that monic arrows ?->X corresponds to arrows X->Ω. I would agree that this has bad digestion, maybe delving in applications one just grow accustomed.
There is an idea of one doing mathematics in an "ambient" set theory, and categorists want to look at that as an ambient category of sets. But then they asked, what are the miminum features I am really using of this ambient category of sets? The list is the requirements of a category to be an elementary topos. So the category of sets is a topos (the topos of sets) very by design. But other categories also do. When one changes Sets to other topos is when weird intepretations emerge. Topos requirements don't let you recover the axiom of choice, for instance. Excluded middle is not available anymore.