Speed comes from two things: implementation and algorithm. Algorithmically, the way to learn quickly is to use some sort of stochastic gradient method, i.e. learn from examples one-by-one, as opposed to as a batch.
As far as implementation goes, you need dense arrays. A native Python implementation will usually be lists of Python objects, which is very slow.
If you just need an SVM implementation, libsvm is pretty good. I'm assuming you need a non-linear kernel. If you're using a linear kernel then there's not really a difference between SVM and MaxEnt (well, there is but not much).
If your data is very sparse then there aren't many general-purpose implementations that are any good. The scipy.sparse module has some key stuff implemented in pure Python, and doesn't interoperate properly with the rest of the PyData ecosystem. I had to implement my own sparse data structures, in Cython.