Personally, I'm of the view that the Kullback-Leibler divergence is the true measure of information: it is defined for arbitrary probability measures (with no special treatment for continuous ones), and it is independent of the choice of coordinates.
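To make the coordinate-independence point concrete, here is a minimal sketch using the standard closed forms for one-dimensional Gaussians. The function names, the example parameters, and the rescaling factor `a` are my own choices for illustration. A change of units (x to a·x) shifts the differential entropy by log a, while the KL divergence stays the same.

```python
import math

def gaussian_entropy(sigma):
    """Differential entropy (nats) of N(mu, sigma^2); it does not depend on mu."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) in nats for P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)

a = 1000.0  # hypothetical change of units, e.g. metres -> millimetres
mu_p, sigma_p = 0.0, 1.0  # illustrative parameters for P
mu_q, sigma_q = 1.0, 2.0  # illustrative parameters for Q

# Entropy depends on the units: rescaling by a adds log(a).
entropy_shift = gaussian_entropy(a * sigma_p) - gaussian_entropy(sigma_p)

# KL does not: rescaling both distributions leaves it unchanged.
kl_before = gaussian_kl(mu_p, sigma_p, mu_q, sigma_q)
kl_after = gaussian_kl(a * mu_p, a * sigma_p, a * mu_q, a * sigma_q)

print(entropy_shift, math.log(a))
print(kl_before, kl_after)
```

The two numbers on the first printed line should agree with each other, as should the two on the second.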
Its downside is that it is inherently relative: it only ever measures one distribution against another. In the discrete case you can just pick the uniform distribution as your non-informative base, and the usual entropy is then log n minus the divergence from that base. The issues with the entropy definition for continuous distributions boil down to the problem of picking a uniform distribution on the real numbers, where no such probability distribution exists.
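For the discrete case, the relationship to the uniform base can be checked directly: with n outcomes, the Shannon entropy equals log n minus the KL divergence from the uniform distribution. A small sketch, where the helper names and the example distribution are assumptions of mine:

```python
import math

def entropy(p):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl_divergence(p, q):
    """KL(P || Q) in nats; assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

p = [0.5, 0.25, 0.125, 0.125]  # illustrative distribution on 4 outcomes
n = len(p)
uniform = [1.0 / n] * n        # the non-informative base

lhs = entropy(p)
rhs = math.log(n) - kl_divergence(p, uniform)

print(lhs, rhs)  # the two values should agree up to rounding
```

On the real line this construction has nothing to anchor to: Lebesgue measure has infinite total mass, so it cannot be normalized into a uniform probability distribution, and the log n term has no finite counterpart.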