For example, if the inputs/outputs are floating point numbers, then you can assume the domain/range is [-M, M]. Otherwise, even with the most clever algorithm you have no guarantee of ever approaching the optimum, even if the function is continuous. Now, even with a limited range there are no guarantees if the function is not well behaved -- so you have to further assume the function is well behaved. And for any assumption you make, there is a class of functions for which it is terrible. There is no best assumption, and no best algorithm, then. You could, for instance, assume the function is adversarial (trying to make your life difficult), for which the best algorithm is perhaps just sampling the domain at random -- which is really a terrible algorithm -- but that's of course just another assumption, and a terrible one.
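As a rough illustration, here is a minimal sketch of that "just sample randomly" baseline over an assumed box [-M, M]^d; the objective f, the dimension, and the budget are placeholders, not anything from the contest:

    import numpy as np

    def random_search(f, dim, M=10.0, budget=1000, seed=None):
        # Pure random search over the assumed box [-M, M]^dim.
        # Against a structureless/adversarial objective this is about as
        # good as anything; against a well-behaved one it is terrible.
        rng = np.random.default_rng(seed)
        best_x, best_y = None, np.inf
        for _ in range(budget):
            x = rng.uniform(-M, M, size=dim)
            y = f(x)
            if y < best_y:
                best_x, best_y = x, y
        return best_x, best_y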
I would much prefer 'Typical function optimization', if you're really optimizing unlabeled functions so frequently, or at least not trying to hide the inevitable assumptions.
TL;DR: The contest may be useful, but the concept of "Black box optimization" is nonsense.
Making assumptions and testing them is very much part of the contest. You are even allowed to do this interactively.
There exist many more techniques than trivially assuming some "template" function and fitting its parameters to the data.
Have a look at nonparametric modelling techniques, for example kernel regression or Gaussian processes. You either don't make any assumptions, or you take an uninformative prior that spreads its mass over all possible results.
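For instance, a minimal Nadaraya-Watson kernel regression sketch in a few lines of numpy; the only moving parts are the kernel and its bandwidth, which are illustrative choices here:

    import numpy as np

    def kernel_regression(x_train, y_train, x_query, bandwidth=1.0):
        # Nadaraya-Watson estimator with a Gaussian kernel: predict
        # f(x_query) as a locally weighted average of the observed y values.
        d2 = (x_query[:, None] - x_train[None, :]) ** 2
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        return (w @ y_train) / w.sum(axis=1)

    # Toy usage: recover a smooth curve from noisy 1-D samples.
    x = np.linspace(-3, 3, 50)
    y = np.sin(x) + 0.1 * np.random.randn(50)
    y_hat = kernel_regression(x, y, np.linspace(-3, 3, 200), bandwidth=0.3)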
This competition evokes modelling, optimisation and the exploration/exploitation tradeoff. I'm sure there will be very interesting theory behind the winning entries...
Mathematics models data, and you can't model without assumptions. It's like developing a theory that has no axioms. For example, the probabilistic model behind kernel regression is a terrible model (assumption), with very large error for a large class of distributions [2], and so on. We're talking about picking the best technique, and that technique is going to pick some assumptions, arbitrarily, that will or will not work well depending on an unstated choice by the organizers. That's why I would prefer if they stated instead "Functions with some real world relevance", or "Typical functions", or maybe "Poorly behaved functions", and so on.
[1] http://en.wikipedia.org/wiki/No_free_lunch_in_search_and_opt...
[2] On the Wikipedia page you can see they do make assumptions on f to minimize the squared error when choosing the kernel. It's inevitable.
Taken from here [1]:
White-box models: This is the case when a model is perfectly known; it has been possible to construct it entirely from prior knowledge and physical insight.
Grey-box models: This is the case when some physical insight is available, but several parameters remain to be determined from observed data. It is useful to consider two subcases.
1. Physical modeling: A model structure can be built on physical grounds, which has a certain number of parameters to be estimated from data. This could, for example, be a state-space model of given order and structure.
2. Semiphysical modeling: Physical insight is used to suggest certain nonlinear combinations of measured data signals. These new signals are then subjected to model structures of black-box character.
Black-box models: No physical insight is available or used, but the chosen model structure belongs to families that are known to have good flexibility and have been 'successful in the past'.
[1] http://www.sciencedirect.com/science/article/pii/00051098950...
If your function really is a random oracle, then, indeed, no optimizer will do well against it. OTOH, none will do (relatively) poorly either.
Effective optimization techniques can explore a function generally and exploit similarities to known models, or at least any smoothness they can find. Ineffective techniques will just get caught in local minima or fail to exploit smoothness or "obvious" structure.
Powerful "generic" optimizers are a tool which is important for industry. But the common ways they are benchmarked potentially allows for overfitting in the design phase, this contest is intended to correct that, and provide a potentially better assessment of how general these optimizers are.
What you are describing is related to the "no free lunch theorem", something one can attempt to deal with to get things working "in practice".
I predict the winner will use a mixture of Gaussian processes with various kernels, plus stochastic control (with limited look-ahead, otherwise it blows up) to pick the test points.
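Something like this one-step-lookahead sketch: fit a GP surrogate to the queries made so far, then pick the next query by expected improvement. The Matern kernel, scikit-learn, and the candidate grid are my own illustrative choices, not anything from the contest:

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def next_test_point(X, y, candidates):
        # Fit a GP surrogate to the (point, value) pairs seen so far,
        # then return the candidate with the highest expected improvement
        # (for minimization).
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X, y)
        mu, sigma = gp.predict(candidates, return_std=True)
        imp = y.min() - mu
        z = imp / np.maximum(sigma, 1e-12)
        ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
        return candidates[np.argmax(ei)]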
The usual winner is a flavor of CMA-ES, though they may have picked the functions to avoid this.
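For reference, a minimal CMA-ES loop with the pycma package; the sphere objective, starting point, and step size are stand-ins for whatever the black box actually is:

    import cma  # pip install cma

    def sphere(x):
        # Stand-in objective; the contest's black box would go here.
        return sum(xi * xi for xi in x)

    es = cma.CMAEvolutionStrategy([0.5] * 5, 0.5)  # x0, initial sigma
    while not es.stop():
        solutions = es.ask()  # sample a population of candidate points
        es.tell(solutions, [sphere(x) for x in solutions])  # update the search distribution
    print(es.result.xbest, es.result.fbest)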
Experiences differ, but in mine the most common place to find objectives with gradients is in optimizer challenges.
That said, sure, there should be a track that gives you the gradients. I agree it would be nice if there were another track.
If I may, I propose that the organizers remove the restriction on disassembling the client library or intercepting network connections. This restriction seems like it cannot benefit the organizers, unless the protocol is insecure. People are going to ignore this rule anyway, and you can't stop them or even detect them doing it. So why put it in there? It's only going to generate ill will.
Each track is split up into 1000 tests, and you can run the separate tests concurrently, which more or less eliminates the latency, since the largest test permits 'only' about 400k queries.
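Roughly something like this, where run_one_test is a hypothetical stand-in for whatever drives a single test through the contest's client library:

    from concurrent.futures import ThreadPoolExecutor

    def run_one_test(test_id):
        # Hypothetical: open a connection, run your optimizer on this one
        # test, return its score. Tests are independent, so the network
        # latency overlaps across the in-flight requests.
        ...

    with ThreadPoolExecutor(max_workers=50) as pool:
        scores = list(pool.map(run_one_test, range(1000)))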
Unfortunately I have to agree with obstinate here. The pure math is too much for me and reverse engineering (still daunting, but interesting/possible) is not acceptable. If any HN person wins this contest, I offer beers close to the black box :)
(Uh, no doubt I won't do well, since I had no time to ... like.. actually test my code on any functions except a couple trivial trials. :P ... I hope they put up some kind of ranking information as soon as it closes. I have no idea if my results are awful or merely bad :) (and I probably shouldn't share best numbers before it closes) )
1. You have a fixed number of probes M.
2. Out of M, you use N probes to get the silhouette of the function (exploration).
3. Then, with the remaining (M - N) probes, you need to find the optimum (exploitation).
Sounds more like a pseudo-science than a math problem to me.
The point of the task is to reward methods that work efficiently with limited trials and limited domain information, rather than whoever can run hill climbing on the biggest computer or hand-tune the parameters best.
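For what it's worth, here is a minimal sketch of the naive two-phase split described a few comments up (N exploration probes out of a budget of M, then local refinement). The objective f, the assumed box, and the Nelder-Mead exploit step are all illustrative assumptions, not the contest's protocol:

    import numpy as np
    from scipy.optimize import minimize

    def two_phase(f, dim, M=1000, N=300, box=10.0, seed=None):
        # Exploration: spend N probes sampling the assumed box at random.
        rng = np.random.default_rng(seed)
        X = rng.uniform(-box, box, size=(N, dim))
        ys = np.array([f(x) for x in X])
        x0, y0 = X[np.argmin(ys)], ys.min()
        # Exploitation: spend the remaining M - N probes refining locally.
        res = minimize(f, x0, method="Nelder-Mead", options={"maxfev": M - N})
        return (res.x, res.fun) if res.fun < y0 else (x0, y0)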