var makeKeyCodepoint = function(word) {
var len = word.length;
if ( len > 255 ) { return undefined; }
var i = len >> 2;
return String.fromCharCode(
(word.charCodeAt( 0) & 0x03) << 14 |
(word.charCodeAt( i) & 0x03) << 12 |
(word.charCodeAt( i+i) & 0x03) << 10 |
(word.charCodeAt(i+i+i) & 0x03) << 8 |
len
);
};
The other a method: var MakeKeyCodepoint = function() {};
MakeKeyCodepoint.prototype.makeKey = function(word) {
var len = word.length;
if ( len > 255 ) { return undefined; }
var i = len >> 2;
return String.fromCharCode(
(word.charCodeAt( 0) & 0x03) << 14 |
(word.charCodeAt( i) & 0x03) << 12 |
(word.charCodeAt( i+i) & 0x03) << 10 |
(word.charCodeAt(i+i+i) & 0x03) << 8 |
len
);
};
var makeKeyCodepointObj = new MakeKeyCodepoint();
Now why the standalone function runs at over 6.3M op/sec, while the method runs at 710M op/sec (on my computer)?Let's consider the receiver (i.e the `this` value) of Example 1 and 2. The receiver of Example 1 is Benchmark, if invoked normally. The receiver of Example 2 is the empty function object function(){}.
When you call makeKeyCodepointObj.makeKey() - the VM looks up the object's prototype chain and finds the function. This call site is cached (think of it as a K:V store, where the key is "makeKeyCodepointObj.makeKey" and the value is the call site of the function.)
When you call makeKeyCodepoint(), the VM has to, for each call, look up the prototype chain until it finds the variable. The variable is then resolved into the function call site. Because of scoping issues in JS, I don't think this is cached (or if it's cached, it'd be invalidated a lot), and a lookup has to happen every time. (I know in my JS engine, I tried to perform caching optimization for global object properties and I gave up).
TL;DR: Function lookups happen all the time when the function is a method of the global object. When a function is a method of an object, the lookup is cached.
If I am talking out of my arse, please feel free to correct me.
(Benchmark.uid1400600789397runScript || function() {})();
Benchmark.uid1400600789397createFunction = function(window, t14006007893970) {
var global = window,
clearTimeout = global.clearTimeout,
setTimeout = global.setTimeout;
var r14006007893970, s14006007893970, m14006007893970 = this,
f14006007893970 = m14006007893970.fn,
i14006007893970 = m14006007893970.count,
n14006007893970 = t14006007893970.ns;
// Test Setup
var makeKeyCodepoint = function(word) {
var len = word.length;
if (len > 255) {
return undefined;
}
var i = len >> 2;
return String.fromCharCode(
(word.charCodeAt( 0) & 0x03) << 14 |
(word.charCodeAt( i) & 0x03) << 12 |
(word.charCodeAt( i+i) & 0x03) << 10 |
(word.charCodeAt(i+i+i) & 0x03) << 8 |
len
);
};
s14006007893970 = n14006007893970.now();
while (i14006007893970--) {
// Test Code
var key;
key = makeKeyCodepoint('www.wired.com');
key = makeKeyCodepoint('www.youtube.com');
key = makeKeyCodepoint('scorecardresearch.com');
key = makeKeyCodepoint('www.google-analytics.com');
}
r14006007893970 = (n14006007893970.now() - s14006007893970) / 1e3;
return {
elapsed: r14006007893970,
uid: "uid14006007893970"
}
}
The test setup and the test itself are all part of the same function, and makeKeyCodepoint is a local variable in that function.So now I took the time to try to go around this by rearranging the calls, and all of a sudden results make more sense:
http://jsperf.com/makekey-concat-vs-join/10
Results:
1. Firefox 29 makeKeyConcat / makeKeyConcatObj = ~440 Mops/s
2. Firefox 29 makeKeyCodepoint / makeKeyCodepointObj = ~64 Mops/s
3. Chrome 34 makeKeyCodepoint / makeKeyCodepointObj = ~5.7 Mops/s
4. Chrome 34 makeKeyConcat / makeKeyConcatObj = = ~2.2 Mops/s
codepoint : 1,172,182,887 ops/sec
codepoint obj: 1,168,116,461 ops/sec
No significant difference. function: 5,572,574
method: 903,375,064
Firefox 29: function: 1,747,475,085
method: 1,727,244,041One thing I found out is that dropping `String.fromCharCode` in favor of a local function (`var fromCharCode = String.fromCharCode.bind(String);`) causes neither of them to be optimizable. See http://jsperf.com/makekey-concat-vs-join/9
concat | concat obj | codepoint | codepoint obj
----------+------------+-----------+--------------
2,243,214 | 1,983,801 | 1,823,882 | 1,746,316
On Fedora 20 x86_66 with epiphany-3.10.3-1 (webkitgtk3-2.2.7-1): concat | concat obj | codepoint | codepoint obj
----------+------------+-----------+--------------
2,515,750 | 2,280,291 | 2,187,448 | 1,957,199I ran it past the one in my browser (current Firefox, Linux) and didn't see a significant difference.
Register allocation is one of those areas where I think compilers are pretty horrible compared to a good or even mid-level Asm programmer, and I've never understood why graph colouring is often the only way that is taught because it's clearly not the way that an Asm programmer does it, and is also completely unintuitive to me. It seems to assume that variables are allocated in a fixed fashion and a load-store architecture, which is overly restrictive for real architectures like x86. There's also no interaction between RA and instruction selection, despite them both influencing each other, whereas a human programmer will essentially combine those two steps together. The bulk of research appears to be stuck on "how do we improve graph colouring", when IMHO a completely new, more intuitive approach would make more sense. At least it would make odd behaviour like this one less of a problem, I think.
Problems are classified as NP-complete based on the Turing machine model. Since the human brain is not a Turing machine, it may be better suited for solving NP-complete problems than a computer. Solutions to NP-complete problems commonly employ heuristics to strike a balance between completeness and efficiency. The human brain seems (at least to me) to be better at solving problems involving heuristics. Chess is an obvious example.
I think the human brain is probably Turing-equivalent, but it also likely doesn't matter -- if I can describe the algorithm that I, as a human, take to perform compiler-beating register allocation and instruction selection, then a machine can probably do it just as well if not faster than me since it can consider many more alternatives and at a much faster rate.
I agree that heuristic approaches are the way to go, but in a "too far abstracted" model like graph colouring, some heuristics just can't be easily used; e.g. the technique of "clearing the table" --- setting up all the registers prior to a tight loop, so that the instructions within do not have to access memory at all. Using push/pop (x86) for "very ephemerally-spilled" values is another one.
That said, of course it is better to try to solve the real problem and not only the decomposition. I think that the focus on cheap compiler passes is a bit obsolete now. A cheap debug-build with some optimizations is good, but for a real production build where speed is needed I am willing to wait quite some extra time to get the last few drops of performance.
I've seen some researchers looking into solving the combined problem which feels promising. That kind of interest combined with the nice modularity of LLVM makes me quite optimistic that there will be nice results that are relevant both academically and industrially.
The flat memory model on i386 at least made it somewhat easier compared to the segment address mode coming with 8086.
Also, a lot of our customers apparently use XP 32bit and have no intention of updating anytime soon. Sigh...
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/per...
edit same goes for Jon Skeet of course, and looking for info about him I came across this http://meta.stackexchange.com/questions/9134/jon-skeet-facts... which has some hilarious ones like
Jon Skeet's SO reputation is only as modest as it is because of integer overflow (SQL Server does not have a datatype large enough) and When Jon Skeet points to null, null quakes in fear.
edit: I don't know byte code or machine code well so my description of what happens unwinding the stack is probably wrong, but my point is just that it's simpler for the CPU not having the possibility of unwinding the stack beyond the code section OP called out.
It's like if you cried out "dear God, why?" about your troubles but actually got a response.