I'm actually suprised/impressed that Chrome has an optimization to detect which closure variables are actually used in a function and garbage-collect the rest.
But it's certainly something important to be aware of. I think the main point is, why on earth would you be constantly re-defining a function "logIt" within a function that is repeatedly called? That's just bad programming -- whenever you define functions within functions in JavaScript, you really need to know exactly what you're doing.
It's just a reduction to demonstrate the problem. I'm sure the code that actually triggered it was much more complex and had valid reasons for doing whatever it was doing.
But what I mean is -- whenever you define a function within another function, that should never be a "default" way of writing JavaScript, because it creates closures, and you're just asking for "memory leaks".
Because closures use memory, you need to make sure there's a valid reason for creating the closure, and above all that if you're creating multiple closures by calling the parent function repeatedly, that there's a really really good reason for it, and that the closures are able to be garbage-collected later on.
In the logical sense each function has its own closure. There is no way functions should affect which[edit for clarity] variables are closed on by each other. So you shouldn't expect this.
>I'm actually suprised/impressed that Chrome has an optimization to detect which closure variables are actually used in a function and garbage-collect the rest.
In the logical sense functions close over their variables, not all variables that happen to be in scope. So that optimization is needed.
var a = function() {
var z = "a";
window.b = function() { z = "b"; }
window.c = function() { return z; }
};
a();
window.c(); // returns "a"
window.b();
window.c(); // returns "b"
As you can clearly see, the single closure containing the variable "z" is shared between functions b() and c().So each function does not have its own closure -- closures work "upwards", referencing everything in every scope above them.
What? The whole point of a closure is that it is shared.
They also store any variables they may refer to that are defined in their enclosing scopes, including the parameters and variables of outer functions.
In particular, if the closure only reads the variable and nothing can write it, it's way cheaper in terms of performance to just create a separate lexical environment for the closure that contains the things it reads so that you don't have to keep walking up the scope chain on bareword lookups in the closure.
The fact that things then end up not referenced and can be gced is just a side-effect of the primary goal of the optimization: faster access to variables in a closure.
The definition of "memory leak" is smeared all across the road.
I ran the code on node and the RSS does appear to be increasing without bound. Even running node with --expose-gc and forcing a global.gc() inside logIt() causes the same unbounded memory growth.
Increasing the size of the array by a factor of 10 causes RSS usage to jump up by a factor of 10 every second, so we know that the memory usage isn't caused by creating a new long-lived closure (i.e., the logIt() function) every second.
In fact, removing the call to doSomethingWithStr() doesn't change the unbounded memory growth.
Here's a shorter snippet that demonstrates the leak more dramatically (about 10 MB/sec RSS growth):
var run = function () {
var str = new Array(10000000).join('*');
var fn = function() {
str;
}
var fn = function() { }
setInterval(fn, 100);
};
setInterval(run, 1000);
Tried it out on node v0.8.18Edit: removing the function definition eliminates the problem, so I'd say that your diagnosis of the problem is spot on.
This is not devastating, it is how closures and scope chain work in JavaScript. See my detailed explanation below. The crux of the matter is that the closure still references the outer function's scope chain, even after the outer function has returned. And the outer functions's scope chain is referenced by the closure until the closure is destroyed or returns.
The other time, I was able to do more data flow analysis, but it also resulted in a bunch of annoying fiddly bugs and took more maintenance.
I'm not suggesting the v8 team took the 'easy way' out, but doing the deep introspection is hard, and in an environment such as a browser, I can see trading a pathological case such as this for what must be 1,000,000,000 inappropriate aggressive gc bugs.
(See the recent discussion over RubyMotion for examples of the opposite problem: http://news.ycombinator.com/item?id=5949072)
But the real killer is that you can't tell which closures will keep what alive unless you assume that every closure keeps everything it closes over (even if it doesn't use it) alive. Nothing in the ES spec requires them not to...
Is it that, only one closure is created for variables defined in the function run, and variables needed in any inner-function are added to run's closed context? That seems like a bug fixable without impacting the language.
But the original bug that led to this discovery involved a data structure that shouldn't have leaked at all. I've updated the post to show it; duplicated here since GitHub Pages seems to cache posts pretty aggressively.
var theThing = null;
var replaceThing = function () {
var originalThing = theThing;
// Define a closure that references originalThing but doesn't ever actually
// get called. But because this closure exists, originalThing will be in the
// lexical environment for all closures defined in replaceThing, instead of
// being optimized out of it. If you remove this function, there is no leak.
var unused = function () {
if (originalThing)
console.log("hi");
};
theThing = {
longStr: new Array(1000000).join('*'),
// While originalThing is theoretically accessible by this function, it
// obviously doesn't use it. But because originalThing is part of the
// lexical environment, someMethod will hold a reference to originalThing,
// and so even though we are replacing theThing with something that has no
// effective way to reference the old value of theThing, the old value
// will never get cleaned up!
someMethod: function () {}
};
// If you add `originalThing = null` here, there is no leak.
};
setInterval(replaceThing, 1000);FWIW that is certainly the twisty bits of garbage collected languages.
Sadly it's not quite that simple because of the behavior of `eval` (strictly speaking, direct `eval`): it can leak bindings into the surrounding scope.
It seems like ChuckMcM is describing is reference counting which variables are needed in the closure. His suggestion would fix this memory issue, but probably have a big performance penalty.
So you should be able to statically determine if this is the case, and it's not here.
In fact, Chrome GC-ing unreferenced variables could break eval if they didn't cover that case.
Here is a technical explanation of why there is a memory leak and how to fix the problem.
The scope chain of Closures (in JavaScript) contains the outer function(s) activation object. The activation object of a function contains all the variables that the function has access to, and it is part of a function’s scope chain.
This means the inner function (the closure) has access (a reference) to the outer function’s scope chain, including the global object. And even after the outer function has returned, the closure still has access to the outer function’s variables.
Therefore, the activation object of the outer function cannot be destroyed (for garbage collection) after it has returned, because the closure still references its variables.
When the outer function returns, its own scope chain for execution (its execution object) is destroyed, but its activation object is still referenced by the closure, so its variables will not be destroyed until the closure is destroyed.
The execution context of a function is associated with the functions’ activation object, but while the execution object is used for its own execution and is destroyed when it returns, its activation object is referenced by closures—its inner functions.
Now, as to the specific example in question: The reason the str variable is never destroyed is because it is referenced buy the logIt function because the logIt function's execution object references the entire scope chain of the run function, and the logIt function is never destroyed, so the str variable remains in memory.
As the original author (OP) suggested, be sure to dereference any local variable in the outer function that the closure is using, once the closure is done with it or once the outer function is done with it.
Also, simply setting the logIt function to null (when it completes execution—returns) will allow the str variable and the entire scope of chain of both the logIt and the containing run function to be destroyed and ready for garbage collection.
For a detailed explanation of closures in JavaScript, see "Understand JavaScript Closures With Ease": http://javascriptissexy.com/understand-javascript-closures-w...
If you read more closely, you'll notice that logIt was not the cause of 'str's retention, but instead, the doSomethingWithStr function was.
When logIt is present by itself, str is not retained. Only when doSomethingWithStr is added alongside logIt is str retained.
First, you are correct that I specifically mentioned the logIf function when I discuss the specific example.But that does not take away from my thorough explanation of the main reason for the problem. In fact, everything I said about the logIt function applies to the doSomethingWithStr function, since they are both closures, so my explanation stands as is.
If you read my explanation again, you will see that I clearly explained that closures still have access to the outer function's scope, so both the logIt and the doSomethingWithStr functions have access to the outer function's scope chain even after the outer function or any of the other closures returns.
It is not until both all closures are destroyed or returns that the outer function's scope activation object is destroyed.
http://blogs.msdn.com/b/ericlippert/archive/2007/06/06/fyi-c...
This is an old article, but I believe it still applies to the latest iteration of the language.
[1] http://www.microsoft.com/en-us/download/details.aspx?id=7029
"the local variable x is captured by the anonymous function, and the lifetime of x is extended at least until the delegate returned from F becomes eligible for garbage collection (which doesn’t happen until the very end of the program)."
I wonder if using the "use strict" directive would let the browser optimize this more easily? Probably not, but Javascript does have a lot of crazy corner cases, and one of them might be preventing (or just making it harder to prove safe) the optimization.
e: Here's an example of one of those corner cases:
console.log = eval
Now whether those functions reference a particular variable depends on the string they're printing! And this swap out of log() could be done at any time.e2: Ah, as glasser and pcwalton point out, an indirect reference to eval doesn't work the same way as a direct call. TIL!
> (function () { var x = 5; eval("console.log(x)"); })()
5
> (function () { var x = 5; var e = eval; e("console.log(x)"); })()
ReferenceError: x is not defined* Safari 5.1 exhibits no significant heap growth, as far as the timeline shows anyway
* Using top to observe private memory, Firefox does seem experience significant heap growth on the provided program
EDIT: I find the down-votes very amusing. Has javascript now developed a cult? Read up on the history of javascript if my message above comes as a shock to you.
eg:
http://clang.llvm.org/docs/Block-ABI-Apple.html
http://download.oracle.com/otndocs/jcp/lambda-0_5_1-edr2-spe...
http://www.microsoft.com/en-us/download/details.aspx?id=7029
etc etc. I also don't believe I'm attacking javascript here, merely repeating what people have always known about evolved languages. Evolved languages have a lot of pros (mainly speed of new features and practical application), and they have cons too. What I'm saying is very simple: this type of bug in javascript should not be a surprise - it should be expected. There will be more.
EDIT: Laughably enough, while the ObjC and Java specifications both specifically address this issue, the C# specification actually formalizes this exact bug into the language specification. So I concede your point, C# - a designed language - simply designed the exact same bug into the language itself. See section 7.15.5.1 Captured outer variables.
Feel free to downvote this post - I now concede that designed languages are just as likely to design stupid bugs into the language as evolved languages are to create them by mistake. ;)
There's a small reproducer example, so there will be a new version issued shortly. Not really something to get excited about in my opinion.
(Yes, I know this behaviour could be actually called ok and allowed, but you can do analysis on that code that will show the variable is not used. Even if not now, it can be fixed at some point in the future.)
((Downvoted as language bashing, rather than because I like JS - I hate it with passion for other reasons. "there are probably a huge number of these kind of issues just waiting to be found" is just a silly claim unless you're actually working on finding them or a true claim for any kind of general purpose software.))
I was commenting more on the practical effects of having an evolved language - this is a known and likely outcome in an evolved language. Designed languages generally spend a lot of time trying to avoid these kind of issues - it makes sense that not spending that time and adding in features as they are thought up means issues like this will creep in somewhere.
If those problems weren't enough to convince certain developers that JavaScript is a bad idea, and unsuitable for use, then it's likely that nothing will. This is especially true for server-side usage of JavaScript, where there are so many far better alternatives.
It's easy to check for dict.someProperty, but dict[propertyName] where propertyName can be the string "someProperty" is a much more complicated problem. Now you have to build a tree of everything that changes propertyName and make sure you know it can never be "someProperty".
[0] outside of the global scope through `window` but all variables are clearly local here so that doesn't apply.
The variable is still being held in memory just with the value null instead of whatever it was before.
Or does setting it to null trigger some special GC algorithm that detects that it isn't being used anymore?
If you run these on your own machine and peek at the RSIZE, it stays constant (well, the first grows slowly due to `logIt`).
http://play.golang.org/p/A5Pz-3kthP http://play.golang.org/p/RnXr_jB5Qh
That is basically how Lua implements closures.
Implementing the simplest thing that is correct according to the spec sounds like a good idea to me.
A world-class JavaScript environment like V8 is far past "the simplest thing that is correct according to the spec".
In fact, the JavaScript spec doesn't actually define anything about garbage collection! You can create a compliant ECMAScript-262 runtime that literally never collects any garbage ever. That's certainly the simplest correct thing. But it's a bad idea!
V8 already goes to the trouble of figuring out whether or not a given variable needs to be stored in the lexical environment or not. Specifically: variables that are not used in ANY closures and where there is no eval in sight can be stored outside of the lexical environment. This is great, and better than many similar programming language environments offer.
It just would be even better if they went one step farther.
I'm not as convinced as you that they haven't found the right balance.
The surprise is that `logIt` holds on to the giant `str` object in its environment, despite the fact that there is literally no way for the `str` variable to ever be used again.
V8 is smart enough to not make `logIt` hold on to `str` if there are no closures at all which refer to `str`. It's the introduction of the unrelated `doSomethingWithStr` closure that forces `str` into the lexical environment.
It was that code to replace a certain type of object (a Spark renderer) with a new instance of that object accidentally ended up with a reference to the preceding renderer in a closure assigned somewhere on it, even though that particular closure didn't actually use that reference.
So instead of replacing renderer #N with renderer #(N+1) and GCing #N, we ended up with a stream of renderers which never could be GCed... even though the reference keeping the old ones alive was literally impossible to ever use.
setInteval(function() { console.log('foo'); }, 1000)
Is not a memory leak.But I agree that it's not surprising at all. If you can access a variable from somewhere it will obviously not be garbage collected.