Disabling GC just kills the advanced GC but leaves the basic reference counting approach to freeing memory, so Composer can keep trucking without using much more memory, as the GC wasn't really collecting anything. The memory reduction many people report is instead due to some other improvements we made yesterday.
As to why the problem went unnoticed for so long, it seems that GC activity is not visible to profilers, so whenever we looked at profiles to improve things we did not spot the issue. In most cases though this isn't a problem, and I would NOT recommend that everyone disable GC on their project :) GC is very useful in many cases, especially long-running workers, but the Composer solver falls outside the use cases it's made for.
That sounds like a bug in the profiler, not with Composer. Observing internal time is pretty important for any profiler.
There are two new commercial contenders though that came out in the last few months: Blackfire.io and QafooLabs.com
Both have announced support for showing GC time in profiles as a result of today's noise :) https://twitter.com/beberlei/status/539816149303955456 https://twitter.com/symfony_en/status/539815082881187841
Worse. GC gets triggered just by assigning references, allocation isn't even needed.
We noticed the issue happened most often when deserializing objects (loading them from Redis to memory). As it turns out, Python would schedule a collection every time the object_created counter was sufficiently higher than the object_destroyed counter. In general, this makes sense, because that way you can be sure that objects are being created and not being freed, which most likely means a resource leak or a reference cycle. However, the same thing happens during deserialization: many new objects are created, and none are freed. Coupled with Python's low threshold (700), GC was triggered many, many times in every deserialization loop (usually in vain, as no new objects became recyclable). Disabling GC and running full collections manually solved the problem.
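A rough sketch of that workaround in Python, assuming pickled payloads stand in for whatever comes out of Redis. `load_batch` is a made-up helper name; `gc.disable()`, `gc.enable()`, `gc.isenabled()` and `gc.collect()` are the standard library calls. Reference counting keeps working while the cyclic collector is paused, so only cyclic garbage waits for the explicit collection.

```python
import gc
import pickle


def load_batch(blobs):
    """Deserialize a batch of pickled payloads with the cyclic GC paused."""
    was_enabled = gc.isenabled()
    gc.disable()  # refcounting still frees acyclic garbage immediately
    try:
        return [pickle.loads(blob) for blob in blobs]
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # one explicit full collection instead of many automatic ones


# Hypothetical payloads, only here to make the sketch runnable.
blobs = [pickle.dumps({"id": i, "tags": ["a", "b"]}) for i in range(1000)]
objects = load_batch(blobs)
```

Whether to re-enable the collector afterwards or leave it off and collect on a schedule depends on the workload; this version restores whatever state it found.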
What is this guy doing that he needs gigabytes of memory to install a bunch of php libraries?
Before: Memory usage: 2194.78MB (peak: 3077.39MB), time: 1324.69s
After: Memory usage: 4542.54MB (peak: 4856.12MB), time: 232.66s
That's the reason for the huge memory usage. We're slowly moving away from PEAR, but since it works for now not everyone has/will transition.
Edit: I should also point out that there are a few packages that almost everyone uses (PHPMD, PHPCS, PHPUnit) that are still mostly pulled from PEAR, though I think PHPUnit has a Composer option.
The reality is even in the above edge case: 2x memory, 6x speed.
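For anyone who wants the exact ratios behind "2x memory, 6x speed", a quick check using only the Before/After figures quoted above (the variable names are mine):

```python
# Figures copied from the Before/After lines quoted in the thread.
before = {"memory_mb": 2194.78, "peak_mb": 3077.39, "time_s": 1324.69}
after = {"memory_mb": 4542.54, "peak_mb": 4856.12, "time_s": 232.66}

memory_ratio = after["memory_mb"] / before["memory_mb"]  # growth in final memory
peak_ratio = after["peak_mb"] / before["peak_mb"]        # growth in peak memory
speedup = before["time_s"] / after["time_s"]             # how many times faster

print(f"memory x{memory_ratio:.2f}, peak x{peak_ratio:.2f}, speed x{speedup:.2f}")
```

So roughly 2.1x the final memory, about 1.6x the peak, and a bit under 5.7x faster.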
Any more details on this?
If none can (and in the case of Composer all the objects exist for a reason) then it's wasting time analysing the objects.
So in this case there's only a large waste of cpu doing nothing with gc enabled.
(I didn't know animated gifs in github comments are a thing. Maybe I work too much with boring projects.)
In my opinion, the PHP cycle collector is a pointless waste of time. In Objective-C, Apple just lets the memory leak by default, and they give you tools to find the leaks, and then you modify the code to break the cycles.
There is no need to turn cycle collection back on at the end of the program, because OS frees the memory at program termination.
But for long-running scripts, it's either the cycle collector or adding support for weak references. But IMO, given how references are stored in PHP, and with my limited knowledge of PHP core, I am fairly sure the cycle collector is more beneficial in both developer time and usefulness. (Not every programmer knows how to manage reference cycles.)
The cycle collector is relatively recent, I expect it's not very performant (since most PHP applications don't need it), and Composer's dependency resolution may be hitting a pathological case (creating lots of objects without cycles, triggering lots of collections that do no useful work).
> and why they didn't turn it back on at the end of the function?
Since it's a package manager, I'd guess the expectation is the process will die soon-ish afterwards (once it's installed whatever it's resolved). There's a discussion of re-enabling it after dependency resolution (so postinstall hooks run with GC enabled) though.
Note that it's only reducing memory usage if there are cycles. The rest will be collected when the refcount falls to 0.
And it's been like this for 2 or 3 years now. I've seen comment spam of images for commits and issues for quite a while.
> Behold, found something in the docs about garbage collection:
>> Therefore, it is probably wise to call gc_collect_cycles() just before you call gc_disable() to free up the memory that could be lost through possible roots that are already recorded in the root buffer. [...]
https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commi...
http://php.net/manual/en/features.gc.php
Here's what I found:
PHP uses ref-counting for most garbage collection. That means non-cyclic data structures are collected eagerly, as soon as the last reference to an object is removed.
Naïve ref-counting can't collect cyclic data structures, though. Normally, cycles are "collected" in PHP by just waiting until the request is done and ditching everything. That works great for web sites, but makes less sense for a command line app like Composer.
To better reclaim memory, PHP now has a cycle collector. Whenever a ref-count is decremented to a non-zero value, that means a new island of detached cyclic objects could have been created. When this happens, it adds that object to an array of possible cyclic roots.
When that array gets full (10,000 elements), the cycle collector is triggered. This walks the array and tries to collect any cyclic objects. They reference this paper[1] for their algorithm for doing this, but what they describe just sounds like a regular simple synchronous cycle collector to me.
The basic process is pretty simple. Starting at an object that could be the beginning of some cyclic graph, speculatively decrement the ref-count of everything it refers to. If any of them go to zero, recursively do that to everything they refer to and so on. When that's done, if you end up with any objects that are at zero references, they can be collected. For everything left, undo the speculative decrements.
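Here is a minimal Python sketch of that trial-deletion idea. It is a toy model that follows the mark/scan/collect structure from the Bacon and Rajan paper, not PHP's actual C implementation, and `Node`, `link`, `hold` and `collect_cycles` are hypothetical names. It speculatively removes every reference held inside the reachable subgraph, restores the counts for anything that is still referenced from outside, and reports whatever is left at zero.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.refcount = 0     # incoming references, internal and external
        self.children = []    # outgoing references
        self.color = "black"  # black = presumed live, gray = under test, white = garbage


def link(src, dst):
    """src takes a reference to dst."""
    src.children.append(dst)
    dst.refcount += 1


def hold(node):
    """Model an external reference, e.g. from a local variable."""
    node.refcount += 1


def mark_gray(node):
    # Speculatively subtract every reference that originates inside the subgraph.
    if node.color == "gray":
        return
    node.color = "gray"
    for child in node.children:
        child.refcount -= 1
        mark_gray(child)


def scan_black(node):
    # Node is live: undo the speculative decrements for everything it reaches.
    node.color = "black"
    for child in node.children:
        child.refcount += 1
        if child.color != "black":
            scan_black(child)


def scan(node):
    if node.color != "gray":
        return
    if node.refcount > 0:
        scan_black(node)      # something outside the subgraph still points here
    else:
        node.color = "white"  # only internal references kept it alive
        for child in node.children:
            scan(child)


def collect_white(node, freed):
    if node.color != "white":
        return
    node.color = "black"      # guard against revisiting through the cycle
    for child in node.children:
        collect_white(child, freed)
    freed.append(node.name)


def collect_cycles(root):
    """Return the names of nodes that are unreachable cyclic garbage."""
    mark_gray(root)
    scan(root)
    freed = []
    collect_white(root, freed)
    return freed


# Dead cycle: a <-> b with nothing outside pointing in.
a, b = Node("a"), Node("b")
link(a, b)
link(b, a)
dead = collect_cycles(a)

# Live cycle: c <-> d, but c is also held from outside.
c, d = Node("c"), Node("d")
link(c, d)
link(d, c)
hold(c)
live = collect_cycles(c)
```

In the live case the whole graph is still walked twice (once down, once to restore), which is the wasted work the next paragraph talks about.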
If you have a large live object graph, this process can be super slow: you have to traverse the entire object graph. If there are few dead objects, you burn a bunch of time doing this and don't get anything back.
Meanwhile, you're busy adding and removing references to live objects, so that potential root array is constantly filling up, re-triggering the same ineffective collection over and over again. Note that this happens even when you aren't allocating: just assigning references is enough to fill the array.
To me, this is the real problem compared to other languages. You shouldn't thrash your GC if you aren't allocating anything!
Disabling the GC (which only disables the cycle collector, not the regular delete-on-zero-refs) avoids that. However, it has a side effect. Once the potential root array is full, any new potential roots get discarded. That means even if you re-enable the cycle collector later, those cyclic objects may never be collected. Probably not a problem for Composer since it's a command-line app that exits when done, but not a good idea for a long-running app.
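To make the buffer behavior concrete, a toy Python model of the possible-roots array as described in this comment. `RootBuffer` and its methods are invented for illustration and ignore details such as PHP not re-adding an object that is already buffered; the only things taken from the description are the fixed capacity, the collection triggered when the array is full, and the discarding of new roots while the collector is disabled.

```python
class RootBuffer:
    """Toy model of a fixed-size array of possible cyclic roots."""

    def __init__(self, capacity=10_000, enabled=True):
        self.capacity = capacity
        self.enabled = enabled
        self.roots = []
        self.collections = 0  # how many times a cycle collection ran
        self.discarded = 0    # roots dropped because the buffer was full

    def add_possible_root(self, obj):
        # Called whenever a refcount is decremented to a non-zero value.
        if len(self.roots) >= self.capacity:
            if not self.enabled:
                self.discarded += 1  # never examined, even if re-enabled later
                return
            self.collect()
        self.roots.append(obj)

    def collect(self):
        # Stand-in for the expensive graph walk; here it only empties the buffer.
        self.collections += 1
        self.roots.clear()


def churn(buffer, assignments):
    """Simulate reference assignments on live objects, with no allocation."""
    for i in range(assignments):
        buffer.add_possible_root(i)
    return buffer


gc_on = churn(RootBuffer(capacity=10, enabled=True), 35)
gc_off = churn(RootBuffer(capacity=10, enabled=False), 35)
```

With the collector on, the buffer keeps refilling and re-triggering collections purely from reference churn; with it off, nothing runs, but everything past the first `capacity` roots is silently dropped.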
There are other things PHP could do here:
1. Don't use ref-counting. Use a normal tracing GC. Then you only kick off GC based on allocation pressure, not just by mutating memory. Obviously, this would be a big change!
2. Consider prioritizing and incrementally processing the root array. If it kept track of how often the same object reappeared in the root array each GC, it can get a sense of "hey, we're probably not going to collect this". Sort the array by priority so that potentially cyclic objects that have been live in the past are at one end. Then don't process the whole array: just process for a while and stop.
[1]: http://media.junglecode.net/media/Bacon01Concurrent.pdf
Particularly this. What a disturbing documentary.
I know we're serious here, but stuff like this reminds me why I love the internet so much. It's fun to cut loose once in a while.
Not to say there's anything wrong with funny GIFs, but I come to HN exactly because it moderates away that sort of stuff.
I really felt bad seeing how we missed that improvement for so long, so turning it all in a gif-fest is more productive than self-deprecation!
Stay classy programmers.
The commits are embarrassing and stupid, and really expose why developers are considered idiots. Why troll?
Because they are jerks. Period. Grow up noobs.