* Obese JavaScript code. We had to write our own custom code to log events.
* Aimed at large scale companies. We only have 1000s of users, and we care about each individual exception, but I think it is really aimed at consolidating large numbers of events.
* Meaningless percentages on data. Tagged data is processed, but the end percentage value has little meaning e.g. send through 1000 similar events, with 1 event with a tag with value X, and 1 event with a tag with value Y, and 998 with no value. Sentry reports 50% X and 50% Y!
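To make the arithmetic in that last point concrete, here is a small Python sketch of the two ways such a percentage could be computed. The event shape and the `tag_share` helper are made up for illustration and are not Sentry's code; they only show why dividing by the number of tagged events gives 50% while dividing by all events gives 0.1%.

```python
from collections import Counter

def tag_share(events, key, over_all_events=False):
    """Return {tag value: percentage} for one tag key.

    events is a list of dicts mapping tag keys to values (hypothetical shape).
    """
    values = [e[key] for e in events if key in e]
    # Denominator is either every event or only the events carrying the tag.
    denominator = len(events) if over_all_events else len(values)
    if denominator == 0:
        return {}
    return {v: 100.0 * n / denominator for v, n in Counter(values).items()}

# 1000 similar events: one tagged X, one tagged Y, 998 with no value.
events = [{"tag": "X"}, {"tag": "Y"}] + [{} for _ in range(998)]
print(tag_share(events, "tag"))                        # share among tagged events
print(tag_share(events, "tag", over_all_events=True))  # share among all events
```

Both numbers are "correct"; the complaint is that the first one is shown without saying it ignores the 998 untagged events.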
But they have given us really excellent service, especially given we are not paying enterprise rates.
Edit: also we are not in a US timezone, which makes the UI weird. And I do love the email integration: have a bug, get an email, fix it.
For what it's worth we spent a lot of time reducing bundle sizes recently. It's ultimately a tradeoff between how rich and complete the captured data should be (and from how many browsers) and how big one wants the bundle to be :(
Additionally there is another version of the JS SDK dubbed "loader" which lazy loads the real SDK on first use: https://docs.sentry.io/platforms/javascript/#lazy-loading-se...
What you're saying here confuses me slightly. Sentry is aimed at consolidating large numbers of equivalent events. It looks at things like the stack trace to determine when to merge events into a single issue. If you deploy a bug that results in one exception occurring a thousand times per second you want Sentry to create one issue with thousands of events, not thousands of issues, right?
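A rough Python sketch of the idea of merging events by stack trace. To be clear, this is not Sentry's actual grouping algorithm; the `fingerprint` and `group` functions and the frame shape (module, function) are assumptions for illustration only.

```python
import hashlib
from collections import defaultdict

def fingerprint(exc_type, frames):
    """Hash the exception type plus (module, function) of each frame.

    Line numbers and messages are deliberately left out so that small
    differences do not split one bug into many issues.
    """
    h = hashlib.sha1()
    h.update(exc_type.encode())
    for module, function in frames:
        h.update(f"|{module}:{function}".encode())
    return h.hexdigest()

def group(events):
    """Bucket events into issues keyed by fingerprint."""
    issues = defaultdict(list)
    for ev in events:
        issues[fingerprint(ev["type"], ev["frames"])].append(ev)
    return issues

same_bug = {"type": "KeyError",
            "frames": [("app.views", "checkout"), ("app.cart", "total")]}
other_bug = {"type": "ValueError", "frames": [("app.views", "login")]}

# A thousand occurrences of one bug plus a single unrelated error.
issues = group([dict(same_bug) for _ in range(1000)] + [other_bug])
print(len(issues), sorted(len(v) for v in issues.values()))
```

With this toy input you end up with two issues, one holding 1000 events and one holding a single event, which is the behaviour described above.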
Source: a happy user.
I regret never having made the time to open-source what I built. The Processor is written in Python, takes reports from mobile devices, unwinds, symbolicates, retraces, unminifies, etc as needed, then generates a Sentry "event" and forwards that to our on-prem Sentry instance.
I also built the SDKs. For iOS, I used PLCrashReporter. These days I'd probably use KSCrash. An important point here: on iOS, the unwinding is done on the device, so all you have to do on the backend is symbolicate it. Another point: it's relatively easy to get iOS system symbols. Plug an iOS device into a Mac running Xcode and the symbols are transferred from the device to the Mac. You can then harvest them however you need. In fact, Apple has apparently stopped encrypting OTA updates so you no longer need an iOS device to get the symbols:
https://github.com/Zuikyo/iOS-System-Symbols
For Android NDK crashes I've tried a few approaches and still don't have a satisfying solution. Originally I went with breakpad + minidumps on the device. On the backend, the Processor runs the breakpad stackwalker on the minidump. Another important point: the unwinding is occurring on the backend in this case, unlike iOS where it's done on the phone. (A minidump is basically just a snapshot of all the thread stack memory, plus some extra diagnostic info.) But to unwind reliably off-device you need the Android system symbols (in addition to the app's symbols obviously). Well good luck with that. Google makes the original Nexus Android OS images available so you can harvest those but you'll never get symbols for all the various Android devices. I built a tool that can harvest symbols off a device and tried to crowdsource them from Yahoo's developers but it's not been very successful (there's a lot of flavors of Android).
Another issue is that minidumps are relatively largish to deal with. So my second approach was two-fold. I'm still using breakpad's crash handler on the device, but I now have it generating the much smaller microdump format. In addition, I've added libunwind to our Android SDK so that after capturing the microdump, I attempt to unwind on the device (also collecting function names during unwinding) and add that info to the report. The Processor then only needs to unwind the microdump if the unwinding on the device failed. Otherwise it just needs to symbolicate. This hasn't been wildly successful though. Unwinding on an Android device is trickier than on an iOS device. Also, it's almost impossible (well I haven't figured out how) to unwind through the ART/Java frames that called into the native code.
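For anyone wondering what "only unwind on the backend if the device unwind failed, otherwise just symbolicate" might look like, here is a minimal Python sketch. It is not the actual Processor code: the report fields (`device_frames`, `microdump`), the `unwind_microdump` callback, and the symbol table are all hypothetical, and real symbolication would also check symbol sizes and per-module load addresses.

```python
import bisect

def symbolicate(addresses, symbols):
    """Map each address to the nearest symbol starting at or below it.

    symbols is a list of (start_address, name) sorted by start_address.
    """
    starts = [start for start, _ in symbols]
    names = []
    for addr in addresses:
        i = bisect.bisect_right(starts, addr) - 1
        names.append(symbols[i][1] if i >= 0 else "<unknown>")
    return names

def process_report(report, symbols, unwind_microdump):
    """Use the on-device unwind if present, else fall back to the backend."""
    addresses = report.get("device_frames")
    if not addresses:
        # Device-side unwinding failed: unwind the microdump on the backend.
        addresses = unwind_microdump(report["microdump"])
    return symbolicate(addresses, symbols)

symbols = [(0x1000, "main"), (0x1400, "parse"), (0x1900, "crash_here")]

# Device unwind succeeded: only symbolication is needed.
ok = process_report({"device_frames": [0x1910, 0x1420, 0x1008]}, symbols, None)

# Device unwind failed: a stand-in backend unwinder supplies the addresses.
fallback = process_report({"device_frames": [], "microdump": b"raw bytes"},
                          symbols, lambda dump: [0x1904, 0x0800])
print(ok, fallback)
```

The address below the first symbol comes back as `<unknown>`, which is roughly what happens when you are missing the system symbols for a frame.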
Of course the vast majority of Android crashes are in Java code and these are much easier to deal with. They are unwound just fine on the device, so on the backend you only need to deobfuscate the ProGuard minification, which is easily done using the mapping file generated by ProGuard.
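A simplified Python sketch of what deobfuscating with a ProGuard mapping file involves. It only restores class names from the unindented `original -> obfuscated:` lines; it does not handle member names, overloads, or line-number ranges, for which ProGuard's own ReTrace tool is the proper choice. The mapping content and function names below are invented for illustration.

```python
import re

MAPPING = """\
com.example.PaymentService -> a.b:
    void charge(java.lang.String) -> a
com.example.Cart -> a.c:
    int total() -> a
"""

def load_class_map(mapping_text):
    """Return {obfuscated class: original class} from mapping text."""
    class_map = {}
    for line in mapping_text.splitlines():
        # Class lines are unindented and end with a colon; member lines are indented.
        if line and not line[0].isspace() and line.endswith(":"):
            original, obfuscated = line[:-1].split(" -> ")
            class_map[obfuscated] = original
    return class_map

# Matches the "at pkg.Class.method(" part of a Java stack frame.
FRAME = re.compile(r"at ([\w.$]+)\.([\w$<>]+)\(")

def deobfuscate_classes(trace, class_map):
    """Rewrite the class part of each frame; method names stay obfuscated."""
    def swap(m):
        cls, method = m.group(1), m.group(2)
        return f"at {class_map.get(cls, cls)}.{method}("
    return FRAME.sub(swap, trace)

trace = ("java.lang.NullPointerException\n"
         "\tat a.b.a(Unknown Source)\n"
         "\tat a.c.a(Unknown Source)")
restored = deobfuscate_classes(trace, load_class_map(MAPPING))
print(restored)
```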
What's really annoying with native mobile crashes is that both Android and iOS have their own services for both capturing crashes and unwinding on the device. And because these are integrated with the OS and work out-of-process, they are much more reliable than anything you can do in-process using something like PLCR, KSCrash, libunwind, etc.
But, neither OS gives an app access to its own system generated reports. All you get is the lame reports the devices upload to Google Play Console / iTunes Connect.
Anyway, thank you to Sentry for providing such a great product and I'm sorry again I wasn't able to contribute more. I'm not sure what I built would work at your scale. It's interesting we ended up with similar designs.
Indeed. But sadly Apple does not provide a symbol server like Microsoft does. We are maintaining our own internally. I wish we could open it up to the world but I'm pretty sure that it's not legal to redistribute these.
> For Android NDK crashes I've tried a few approaches and still don't have a satisfying solution.
That is indeed overall a pretty frustrating situation. It's similar for Linux in general, where it's really hard to get all the debug symbols collected. And even if debug symbols exist, they are not stored like you would expect from a symbol server. Very frustrating.
I'm quite annoyed that there is so little support from the platform holders to provide production debugging APIs. One would think there is a higher demand for this :(
Ditto. But the maintainer of the repo I linked to has thrown caution to the wind and thrown them all up on a Google Drive:
https://github.com/Zuikyo/iOS-System-Symbols/blob/master/col...
I am going to whine a bit that the recent move over to the unified SDK has been less than ideal for us. The fact that the raven docs would point us to the unified SDK but not to a “how to migrate” page made me super unsure about whether we were doing the right thing (esp. when it came to the logging integrations on Python)
It’s kind of an interesting problem, providing SDKs for each language. Sentry went with unifying the API across language boundaries and I’m not super happy with the results but I don’t have like 30 packages to maintain
Yeah, that move and the docs did not go exactly as planned. There are a few reasons why we did it: a) the old SDKs had no sensible state management which caused endless issues such as incorrect breadcrumb collection in async code. b) it's really hard for customer support to understand the number of SDKs.
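To illustrate the kind of problem in (a): with a single global breadcrumb list, concurrent async tasks mix their breadcrumbs together. Here is a minimal Python sketch using the standard library's `contextvars` to keep them apart. This is not how the SDK is actually implemented; `record_breadcrumb` and `handle_request` are hypothetical names for the example.

```python
import asyncio
import contextvars

GLOBAL_CRUMBS = []  # naive shared state: every task appends here
scoped_crumbs = contextvars.ContextVar("scoped_crumbs")  # per-task state

def record_breadcrumb(message):
    GLOBAL_CRUMBS.append(message)
    # Build a new list instead of mutating, so contexts never share one.
    scoped_crumbs.set(scoped_crumbs.get() + [message])

async def handle_request(name):
    scoped_crumbs.set([])
    record_breadcrumb(f"{name}: start")
    await asyncio.sleep(0)  # yield so the other task interleaves
    record_breadcrumb(f"{name}: done")
    return scoped_crumbs.get()

async def main():
    # gather() runs each coroutine as a task with its own copy of the context.
    return await asyncio.gather(handle_request("A"), handle_request("B"))

per_task = asyncio.run(main())
print(GLOBAL_CRUMBS)  # breadcrumbs from A and B interleaved
print(per_task)       # each task sees only its own breadcrumbs
```

If an error in request A were reported using the global list, it would carry B's breadcrumbs too, which is the sort of incorrect collection described above.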
We're working on improving that, in particular docs.
e.g. sentry.io/issues/SEN-12345
We're also introducing a much more comprehensive event search, which will require event permalinks, so we're sorting some of that out.
Feel free to throw additional feedback our way. Best place would be on forum.sentry.io to make sure the team actually sees it.
I handle ops for Mozilla's crash reporting pipeline for Firefox [0] and our symbol server [1], among other things. I know our respective development teams stay in touch, and I hope we can find a way to use symbolic/symbolicator to simplify our stack.
[0] https://socorro.readthedocs.io/en/latest/ [1] https://tecken.readthedocs.io/en/latest/