Provided they figure out a way to ensure that absolutely nothing can be done with the information other than to say, "This non-identifiable machine reports that it uses Fedora," I'd be okay with that.
That goes double given Fedora's position as what is effectively the RHEL upstream. There's too much corporate support behind it for any backlash to make a dent, even if the backlash reached critical mass.
Annoyingly, while installing the Xubuntu flavor, there appeared to be no option to opt out, nor was there even a mention of any such telemetry in the live installer interface. I had to track it down and disable it manually after installation, something the average user is not going to bother with and that Canonical is surely betting on. I appreciate how Poettering brings up trust and "red flags", knowing full well that the lower the transparency, the greater the incentive for users to reactively opt out of or disable such telemetry. Canonical could perhaps take note.
[1] https://askubuntu.com/questions/1027532/how-to-opt-out-of-sy...
[2] https://bugs.launchpad.net/ubuntu/+source/base-files/+bug/17...
The cynic in me recalls that Red Hat just got bought by IBM, and IBM is in the news for sneakily tracking people through a weather app.
I don't know any better, though. Maybe Fedora is quite independent of Red Hat/IBM and it's 100% legit to trust their promises. I'm not sure how it works, tbh.
Edit: added quote from article
This is a nitpick, but Red Hat did not get bought by IBM. What happened was IBM announcing the intention to buy Red Hat.
It may be a subtle distinction, but possibly an important one. Red Hat is still its own independent entity until the deal goes through (which, IIRC, means getting board approval, SEC clearance, and likely other stuff). This is expected to happen in late 2019, I believe, but it might still fall through.
This doesn't completely rule out IBM's influence, but that influence should be very low or zero until the merger actually goes through. But I also don't know how all this works.
https://lwn.net/ml/fedora-devel/20190108152239.GA24118@garde...
I must say, it feels odd to support a Poettering proposal, but this actually does look like a good solution.
Is that not also the case with the UUID solution? Generating the UUIDs in virtual machines, or just replacing the UUIDs in the requests, doesn't seem out of the question.
I don't doubt it will happen if this becomes well-known. Activism takes many forms.
> This_is_a_first_install
request the first time, then
> get_updates
in future, and so I'm glad to see the proposed solution is something vaguely similar.
Options for "true" values
Rather than a simple boolean, we'd like the "countme" variable to act as an increment-counter. That is, it would be "1" the first week, "2" the second week, "3" the third week, and so on. This will let us sort out short-lived test or CI infrastructure machines and get a better picture of how systems are used over time, without tracking individual systems. Optionally, we could cap the maximum value to mitigate the risk of uniqueness for systems that have been running for a very long time (it may be that there are only a few systems running for exactly 327 weeks, for example). As the supported lifetime of a Fedora release is about 13 months, a logical cutoff would be around 60 weeks: the counter could go from "59" to "old".
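To make the capped counter concrete, here is a minimal sketch. It assumes the client knows its install date and that a "week" means a whole 7-day period since install; the proposal quoted above does not say how weeks are measured, and the function name `countme_value` is hypothetical.

```python
from datetime import date

# Cap from the quoted proposal: after "59" the value becomes "old".
MAX_WEEK = 59


def countme_value(install_date: date, today: date) -> str:
    """Hypothetical helper: "1" in week one, "2" in week two, ..., then "old".

    Assumes weeks are whole 7-day periods counted from the install date.
    """
    days = (today - install_date).days
    if days < 0:
        raise ValueError("today is before the install date")
    week = days // 7 + 1  # days 0-6 are week 1
    return str(week) if week <= MAX_WEEK else "old"
```

For example, `countme_value(date(2019, 1, 1), date(2019, 1, 8))` gives `"2"`. The cap is what keeps a machine that has been up for 327 weeks from standing out: it reports the same `"old"` as every other long-lived system.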
That way user info never makes it past the mirror (which has their IP anyways) and you don't need anything complex like UUIDs, playing tricks with NTP, or calling home.
This would give a reasonably accurate number. Use bash to measure Linux installs (it's pretty rare to have Linux installed without bash), then desktop apps like firefox, eog, and xpdf to measure desktop use. If you're interested in the server side, track mongodb, apache, mysql, and similar.
This would also help fedora decide which applications they should pay more attention to.
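A rough sketch of the package-download idea follows. It assumes a hypothetical log that has already been reduced to one requested path per line, and relies on the usual RPM `name-version-release.arch.rpm` file naming. Note that the real Fedora package names may differ from the commenter's shorthand (Apache, for instance, ships as `httpd`).

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, Optional, Set


def package_name(path: str) -> Optional[str]:
    """Extract the package name from an RPM path, or None if it isn't an RPM."""
    filename = PurePosixPath(path).name
    if not filename.endswith(".rpm"):
        return None
    # Package names may contain dashes, so split off only version and release.
    parts = filename[: -len(".rpm")].rsplit("-", 2)
    return parts[0] if len(parts) == 3 else None


def count_downloads(paths: Iterable[str], packages: Set[str]) -> Counter:
    """Count downloads of the chosen proxy packages (downloads, not machines)."""
    counts: Counter = Counter()
    for path in paths:
        name = package_name(path)
        if name in packages:
            counts[name] += 1
    return counts
```

This counts downloads, not machines: one system may pull several bash updates in a week, and systems behind a local cache or proxy may not show up at all. It is therefore an approximation, which is in keeping with the "reasonably accurate" framing.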
For example, sending a unique identifier is not the problem; tracking people through a unique identifier is. So, depending on your goals, you can design a unique identifier system that does not allow tracking (or at least makes the tracking window so short as to be useless for anything other than its intended purpose) by changing the identifier on the client side weekly, as outlined in the article.
If all you want is a good estimate of how many users run which configurations of your software (major and minor version), a UUID that rotates weekly on the client side is perfectly acceptable for producing those statistics with a fair degree of accuracy.
On the other end of the spectrum, people long ago started reducing their trackable footprint online, and the online tracking ecosystem simply evolved to find people through other, trickier methods, such as browser fingerprinting.
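The weekly-rotating identifier described above can be sketched like this. It assumes the client persists a (UUID, ISO week) pair and that "weekly" means ISO calendar weeks; the names `RotatingId`, `current_id`, and `distinct_machines` are hypothetical. Because `uuid4` is random, one week's identifier says nothing about the next week's.

```python
import uuid
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class RotatingId:
    value: uuid.UUID
    week: Tuple[int, int]  # (ISO year, ISO week number) it was generated in


def current_id(state: Optional[RotatingId], today: date) -> RotatingId:
    """Reuse the stored id within the same ISO week; draw a fresh one otherwise."""
    iso = today.isocalendar()
    week = (iso[0], iso[1])
    if state is None or state.week != week:
        return RotatingId(uuid.uuid4(), week)
    return state


def distinct_machines(ids_seen_this_week: Iterable[uuid.UUID]) -> int:
    """Server side: each machine counts once per week, however often it checks in."""
    return len(set(ids_seen_this_week))
```

This limits linkability to a single week, but it does not settle the trust question raised in the reply below: within that week the server can still correlate requests, for example by IP address.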
> you can design a unique identifier system that does not allow tracking
You can (sort of), but we run up against that trust issue again. If I'm giving a unique identifier to someone, I have no way of knowing whether their assertions about its use are accurate. Even if they are, there's no guarantee that won't change in the future.
> If all you want to do is get a good estimate of how many users use what types of configurations of your software (major and minor version)
You're talking about the perspective of the publisher. I'm talking about my perspective as a user. A company's "need" to collect metrics is their problem, not mine. If their solution results in more information disclosure than I'm comfortable with (and a unique identifier absolutely is), then I will avoid their software or block communications to their home base.
Taking a strong stand against tracking, and therefore in favor of privacy, is perfectly reasonable for those of us who use Linux partly because we hate the deep tracking that closed-source OSes do.
> Poettering came up with a scheme that alleviated most of the problems that were identified. He proposed that a "countme" flag simply be added to a single mirror-list query each week. The sum of all such queries over a week's time should provide an accurate estimate of the number of Fedora systems. That way, UUIDs need not be stored, which removes much of the concern—data that is not stored cannot be misused.
If a Fedora server is compromised, an attacker can serve different packages to different users.
However, the packages need to be signed by Fedora for the package manager to accept them, so this has been considered a pretty weak excuse for an "attack" for a while now. "Getting access to code-signing keys allows you to attack the people consuming signed binaries"—wow, you don't say!
> Better metrics overall
> Public stats page updated automatically
> Better knowledge of relative use of different variants
> Insight into Fedora's use in short-lived test systems and temporary containers vs. longer-term installations
but nothing evaluating how and whether the proposed solutions will achieve those things.
With no method being perfect, I'm surprised that no one is calling for a quantitative evaluation of the various ID collection schemes, or for a defined "good enough" threshold, other than
> We need better data than that.
I'm not a Fedora maintainer, and I'm not maintaining any other software of such popularity, so I have to ask: why? I assume it's to allocate work better. At what point do the downsides outweigh that benefit?
[0] https://fedoraproject.org/wiki/Changes/DNF_Better_Counting
True, but we're already in that boat with the way we gather statistics from mirror hits. I have a hard time seeing how a method like the one proposed would be any more vulnerable to tampering.
EDIT: spelling
In short:
Add a new "countme" variable. This variable will:
- Start as a "true" value,
- Reset to a "false" value the first time the client successfully makes a request to Fedora mirror servers, and
- Be reset to a "true" value after seven days.
This way, rather than filtering by unique IP addresses, we can count only the "true" requests, so we count each machine once — but no more than once.
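Here is a minimal sketch of that boolean flag, covering both the client state and the server-side weekly sum. The query parameters `repo` and `arch`, and the class and function names, are hypothetical placeholders rather than DNF's actual mirror-list request format. No identifier is stored anywhere: the estimate is simply the number of flagged requests.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional

RESET_AFTER = timedelta(days=7)


class CountmeFlag:
    """Client state: "true" until a counted request succeeds, then "false" for 7 days."""

    def __init__(self) -> None:
        self.last_counted: Optional[datetime] = None

    def is_true(self, now: datetime) -> bool:
        return self.last_counted is None or now - self.last_counted >= RESET_AFTER

    def record_success(self, query: Dict[str, str], now: datetime) -> None:
        # Call only after a successful request; only a request that actually
        # carried the flag flips it to "false".
        if query.get("countme") == "1":
            self.last_counted = now


def build_query(flag: CountmeFlag, now: datetime) -> Dict[str, str]:
    query = {"repo": "fedora", "arch": "x86_64"}  # hypothetical parameters
    if flag.is_true(now):
        query["countme"] = "1"
    return query


def weekly_estimate(queries: Iterable[Dict[str, str]]) -> int:
    """Server side: every flagged request is one machine for that week."""
    return sum(1 for q in queries if q.get("countme") == "1")
```

A machine that checks in daily sends the flag on day 0 and again on day 7, so it contributes exactly one to each week's sum.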
That seems to be what Poettering's approach counts.
As far as I know, the goal is to get better numbers on how much the various parts of Fedora are used. There is always more work to do than there are folks to do it; better numbers on how much different bits are used help us make better decisions about what to focus on.
Granted, I'm not Matt but I've heard him talk about similar things and have run into the issue myself - "Is anyone even using this? Is it worth putting this level of effort into this particular thing?"
EDIT: Phrasing of the last sentence
As an example, there are very likely to be packages that aren't often needed, but are absolutely critical when they are.
(about 75% serious)
The same problem arises, though, since you can't track senders: there's no way of knowing how many reports were produced by a single machine.
Yes, I'm sure he did.
[1]: https://en.wikipedia.org/wiki/Universally_unique_identifier
If you want to count users, ask for permission during firstboot. If that's too much to ask, then I'll be in the market for a new OS. Maybe I'll finally go back to my first love: FreeBSD.
No tracking, just simple numeric data for that purpose.