Either way, I have some thinking to do on whether I should "keepa" it or not (sorry, really bad joke). Maybe I should purposely turn a blind eye and just trust that they aren't going to do anything evil or pose some privacy risk, given how useful it is.
The article says: "[The extension] will collect information about the products you look at and the ones you search for".
Yet, two sentences later it says "The company behind the extension fails to comply with its legal obligations. The privacy policy is misleading in claiming that no personal data is being collected."
So what personal information exactly is included in the data submitted to their servers about the products? Because in that JSON example I don't see anything even close to personal information.
The remote scraping/execution abilities are not great, I'll give it that. But the rest of it seems like an overblown conclusion and interpretation of how it works.
The history of all Amazon products you looked at or searched for is personal data, and it can tell a lot about you. Whether it is also personal data in the legal sense is not something I can say for sure. But it definitely has to be properly covered in the privacy policy, for GDPR compliance at the very least.
But GDPR doesn’t merely require you to disclose the collection of PII, but rather of all data collected. There is a good reason for that.
> Allow the add-on to gather Amazon prices to improve our price data
I thought it was common knowledge that Keepa uses the add-on to gather prices. Though with GDPR it probably needs to be stated more explicitly.
What is strange is that people asked for e.g. Amazon.nl support. This isn't implemented because Keepa relies on Amazon (that is their answer in the forums). But if they scrape, why do they still need Amazon?
Nice, I didn’t find this setting, and I explicitly went looking for it. So the settings in the “price history” graph don’t merely apply to the way this graph is shown. Now I need to figure out what this setting is doing, because I didn’t see any conditions in the code tied to it.
Scraping data from pages you visit shouldn’t be affected by this.
As someone who has done a lot of web scraping and had to route around a lot of blocking (we have business contracts to allow scraping, but they don't stop over-eager sysadmins), this feels like a dream come true.
But I'd never actually want to use this for scraping, and I'm not sure any informed user would agree to it.
We sometimes do this with feeds, but feeds aren't great for stock change latency, which is important for us (thin stock levels across a wide range of products, hence more out-of-stock issues). Scraping ensures we at least have the same stock latency that direct customers see, and we manage the risk by controlling how frequently we scrape.
Most of these companies don't have in-house engineering capabilities. For Shopify-based merchants we don't have to scrape, we use the Shopify API, but otherwise scraping is the only real solution.
I don’t. The author even goes out of their way to point out that these requests aren’t generated by the user and so there’s no latent interest information there. I agree that they should cover this behavior in the privacy policy explicitly, but there’s a tone of moral outrage in this piece that seems unearned.
I’m really unsure how you came to this conclusion. Even if you only read the summary at the beginning or only the conclusions section at the end, you should notice that Keepa is doing both. It will extract data from your Amazon visits (personal information) and do its own scraping (merely wasting your bandwidth if implemented correctly, which I’m not convinced of).
> This refers to some pieces of the Keepa functionality but it once again completely omits the data collection outlined here. It’s reassuring to know that they don’t log product identifiers when showing product history, but they don’t need to if on another channel their extension sends far more detailed data to the server. This makes the first sentence, formatted as bold text, a clear lie. Unless of course you don’t consider the information collected here personal. I’m not a lawyer, maybe in the legal sense it isn’t.
When I was reading, I thought that “data collection outlined here” referred to the scraping behavior you reverse engineered, since the pull quote covered the user-generated request. I agree that they should include the additional scraping behavior here for clarity (we’re arguing about it, after all). I disagree that it constitutes a “clear lie”, since I don’t think that data is personal.
I use this extension (and the app) regularly; it activates as soon as I visit Amazon in a container tab. In addition to providing in-depth statistics, features like alerts via Telegram have helped me hunt down bargains. I have noticed the increase in network requests and bandwidth when the tab is active, using basic tracking via Resource Monitor (W10). However, I can easily block it via uMatrix/uBO if required. In this case, it is a trade-off that can be justified.
Also, Tracker Control (Android) reports blocking just two trackers for the Keepa app, Google Crashlytics and Google Firebase Analytics, so it is not as bad as other apps.
I have used CamelCamelCamel in the past, which was more egregious and aggressive in tracking users, but I don't know how it fares today.
They moved to the current model of providing an API for Amazon data (which seems to use the extension's users to scrape data).
As well as Honey and Keepa
Exactly. Showing the world the shady business of browser plugins.