Data exfiltration in Keepa Price Tracker

(palant.info)

62 points · taxyovio · 4y ago · 35 comments

33 comments · 10 top-level

NazakiAid · 4y ago · 5 in thread

I use Keepa basic and it has saved me a ton of money. I always just assumed it was scraping the prices from pages I visit, but I didn't know it would automatically fetch Amazon pages in the background. Might just sign out of Amazon, and use a separate browser to purchase from it.

Either way, I have some thinking to do on if I should "keepa" it or not (sorry really bad joke). Maybe I should purposely turn a blind eye and just trust they aren't going to do anything evil nor have some privacy risk due to how useful it is.

SCNP · 4y ago

Isn't this always the trade-off? While I do appreciate useful software, it gets tiring that it's almost always at the expense of a little bit of privacy or tracking. Seems like the death of a thousand cuts of our anonymity online. Although, I don't really harbor illusions that we (at least Americans) haven't been tracked since the invention of the credit card. I guess I'm a little jaded at this point as there doesn't seem to be anything I, personally, can do about it and I get a touch of FOMO when I hear about the capabilities of the latest and greatest apps. I understand that data collection is inherently necessary for AI, I just don't like who's in charge of it and making the innovations.

wheels · 4y ago

I use the Keepa website and never realized before this article that they even have browser plugins. On the website you can set up price alerts that go out via email or Telegram. That works well enough for me.

NazakiAid · 4y ago

I would do that but it's very helpful to also see how often the price changes and goes on sale to know if I am getting "ripped off".

wheels · 4y ago

There's a price graph on their website showing the price development over time.

rafaelm · 4y ago

Same, I just set up alerts via Telegram and they pop up on my phone and desktop client. Didn't know they had a browser extension.

a254613e · 4y ago · 5 in thread

I can't quite understand this article and its conclusion.

The article says: "[The extension] will collect information about the products you look at and the ones you search for".

Yet, two sentences later it says "The company behind the extension fails to comply with its legal obligations. The privacy policy is misleading in claiming that no personal data is being collected."

So which personal information is exactly included in the data submitted to their servers about the products? Because in that json example I don't see anything that would be even close to personal information.

The remote scraping/execution abilities are not great, I'll give it that. But the rest of it seems like an overblown conclusion and interpretation of how it works.

Semaphor · 4y ago

I’d assume that "products you searched for", even if only implicitly thanks to the results, is personal information. It also is not mentioned in their privacy policy, which only mentions sending on product pages.


palant · 4y ago

Note: I am the author of the article above.

The history of all Amazon products you looked at or searched for is personal data, and it can tell a lot about you. Whether it is also personal data in the legal sense is not something I can say for sure. But it definitely has to be properly covered in the privacy policy, for GDPR compliance at the very least.

timdorr · 4y ago

But it is not personal data that would identify you (PII). If someone was able to determine who I was based solely on my browsing activity on Amazon, then they've already obtained my personal information.

iamacyborg · 4y ago

PII is not a term that is used in the GDPR. The person you're replying to is correct that your browsing data is likely to count as personal data given that it's linked to an individual.

palant · 4y ago

No, it isn’t PII in the legal sense, it doesn’t allow identifying you directly. Which doesn’t mean that it cannot be tied to your identity. Just one example: if you regularly post to social media what you bought online, this information could be correlated with the Keepa data to find out which profile is likely yours and what else you looked at.

But GDPR doesn’t merely require you to disclose the collection of PII, but rather of all data collected. There is a good reason for that.

bkor · 4y ago · 4 in thread

From the Keepa addon settings:

> Allow the add-on to gather Amazon prices to improve our price data

I thought it was common knowledge that Keepa uses the addon to gather prices. Though with GDPR it probably needs to be more explicitly said.

Semaphor · 4y ago

There is a difference between gathering prices and loading extra URLs to gather those prices. From that text, I would not assume they are using my computer as a part of a botnet.

bkor · 4y ago

I knew it was doing that as well (the distributed scraping the article talks about). But I cannot figure out where I read it. Maybe they used to have it somewhere on their site, and now it's gone?

What is strange is that people asked for, e.g., Amazon.nl support. This isn't implemented because Keepa relies on Amazon (this is their answer in the forums). But if they scrape, why do they still need Amazon?

palant · 4y ago

Note: I am the author of the article above.

Nice, I didn’t find this setting and I explicitly went looking for it. So the settings in the “price history” graph don’t merely apply to the way this graph is shown. Now I need to figure out what this setting is doing. Because I didn’t see any conditions in the code which were tied to this setting.

palant · 4y ago

Found it. This is the optOut_crawl setting and its handling is entirely on the server side. So presumably if this setting is set, the server will no longer send the extension any instructions to scrape Amazon pages in background. Mind you, it still could but it probably won’t.

Scraping data from pages you visit shouldn’t be affected by this.
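To make "handling is entirely on the server side" concrete, here is a minimal sketch under stated assumptions: only the name `optOut_crawl` comes from the comment above; `UserSettings`, `ServerResponse`, `build_response`, `run_extension` and `crawl_jobs` are hypothetical and do not describe Keepa's actual protocol. The point it illustrates is that the client never checks the setting, so whether an opted-out user is still used for background fetching depends on a single server-side decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UserSettings:
    # Only the name optOut_crawl comes from the thread; this class is hypothetical.
    optOut_crawl: bool = False


@dataclass
class ServerResponse:
    price_data: dict
    # Hypothetical list of Amazon URLs the server asks the extension to fetch.
    crawl_jobs: List[str] = field(default_factory=list)


def build_response(settings: UserSettings, price_data: dict,
                   pending_jobs: List[str]) -> ServerResponse:
    """Server side: the only place the opt-out is consulted in this sketch."""
    jobs = [] if settings.optOut_crawl else list(pending_jobs)
    return ServerResponse(price_data=price_data, crawl_jobs=jobs)


def run_extension(response: ServerResponse, fetch: Callable[[str], None]) -> int:
    """Client side: executes whatever jobs arrive, with no opt-out check of its own."""
    for url in response.crawl_jobs:
        fetch(url)
    return len(response.crawl_jobs)


if __name__ == "__main__":
    jobs = ["https://www.amazon.com/dp/EXAMPLE"]  # placeholder product URL
    fetched: List[str] = []
    run_extension(build_response(UserSettings(optOut_crawl=True), {}, jobs), fetched.append)
    print(fetched)  # [] : the server withheld the jobs
    run_extension(build_response(UserSettings(optOut_crawl=False), {}, jobs), fetched.append)
    print(fetched)  # ['https://www.amazon.com/dp/EXAMPLE']
```

In this model, changing the one conditional in `build_response` would resume background fetching for opted-out users without any visible change in the extension, which is the "it still could" caveat above.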

danpalmer · 4y ago · 3 in thread

Wow, they've built a distributed Amazon listing scraping system – essentially a botnet.

As someone who has done a lot of web scraping and had to route around a lot of blocking (we have business contracts to allow scraping, but they don't stop over-eager sysadmins), this feels like a dream come true.

But I'd never actually want to use this for scraping and I'm not sure any informed user would agree to use this.

voltagex_ · 4y ago

How do you get contracts to allow scraping? What kind of cost are we talking about?

dewey · 4y ago

Some companies want you to list their products on your page (usually with some kind of affiliate deal attached) but don't have a tech team to implement a feed or an API. In that case you end up in a situation where you have to scrape the data yourself with permission.

danpalmer · 4y ago

Pretty much exactly this. We're not affiliates as we do our own fulfilment, but essentially we partner with companies to offer their products on our site, and we take a margin from them.

We sometimes do this with feeds, but feeds aren't great for stock change latency which is important for us (thin stock levels on wide range of products, more out of stock issues). Scraping ensures we at least have the same stock latency that direct customers see, and we manage the risk on how frequently to scrape.

Most of these companies don't have in-house engineering capabilities. For Shopify based merchants we don't have to scrape, we use the Shopify API, but otherwise scraping is the only real solution.
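One simple way to "manage the risk on how frequently to scrape" is a per-merchant minimum interval between scrapes. This is an illustrative assumption, not a description of danpalmer's actual system; `ScrapeThrottle`, `try_acquire` and `min_interval_s` are hypothetical names. A shorter interval lowers stock latency but raises the chance of tripping a merchant's blocking.

```python
import time
from typing import Callable, Dict


class ScrapeThrottle:
    """Hypothetical per-merchant throttle: allow at most one scrape per interval."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval_s = min_interval_s
        self.clock = clock  # injectable so the behavior can be tested deterministically
        self.last_scrape: Dict[str, float] = {}

    def try_acquire(self, merchant: str) -> bool:
        """Return True (and record the time) if this merchant may be scraped now."""
        now = self.clock()
        last = self.last_scrape.get(merchant)
        if last is not None and now - last < self.min_interval_s:
            return False  # too soon since the previous scrape of this merchant
        self.last_scrape[merchant] = now
        return True


if __name__ == "__main__":
    t = [0.0]  # fake clock value in seconds
    throttle = ScrapeThrottle(min_interval_s=300, clock=lambda: t[0])
    print(throttle.try_acquire("shop-a"))  # True
    print(throttle.try_acquire("shop-a"))  # False: 0 s elapsed
    t[0] = 301.0
    print(throttle.try_acquire("shop-a"))  # True
```

Tracking the interval per merchant, rather than globally, lets each partner's scrape frequency be tuned to how thin its stock is and how aggressive its sysadmins are.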

wilde · 4y ago · 3 in thread

> Unless of course you don’t consider the information collected here personal.

I don’t. The author even goes out of their way to point out that these requests aren’t generated by the user and so there’s no latent interest information there. I agree that they should cover this behavior in the privacy policy explicitly, but there’s a tone of moral outrage in this piece that seems unearned.

palant · 4y ago

Note: I am the author of this article.

I’m really unsure how you would come to this conclusion. Even if you only read the summary at the beginning or only the conclusions section at the end, you should notice that Keepa is doing both. It will extract data from your Amazon visits (personal information) and do its own scraping (merely wasting your bandwidth if implemented correctly, which I am not convinced of).

wilde · 4y ago

Thanks for engaging here. Maybe my reading comprehension is poor, but here’s the full quote that I was objecting to. It comes after a long pull quote where Keepa promises to not log the requests that do contain latent interest behavior:

> This refers to some pieces of the Keepa functionality but it once again completely omits the data collection outlined here. It’s reassuring to know that they don’t log product identifiers when showing product history, but they don’t need to if on another channel their extension sends far more detailed data to the server. This makes the first sentence, formatted as bold text, a clear lie. Unless of course you don’t consider the information collected here personal. I’m not a lawyer, maybe in the legal sense it isn’t.

When I was reading, I thought that “data collection outlined here” referred to the scraping behavior you reverse engineered, since the pull quote covered the user-generated request. I agree that they should include the additional scraping behavior here for clarity (we’re arguing about it after all). I disagree that it constitutes a “clear lie”, since I don’t think that data is personal.


45ure · 4y ago

Thanks for the article.

I use this extension (and the app) regularly, which activates as soon as I visit Amazon in a container tab. In addition to providing in-depth statistics, features like alerts via Telegram have helped me hunt down bargains. I have noticed the increase in network requests and bandwidth when the tab is active, using basic tracking via Resource Monitor (W10). However, I can easily block it via uMatrix/uBO, if required. In this case, it is a trade-off, which can be justified.

Also, Tracker Control (Android) reports blocking just two trackers in the Keepa app, Google Crashlytics and Google Firebase Analytics, so it is not as bad as other apps.

I have used CamelCamelCamel in the past, which was more egregious and aggressive in tracking users, but don't know how it fares today.

https://camelcamelcamel.com/


mrsaint · 4y ago · 2 in thread

And not sure if Amazon would agree to this as it essentially threatens the privacy and integrity of their users. Interestingly, Keepa is also an Amazon Affiliate, so they are in a direct business relationship with Amazon.

patd · 4y ago

As far as I know, Keepa is not an Amazon affiliate. They used to be and got kicked out like many similar tools around 5 years ago.

They moved to the current model of providing an API for Amazon data (which seems to use the extension's users to scrape data).

avipars · 4y ago

They actively warned about Honey's security issues, but haven't mentioned Keepa at all.

dzink · 4y ago · 1 in thread

If the additional Amazon pages are loaded on days when the user hasn’t browsed Amazon, or done once a day, that could be cookie stuffing, explicitly prohibited by Amazon Affiliate terms. The Amazon affiliate cookies last 24 hours, so triggering a session when a user doesn’t do it might extend their affiliate window and is not right at all.

liquorice · 4y ago

Keepa is a data company, though, not an Amazon Affiliate, so they shouldn't care about violating that policy.

robk · 4y ago

i don't really care - i love the plugin too much to uninstall it. it's saved me a killing.

avipars · 4y ago

thanks! Uninstalled today!

As well as Honey and Keepa

dna_polymerase · 4y ago

Do you remember the time when this weird German startup that publishes an Adblocker tried to start an "Acceptable Ads" program and extort money from Google? Guess what their CTO is up to now.

Exactly. Showing the world the shady business of browser plugins.
