[1] https://github.com/yoavaviram/python-amazon-simple-product-a...
I wrote an app that is basically a new UI for the Amazon products. It runs entirely on the client. The Amazon API simply didn't work in that setup.
Doesn't that require you to have a quota of affiliate sales to keep using it? I can't find where they state this requirement, but I remember they were very sneaky about disclosing it. If you don't have any affiliate sales after X months, your API key will stop working.
Why do I know this? Because I'm the CTO at Nazdeeq.com where we let users buy Amazon products from countries where they don't ship easily, like Pakistan.
Edit: totally open to partnerships in more countries
+ There's no direct way to buy 90% of products from Amazon since they don't ship to Pakistan
+ Our service is the only one in the country that gives a fixed price at checkout in PKR
+ Our customer service is excellent
+ We're one of the cheapest options available, as long as the competition imports products legally.
My protest against such a ridiculous heuristic was to not fix it.
This means that, unfortunately, all the traffic has to go through our own servers.
At a former employer we scraped Amazon many millions of times per day with very simple old tools that rarely needed updating.
I have not scraped a ton of actual individual product pages, though, so I can't speak to that. I do remember it might have been harder.
Don’t really see that as a dealbreaker. So the library will need maintenance. Normal for libraries to need updates in order to keep up with changes. It works today, and it will work whenever it’s updated. Better than nothing and for many use cases that’s good enough.
What's more difficult is product page scraping, because there you have hundreds of different variations. Some from A/B testing and a lot just being specific things that show up for certain product categories (e.g. video games).
We brand it as an ordering API, but we also offer retrieving the product data (item details/pricing). We put a LOT of engineering resources into data quality and maintenance, as the API is core to our flagship product, PriceYak. If you have questions or want a token, email adam@zinc.io and mention this post.
1. requests.Session() is a class. IDK what request.session() invokes (see https://github.com/tducret/amazon-scraper-python/blob/master...).
2. Isn't one of the points of using Session() that it'll persist stuff like cookies and headers? So why is it re-defining the headers multiple times? (e.g. both GET and POST in the same session have their own respective but identical headers).
3. Is the use of `arg=""` idiomatic? For example in https://github.com/tducret/amazon-scraper-python/blob/master...
4. Using raw list indices without some kind of helper function to catch index and other errors when parsing is not really a good idea in scraping (e.g. `selection[0].text.strip()`).
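To make point 4 concrete, here is a minimal sketch of the kind of helper I mean. The name `first_text` and its `default` parameter are made up for illustration and are not part of the library. It assumes only that a selection is a list-like of elements with a `.text` attribute, which is how BeautifulSoup results behave.

```python
from types import SimpleNamespace


def first_text(selection, default=None):
    """Return the stripped text of the first selected element.

    Hypothetical helper: falls back to `default` when the selection is
    empty, is None, or the element has no usable text, instead of
    raising in the middle of a scrape.
    """
    try:
        return selection[0].text.strip()
    except (IndexError, AttributeError, TypeError):
        return default


# Stand-in objects so the sketch runs without an HTML parser installed.
found = [SimpleNamespace(text="  $19.99 \n")]
print(first_text(found))               # $19.99
print(first_text([], default="n/a"))   # n/a
```

Catching the three exception types in one place keeps the parsing code readable, and a missing element becomes a value you can log or skip rather than a crash.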
Also, interestingly, only Alibaba's bots are completely blocked from crawling: https://www.amazon.com/robots.txt
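If you want to check this kind of thing yourself, the standard library's `urllib.robotparser` can evaluate robots.txt rules per user agent. A rough sketch follows. The sample rules, bot names, and URLs are invented for illustration and are not copied from Amazon's actual file, and `is_allowed` is a hypothetical wrapper.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: one bot blocked entirely, everyone else
# blocked from a single path prefix.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBlockedBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBlockedBot", "https://www.example.com/dp/123"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", "https://www.example.com/dp/123"))
```

To test a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real file instead of a string.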
The scraping itself may not be (although I'm pretty sure here in Belgium there is a law against collecting other people's data), but what you do with it may not be legal.
You could make a case for making any kind of profit generated from scraping data illegal. Don't get me wrong, I love scraping things myself.
I also find it amazing that there are companies out there like Crawlera that can do serious scraping work and openly flaunt deploying tech to get around whatever scraping blockers are out there.