Strategies for downloading constantly changing data via an API
I have to download a dataset through an API (a WFS served by GeoServer) that reports the total number of items and delivers at most 1000 items per request. I can sort by one field and offset each request's start index. The layer has ~1 million items, and I can run at most 5 parallel requests before the API gets overloaded.
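To make the setup concrete, here is a rough sketch of the paging scheme described above. The query parameters are the standard WFS 2.0 GetFeature ones; the layer name, the sort field, and the `fetch_page` callable are placeholders (assumptions for illustration, not taken from the real service).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator

PAGE_SIZE = 1000    # server-side maximum per request (from the question)
MAX_PARALLEL = 5    # more than this overloads the API (from the question)


def page_params(total: int, type_name: str, sort_field: str) -> Iterator[dict]:
    """Yield WFS 2.0 GetFeature query parameters, one dict per page."""
    for start in range(0, total, PAGE_SIZE):
        yield {
            "service": "WFS",
            "version": "2.0.0",
            "request": "GetFeature",
            "typeNames": type_name,    # hypothetical layer name
            "sortBy": sort_field,      # hypothetical sort field
            "startIndex": start,
            "count": PAGE_SIZE,
            "outputFormat": "application/json",
        }


def fetch_all(total: int, type_name: str, sort_field: str,
              fetch_page: Callable[[dict], list]) -> list:
    """Fetch every page with at most MAX_PARALLEL requests in flight.

    fetch_page is a caller-supplied function (an assumption of this sketch)
    that performs one HTTP request and returns that page's items.
    """
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        pages = pool.map(fetch_page, page_params(total, type_name, sort_field))
        # pool.map returns results in page order, so the output stays sorted.
        return [item for page in pages for item in page]
```

In practice `fetch_page` would issue a GET against the WFS endpoint with these parameters and return the features from the response; how exactly depends on the HTTP client and the output format requested.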
The problem is that items are added and removed in real time, so by the end of the copy process some of what I have copied is already stale and there are new items still to be copied. What would you do, or what have you done, in this situation? Would running a never-ending loop that crawls the data all day be abusive, or is this something that should be fixed on the provider's side?
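One way to picture the staleness problem is to compare two full crawls keyed by a stable item identifier. This is only a sketch of that comparison, assuming each item carries a unique ID that survives between crawls (something I have not confirmed for this layer); `diff_snapshots` is a name made up for the example.

```python
def diff_snapshots(previous: dict, current: dict) -> tuple[dict, set, dict]:
    """Compare two crawls, each a mapping of item ID -> item payload.

    Returns (added, removed, changed):
      added   - items present only in the current crawl
      removed - IDs present only in the previous crawl
      changed - items present in both whose payload differs
    """
    added = {k: current[k] for k in current.keys() - previous.keys()}
    removed = set(previous.keys() - current.keys())
    changed = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return added, removed, changed
```

With something like this, each new crawl only has to apply the differences to the local copy, but it still requires re-reading the whole layer to find them.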
The API URL is https://geoserver.car.gov.br/geoserver/sicar/wfs
Source data website: https://consultapublica.car.gov.br/publico/imoveis/index