Essentially, automation has gone through phases: integration services, then browser automation, then RPA. The last phase, RPA (Robotic Process Automation, from services like UiPath), used computer vision on screenshots to target elements to click or scrape; using computer vision was UiPath's innovation. Before that, browser automation relied on code selectors in HTML, like CSS selectors and XPath. All of these approaches share a fatal flaw: when popular services update their designs, you have to go back and rebuild the automation and all of its targets.
We invented "Semantic Targets" in 2022 after trying to solve the end-to-end problem using just GPT-3. Semantic targets describe elements in plain English, with reasoning, so you can build future-proof targets that still work when services update their designs. The other nice feature is that you can now add logical conditions to these targets. For example,
"Only scrape the funny tweets" or "Only scrape the tweets with the word Cheat Layer" or "If there is Cheat Layer in any tweet, say only 'yes'"
It took a year+ to build a multimodal model that calculated the probability that each element matched the intent, but modern models like Gemini can now do this (GPT-4 can't target precise coordinates).
So if your target is "post button", then even if Twitter changes the color, moves the button, or even changes the word "post", the automation can still find it on the screen and click it.
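The targeting idea above can be sketched in a few lines. This is a toy illustration, not the actual model: the `score` function here is a keyword-overlap stand-in for the multimodal model that, per the comment, estimates the probability each on-screen element matches the English intent. All names (`Element`, `resolve_semantic_target`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Element:
    description: str  # what the model "sees" for this element on screen
    x: int
    y: int

def score(intent: str, element: Element) -> float:
    """Stand-in for the multimodal model: probability the element matches
    the intent. The real system reasons over a screenshot instead."""
    intent_words = set(intent.lower().split())
    desc_words = set(element.description.lower().split())
    return len(intent_words & desc_words) / max(len(intent_words), 1)

def resolve_semantic_target(intent: str, elements: list[Element]) -> tuple[int, int]:
    """Return click coordinates for the element most likely to match the intent."""
    best = max(elements, key=lambda e: score(intent, e))
    return (best.x, best.y)

# After a redesign the button is restyled and moved, but the semantic
# target "post button" still resolves to it.
screen = [
    Element("blue rounded post button in composer", 880, 120),
    Element("home navigation link", 40, 60),
]
print(resolve_semantic_target("post button", screen))  # → (880, 120)
```

The key property is that nothing in the target references a CSS selector, XPath, or pixel template, so a redesign only has to leave the element recognizable in English.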
We're pretty sure all automation tools will eventually use this, since it seems like a no-brainer.
Here are more details: https://docs.cheatlayer.com/fundamentals/agentic-process-aut...
I know we've had some recent victories for web scraping, like the hiQ case, but I don't remember if it covered bypassing "technical protections" in a way similar to how copyright law is applied.
(Serious question, since I'm used to those sorts of recaptcha-solving services only being advertised on far seedier websites, so I'm surprised it's highlighted on the homepage. For the record, I'm against that interpretation of the CFAA, but I personally wouldn't want to be the test case.)
This is not a technology anyone developed; it's an emergent capability of the models, and our system demonstrates how well it works. Courts will still have to determine intent and harm based on how the product is used.
This is definitely an area we can improve, but we have a novel framework for testing and maintaining robustness. We use an LLM-based testing loop to verify the steps in the state machine: a chat interface generates the agent end-to-end, then outputs a no-code graph. This testing loop will soon support uploading Loom videos to generate automations without installing anything locally.
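A minimal sketch of that testing loop, under stated assumptions: `verify_step` is a pure-Python stand-in for the LLM verifier (the real one would inspect a screenshot or page state), and the step/retry structure (`run_with_testing_loop`, the tuple shape) is hypothetical, not Cheat Layer's actual API.

```python
def verify_step(step_name: str, observed: dict, expected: dict) -> bool:
    """Stand-in for the LLM verifier: judge whether the state observed
    after a step matches what the step was supposed to produce."""
    return all(observed.get(k) == v for k, v in expected.items())

def run_with_testing_loop(steps, max_retries=2):
    """Execute each step of the generated state machine, verifying after
    each one. `steps` is a list of (name, action, expected_state) tuples,
    where action() performs the step and returns the observed state.
    Failed steps are retried up to max_retries times."""
    results = []
    for name, action, expected in steps:
        passed = False
        for _ in range(max_retries + 1):
            if verify_step(name, action(), expected):
                passed = True
                break
        results.append({"step": name, "passed": passed})
    return results

# Toy run: both steps report the state the verifier expects.
steps = [
    ("open_page", lambda: {"url": "https://example.com"}, {"url": "https://example.com"}),
    ("click_post", lambda: {"posted": True}, {"posted": True}),
]
print(run_with_testing_loop(steps))
```

The per-step pass/fail records are what would flow into the JSON results mentioned below.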
All agents have an API that directly returns results as JSON, including the results of this testing loop. See this image for an example: https://cdn.discordapp.com/attachments/1068385542875664424/1...
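To make the shape of that response concrete, here is a hypothetical example of what an agent's JSON payload might contain; the field names (`results`, `testing_loop`) are assumptions for illustration, not the documented schema.

```python
import json

# Hypothetical agent API response: scraped results plus the per-step
# verdicts from the testing loop. Real field names may differ.
sample = json.loads("""
{
  "results": [{"tweet": "Cheat Layer just shipped"}],
  "testing_loop": [{"step": "scrape_tweets", "passed": true}]
}
""")

# A caller can gate on the testing loop before trusting the results.
if all(step["passed"] for step in sample["testing_loop"]):
    print(sample["results"][0]["tweet"])
```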
We also introduce a new, robust, future-proof targeting strategy that still works when services change their designs, because semantic targets like "Post button" keep working if the button changes color or moves across the screen. This is a test that all current RPA tools, including UiPath, fail, so we have multiple paths to improve on the incumbents given the tools we have today with reasoning models.
BTW, did you open-source all of it or just parts? I see your GitHub repo hasn't been updated in a month. Can I run this locally?
The Chrome extension is required to target off-screen elements, using a local websocket server. This makes it a 100% replacement for previous selector strategies.
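As a rough illustration of the extension-to-server handoff, here is a hypothetical message a client might send over that local websocket; the message fields (`type`, `intent`, `include_offscreen`) are my assumptions, not the actual protocol.

```python
import json

def make_target_request(intent: str, include_offscreen: bool = True) -> str:
    """Build a hypothetical targeting request for the local websocket
    server. Because the extension can enumerate elements outside the
    visible viewport, off-screen targets can resolve too."""
    return json.dumps({
        "type": "resolve_target",
        "intent": intent,
        "include_offscreen": include_offscreen,
    })

print(make_target_request("post button"))
```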
If you're interested in contributing, email me at rohan@cheatlayer.com and I can help get you set up.