Essentially, automation has gone through phases: integration services, then browser automation, then RPA. The last phase, RPA (Robotic Process Automation, from services like UiPath), used computer vision on screenshots to target elements to click or scrape; using computer vision was UiPath's innovation. Before that, browser automation relied on code selectors in HTML, like CSS selectors and XPath. All of these approaches share a fatal flaw: when popular services update their designs, you have to go back and rebuild the automation and all of its targets.
We invented "Semantic Targets" in 2022 after trying to solve the end-to-end problem using just GPT-3. Semantic targets describe elements in plain English, with reasoning, so you can build future-proof targets that still work when services update their designs. The other nice feature is that you can now add logical conditions to these targets. For example,
"Only scrape the funny tweets" or "Only scrape the tweets with the word Cheat Layer" or "If there is Cheat Layer in any tweet, say only 'yes'"
It took a year+ to build a multimodal model that calculated the probability that each element matched the intent, but modern models like Gemini can now do this (GPT-4 can't target precise coordinates).
So if your target is "post button", then even if Twitter changes the color, moves the button, or even changes the word "post", the automation can still find it on the screen and click it.
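The targeting idea above can be sketched in a few lines. This is a toy illustration, not the actual model: the `score` function here is a keyword-overlap stand-in for the multimodal model that, per the comment, estimates the probability each on-screen element matches the English intent. All names (`Element`, `resolve_semantic_target`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Element:
    description: str  # what the model "sees" for this element on screen
    x: int
    y: int

def score(intent: str, element: Element) -> float:
    """Stand-in for the multimodal model: probability the element matches
    the intent. The real system reasons over a screenshot instead."""
    intent_words = set(intent.lower().split())
    desc_words = set(element.description.lower().split())
    return len(intent_words & desc_words) / max(len(intent_words), 1)

def resolve_semantic_target(intent: str, elements: list[Element]) -> tuple[int, int]:
    """Return click coordinates for the element most likely to match the intent."""
    best = max(elements, key=lambda e: score(intent, e))
    return (best.x, best.y)

# After a redesign the button is restyled and moved, but the semantic
# target "post button" still resolves to it.
screen = [
    Element("blue rounded post button in composer", 880, 120),
    Element("home navigation link", 40, 60),
]
print(resolve_semantic_target("post button", screen))  # → (880, 120)
```

The key property is that nothing in the target references a CSS selector, XPath, or pixel template, so a redesign only has to leave the element recognizable in English.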
We're pretty sure all automation tools will eventually use this, since it seems like a no-brainer.
Here are more details: https://docs.cheatlayer.com/fundamentals/agentic-process-aut...
I know we've had some recent victories for web scraping, like the hiQ case, but I don't remember if it covered bypassing "technical protections" in a way similar to how copyright law is applied.
(Serious question, since I'm used to those sorts of recaptcha-solving services only being advertised on far seedier websites, so I'm surprised it's highlighted on the homepage. For the record, I'm against that interpretation of the CFAA, but I personally wouldn't want to be the test case.)
This is not a technology anyone developed; it's an emergent capability of the models, and our system demonstrates how well it works. Courts will still have to determine intent and harm based on how the product is used.
This is definitely an area we can improve, but we have a novel framework for testing and maintaining robustness. We use an LLM-based testing loop to verify the steps in the state machine: a chat interface generates the agent end-to-end, then outputs a no-code graph. This testing loop will soon support uploading Loom videos to generate automations without installing anything locally.
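A minimal sketch of that testing loop, under stated assumptions: `verify_step` is a pure-Python stand-in for the LLM verifier (the real one would inspect a screenshot or page state), and the step/retry structure (`run_with_testing_loop`, the tuple shape) is hypothetical, not Cheat Layer's actual API.

```python
def verify_step(step_name: str, observed: dict, expected: dict) -> bool:
    """Stand-in for the LLM verifier: judge whether the state observed
    after a step matches what the step was supposed to produce."""
    return all(observed.get(k) == v for k, v in expected.items())

def run_with_testing_loop(steps, max_retries=2):
    """Execute each step of the generated state machine, verifying after
    each one. `steps` is a list of (name, action, expected_state) tuples,
    where action() performs the step and returns the observed state.
    Failed steps are retried up to max_retries times."""
    results = []
    for name, action, expected in steps:
        passed = False
        for _ in range(max_retries + 1):
            if verify_step(name, action(), expected):
                passed = True
                break
        results.append({"step": name, "passed": passed})
    return results

# Toy run: both steps report the state the verifier expects.
steps = [
    ("open_page", lambda: {"url": "https://example.com"}, {"url": "https://example.com"}),
    ("click_post", lambda: {"posted": True}, {"posted": True}),
]
print(run_with_testing_loop(steps))
```

The per-step pass/fail records are what would flow into the JSON results mentioned below.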
All agents have an API that directly returns results as JSON, including the results of this testing loop. See this image for an example: https://cdn.discordapp.com/attachments/1068385542875664424/1...
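To make the shape of that response concrete, here is a hypothetical example of what an agent's JSON payload might contain; the field names (`results`, `testing_loop`) are assumptions for illustration, not the documented schema.

```python
import json

# Hypothetical agent API response: scraped results plus the per-step
# verdicts from the testing loop. Real field names may differ.
sample = json.loads("""
{
  "results": [{"tweet": "Cheat Layer just shipped"}],
  "testing_loop": [{"step": "scrape_tweets", "passed": true}]
}
""")

# A caller can gate on the testing loop before trusting the results.
if all(step["passed"] for step in sample["testing_loop"]):
    print(sample["results"][0]["tweet"])
```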
We also introduce a new, robust, future-proof targeting strategy that still works when services change their designs, because semantic targets like "Post button" keep working if the button changes color or moves across the screen. This is a test that all current RPA tools, including UiPath, fail, so we have multiple paths to improve on the incumbents given the tools we have today with reasoning models.
BTW, did you open-source all of it or just parts? I see your GitHub repo hasn't been updated in a month. Can I run this locally?
The Chrome extension is required to target off-screen elements, using a local websocket server. This makes it a 100% replacement for previous selector strategies.
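As a rough illustration of the extension-to-server handoff, here is a hypothetical message a client might send over that local websocket; the message fields (`type`, `intent`, `include_offscreen`) are my assumptions, not the actual protocol.

```python
import json

def make_target_request(intent: str, include_offscreen: bool = True) -> str:
    """Build a hypothetical targeting request for the local websocket
    server. Because the extension can enumerate elements outside the
    visible viewport, off-screen targets can resolve too."""
    return json.dumps({
        "type": "resolve_target",
        "intent": intent,
        "include_offscreen": include_offscreen,
    })

print(make_target_request("post button"))
```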
If you're interested in contributing, email me at rohan@cheatlayer.com and I can help get you set up.