Helium: Lighter Web Automation with Python (github.com)

199 points · mherrmann · 1y ago · 49 comments

wokwokwok · 1y ago

How can a wrapper around selenium be lighter than it?

A wrapper around an API is by definition heavier (more code, more functions) than using the lower level api.

It’s not using less resources.

It’s not faster (it has implicit waiting).

It’s not less code; it’s literally a superset of selenium?

Feels like a “selenium framework” is more accurate than light weight web automation?

Anyway, there’s no fixing automation tests with fancy APIs.

No matter what you try to do, if people are only interested in writing quick dirty scripts, you’re doomed to a pile of stupid spaghetti no matter what system or framework you have.

If you want sustainable automation, you have to do Real Software Engineering and write actual composable modules; and you can do that in anything, even raw selenium.

So… I’d be more interested if this was pitched as “composable lego for building automation” …

…but, personally, as it stands all I can really see is “makes easy things easier with sensible defaults”.

That’s nice for getting started; but getting started is not the problem with automation tests.

It’s maintaining them.
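
"Implicit waiting", mentioned above, means that each action keeps polling for its target element for a while before failing, instead of failing on the first miss. Below is a minimal, library-agnostic sketch of such a polling loop. `poll_until` and its parameters are illustrative names, not Helium's or Selenium's actual implementation. It also shows the trade-off under discussion: an element that is present returns as soon as it is found, while a missing one costs the full timeout before the error surfaces.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    find: Callable[[], Optional[T]],
    timeout_secs: float = 10.0,
    interval_secs: float = 0.5,
) -> T:
    """Call `find` until it returns something other than None.

    Raises TimeoutError once `timeout_secs` have elapsed without a result.
    """
    deadline = time.monotonic() + timeout_secs
    while True:
        result = find()
        if result is not None:
            return result  # found: return immediately, no extra delay
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout_secs} s")
        time.sleep(interval_secs)  # not there yet: wait, then retry


# Demo: a hypothetical element that only "appears" on the third lookup.
attempts = {"count": 0}


def lookup_submit_button() -> Optional[str]:
    attempts["count"] += 1
    return "<button>Submit</button>" if attempts["count"] >= 3 else None


element = poll_until(lookup_submit_button, timeout_secs=2, interval_secs=0.01)
```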

mherrmann (OP) · 1y ago

Its use can be lighter. That is, the wrapper can be easier to use.

Helium helps with maintaining automation tests as well. click("Compose") is infinitely more maintainable than document.getElementById("eIu7Db").click(). (I just took this example from Gmail's web interface.)

n144q · 1y ago

Those are just superficial changes that often lead to confusion and other negative consequences down the road, especially when not handled carefully.

I would much rather rely directly on Selenium's stable APIs than on someone else's wrapped APIs that are opinionated and could be incomplete, incorrect, outdated, and potentially unmaintained someday. Far more resources always go into Selenium than into these add-ons.

If I really want, I can choose a few APIs that I actually use and wrap them within my codebase. That's more reliable than this.

wokwokwok · 1y ago

How do you compose low level operations like “click here” into composable modules like:

loginAsUser(user)

id = createBooking(user)

loginAsAdmin()

approveBooking(id)

Is it the same as Selenium? Do whatever you want yourself?

That’s what I’m talking about. Unless you have high level composable modules that let you express high level test activities then your tests will always fall apart.

The syntax of the low level operations doesn’t matter because you will never ever care about a click(“compose”).

That’s not a test.

A test might be:

createEmail()

attachFile(…)

… whatever your bespoke business requirements are.

Having fancy wrappers?

Is it nicer? Sure.

Does it meaningfully improve the tests, maintaining tests?

Nope.

Because at the end of the day the low level operations will be bespoke, nasty, messy and different for each website; that’s why you wrap them up in functions and compose them.

At least, that's my experience. This looks a lot like Cypress: a high-level set of operations with sensible defaults for easy tasks.

…but, practically, I’m skeptical that hiding the low level nasty details actually makes them go away; it’s smoothing them over for the “happy path”; but automation tests are like 90% edge cases.
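
Here is a minimal sketch of the layering described above. Low-level primitives sit behind a small `Browser` protocol, business-level steps are built from them, and the scenario is written only in terms of those steps. Every name here (`Browser`, `login`, `create_booking`, `approve_booking`, the field labels, the `app.example.com` URLs) is hypothetical. `RecordingBrowser` is a fake that only records calls, so the composition runs without a browser. A real implementation could back the same protocol with Helium's `click`/`write` or with raw Selenium, and the scenario would not change.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class Browser(Protocol):
    """Low-level primitives; could be backed by Helium, Selenium or Playwright."""

    def go_to(self, url: str) -> None: ...
    def write(self, text: str, into: str) -> None: ...
    def click(self, label: str) -> None: ...
    def read(self, label: str) -> str: ...


@dataclass
class RecordingBrowser:
    """Fake browser that only records calls (hypothetical, for demonstration)."""

    calls: List[Tuple[str, ...]] = field(default_factory=list)

    def go_to(self, url: str) -> None:
        self.calls.append(("go_to", url))

    def write(self, text: str, into: str) -> None:
        self.calls.append(("write", text, into))

    def click(self, label: str) -> None:
        self.calls.append(("click", label))

    def read(self, label: str) -> str:
        self.calls.append(("read", label))
        return "B-42"  # pretend the page shows this booking ID


# Business-level steps: the vocabulary the tests are written in.
def login(browser: Browser, username: str, password: str) -> None:
    browser.go_to("https://app.example.com/login")
    browser.write(username, into="Username")
    browser.write(password, into="Password")
    browser.click("Log in")


def create_booking(browser: Browser, room: str) -> str:
    browser.click("New booking")
    browser.write(room, into="Room")
    browser.click("Save")
    return browser.read("Booking ID")


def approve_booking(browser: Browser, booking_id: str) -> None:
    browser.go_to(f"https://app.example.com/admin/bookings/{booking_id}")
    browser.click("Approve")


def admin_approves_user_booking(browser: Browser) -> None:
    """The scenario itself: no selectors, no waits, only business steps."""
    login(browser, "alice", "alice-password")
    booking_id = create_booking(browser, "Room 1")
    login(browser, "admin", "admin-password")
    approve_booking(browser, booking_id)


browser = RecordingBrowser()
admin_approves_user_booking(browser)
```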

> Its use can be lighter

I don’t think that’s the generally accepted meaning of a light weight framework.

…but eh, fair enough. I understand what you mean.

pryelluw · 1y ago

> but, personally, as it stands all I can really see is “makes easy things easier with sensible defaults”.

“Lighter” may be used as an alternative adjective to the word easy or easier. Your post, which comes off as very rude, misses the point of how the project is marketed.

At least the OP did not call it Python automation for humans …

wokwokwok · 1y ago

> “Lighter” may be used as an alternative adjective to the word easy or easier

That is, again, not common usage; there’s a word for easier to use, and it’s “easier”. But whatever. It doesn’t matter; it’s just branding.

My point, however, is that making easy-to-use frameworks for test automation is fundamentally misguided, and responses like “try it, you’ll be amazed, it makes all the problems go away” are the type of “drinking kool aid” that displays a deep lack of understanding of the problem space.

Doing easy things does not solve doing hard things; not here. Not in Go. Not in Rust. Not ever.

So, my point was (and is):

How does this address doing hard things? As someone who is familiar with this space and has tried it, I can’t see anything that helps with the hard things, and no one who is heavily invested in automation realllly cares about doing easy things.

We can already do easy things.

Another way of doing easy things is like using Prettier or not; it’s a style preference.

So, is that what this is?

Selenium with a function-calling style preference, or something that actually helps with building automation?

There’s nothing wrong with making tools that make superficial cosmetic changes to the way you do things.

…but, that’s not how the project is marketed; as, at least, I’ve understood it.

quickvi · 1y ago

For lightweight automation outside the browser:

https://github.com/elyase/screenium

oulipo · 1y ago

Very cool! Could be a kind of open-source, text-based (e.g. recipes are .md files with instructions) version of Keyboard Maestro!

I'd love to see such an "open automation" format (could even be more general than pure software, could also automate your IoT or whatever, through extensions)

E.g. you could have a file "Type my bank login password" for bank websites that don't let you use keyboard input but force you to click on stuff: a self-documenting script using .md with code

    # Type my bank login password
    
    ## Trigger
    ```trigger:hotkey
    key: cmd+l
    filter: frontmost-app=Chrome and chrome.tab.url=~mybank.com/login
    ```
    
    ## Deps
    ```ensure-deps
    shell-runner>=1.*
    screen-ocr>=1.*
    python-runner>=1.*
    ```
    Ensure that my system has the proper extensions for the framework, to run all tasks
    
    ## What it does
    This automation lets me input my password in a "click-only" input for my lousy bank UI
    
    ```run:shell /bin/sh:capture-output=password
    echo $(op --vault personal --site mybank)
    ```
    (the above runs the shell script and captures the output as a "password" variable I can use in other scripts below)
    
    ```run:screen-ocr:capture-output=ocr-result
    window:chrome
    ```
    
    ...go on scripting using typescript/python to locate the numbers in the ocr-result

okso · 1y ago

macOS only (uses Apple Vision framework)

erikcw · 1y ago

I've used SikuliX[0] in the past for similar purposes. Unfortunately the author hasn't had much time to maintain it recently.

[0] https://github.com/RaiMan/SikuliX1

languagehacker · 1y ago

Importing * is universally discouraged by most Python linters and best practice docs. You can always "import helium as h" if you're looking to type less.

This looks largely like common workarounds that most people will write using Python-based browser automation. Most of the time, we accept that those capabilities aren't there by default because they are not explicit enough and can result in bugs and undefined behavior even when the elements that we expect to be on the page are actually there.

Given the adage "explicit is better than implicit", I worry that a layer like this might create more trouble than it's worth for the sake of readability. When we get into the nitty-gritty of browser automation, it might just make it harder to debug than going straight to Selenium or Playwright.

mherrmann (OP) · 1y ago

> Importing * is universally discouraged by most Python linters and best practice docs.

Yup, I would never do it in a .py file. But I do it all of the time in the interpreter, which is what the video shows.

It sounds like you haven't tried Helium yet. I think you should, and see for yourself whether the trade-off you talk about actually exists.

> Given the adage "explicit is better than implicit", I worry that a layer like this might create more trouble than it's worth for the sake of readability.

You could make the same argument about using C / assembly instead of Python. I suggest you try Helium before making statements about the "trouble" it may create. I believe you will find that there is no trouble.

shepherdjerred · 1y ago

It would be much more useful if you tried out the tool before criticizing it

Or, if you have tried it, if you could explain why you don’t think the tool makes the right tradeoffs

Adages like “explicit is better than implicit” are incredibly context dependent, otherwise we’d all be writing assembly

wslh · 1y ago

How does it compare with the "usual suspects"? I mean Playwright, Selenium, Cypress, and Puppeteer.

mherrmann (OP) · 1y ago

It's more high-level. Instead of saying "click element with ID xv9873", you can say "click Download".

Yossarrian22 · 1y ago

That's how Playwright works too

__mharrison__ · 1y ago

Thanks for posting. All this AI has been interested in scraping personal sites.

mherrmann (OP) · 1y ago

I have actually been wondering whether Helium's more high-level API lends itself well to use by AI.

grantc · 1y ago

This. Seems like you could wedge this and a model into a scrappy version of computer use for browsers.

Fwiw, thanks for contributing this. It seems apt for a number of repetitive things I probably do dozens of times a week and don't even notice as cruft anymore.

I'm not sure why there were such hot takes on what this is or isn't. Maybe Big Selenium crisis actors? You made something cool, you shared it w/ the world -- that should be the system prompt for people posting about it in my kinder world of things.

hugs · 1y ago

Selenium project founder here. (Hi!) Thanks for all your work on this project. Lots of negativity around here these days, but just wanted to say thanks. The functional style of Helium's API reminds me a lot of Selenium's original API when it was 100% JavaScript (aka Selenium 1 aka Selenium Core) back in 2004.

(Functional style: "method(thing)" vs object oriented style: "thing.method()")

We mostly abandoned the functional style when we merged with the WebDriver project (aka Selenium 2), but that functional style still lives on in the Selenium IDE record/playback tool.

That is all to say, there are fans of many different styles for automation APIs. No single API will please everyone. (But I personally like the simpler, functional style, fwiw!)
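
To make the two styles concrete, here is a minimal sketch of how a functional API ("method(thing)") can sit on top of an object-oriented core ("thing.method()"): free functions delegate to an implicit module-level "current" session. `Session`, `set_session` and the log format are illustrative names, not Helium's or Selenium's code.

```python
from typing import List, Optional


class Session:
    """Object-oriented core: state lives on the instance ("thing.method()")."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []

    def click(self, label: str) -> None:
        self.log.append(f"{self.name}: click {label!r}")


# Functional facade ("method(thing)"): the "thing" is an implicit current session.
_current: Optional[Session] = None


def set_session(session: Session) -> None:
    global _current
    _current = session


def click(label: str) -> None:
    if _current is None:
        raise RuntimeError("no active session; call set_session() first")
    _current.click(label)


chrome = Session("chrome")
chrome.click("Compose")  # object-oriented style
set_session(chrome)
click("Compose")  # functional style, same effect
```

The design is a trade-off: the implicit current session keeps one-liners like `click("Compose")` short, but driving two browsers at once means switching that global back and forth, which the object-oriented style handles naturally.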

Side-note: This is also why I'm a fan of the Nim programming language. "method(thing)" and "thing.method" are supported syntax for literally the same thing. For others new to the idea, the fancy term for this is "Uniform Function Call Syntax".

lblume · 1y ago

UFCS is great. I really wish more languages supported something similar, although pipe operators (thing |> function1 |> function2) and Rust's proposal for thing.(function) also seem to satisfy the syntactic ideal.
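
Python has neither UFCS nor a pipe operator, but two features come close. A method can be called as a plain function on its class (`str.strip(s)` is the same call as `s.strip()`), and a small helper can provide left-to-right composition. `pipe` below is a hypothetical helper, not part of the standard library.

```python
from functools import reduce
from typing import Any, Callable


def pipe(value: Any, *functions: Callable[[Any], Any]) -> Any:
    """Apply functions left to right: pipe(x, f, g) == g(f(x)), i.e. x |> f |> g."""
    return reduce(lambda acc, fn: fn(acc), functions, value)


text = "  Hello, Helium  "
stripped = str.strip(text)  # "method(thing)": the same call as text.strip()
# Reads like: text |> str.strip |> str.lower |> len
length = pipe(text, str.strip, str.lower, len)
```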

giis · 1y ago

Looks nice. Is it possible to call start_chrome() with a specific Chrome profile name, or to reuse an existing open Firefox/Chrome session and launch a new tab on a specific domain?

mherrmann (OP) · 1y ago

I don't know. Please check if Selenium supports this and if yes, use Helium's set_driver(...) or options argument to start_chrome(...).
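
Below is a sketch of the approach suggested above. It assumes Chrome's `--user-data-dir` and `--profile-directory` command-line switches, Selenium's `ChromeOptions`, and the `options` argument of Helium's `start_chrome`. The directory and profile names are placeholders; on Linux, Chrome's default user-data directory is usually `~/.config/google-chrome`. Chrome generally will not let two running instances share one user-data directory, so close any Chrome window that is using that profile first. The imports sit inside the function so the helper can be loaded without Selenium or Helium installed.

```python
import os
from typing import List


def chrome_profile_args(user_data_dir: str, profile_directory: str = "Default") -> List[str]:
    """Chrome command-line switches that select an existing user profile."""
    return [
        f"--user-data-dir={os.path.expanduser(user_data_dir)}",
        f"--profile-directory={profile_directory}",
    ]


def start_chrome_with_profile(url: str, user_data_dir: str, profile_directory: str = "Default"):
    """Start Chrome through Helium with an existing profile (sketch).

    Imports are local so this module loads without Selenium/Helium installed.
    """
    from helium import start_chrome
    from selenium.webdriver import ChromeOptions

    options = ChromeOptions()
    for arg in chrome_profile_args(user_data_dir, profile_directory):
        options.add_argument(arg)
    return start_chrome(url, options=options)  # Helium returns the Selenium driver
```

A call might then look like `start_chrome_with_profile("https://example.com", "~/.config/google-chrome", "Profile 1")`, with the profile folder name taken from your own Chrome installation.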

bryanrasmussen · 1y ago

How easy is it to detect that this is automation as opposed to a real user? I suppose pretty easy, so I'm not sure it's useful for automating things I do on the web every day; I'd really be running the risk of losing access to those services if they determined I was automating them.

bdcravens · 1y ago

This is a wrapper on top of Selenium, so unless the library implements additional techniques to improve stealth, it's on par with Selenium's detectability (which as you pointed out can be detected easily enough)
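
One of the simplest signals a site can check is `navigator.webdriver`, which the W3C WebDriver specification says should be `true` while the browser is under automation control. The sketch below checks it from Python through Selenium's `execute_script`; with Helium, `get_driver()` returns the underlying Selenium driver. `FakeDriver` is a stand-in so the example runs without a browser. Real bot-detection services typically combine many more signals than this one.

```python
from typing import Any, List


def reports_webdriver(driver: Any) -> bool:
    """True if scripts on the page can see navigator.webdriver set."""
    return bool(driver.execute_script("return navigator.webdriver"))


class FakeDriver:
    """Stand-in for a Selenium WebDriver; with Helium you would pass get_driver()."""

    def __init__(self, webdriver_flag: Any) -> None:
        self.webdriver_flag = webdriver_flag
        self.scripts: List[str] = []

    def execute_script(self, script: str, *args: Any) -> Any:
        self.scripts.append(script)
        return self.webdriver_flag  # what the page would evaluate the script to


automated = FakeDriver(True)
regular = FakeDriver(False)
```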

Havoc · 1y ago

That looks useful. How does it know which box is the user field? Does it just read the label and assume the field below it or to its right?

mherrmann (OP) · 1y ago

Pretty much, yes. And if there are multiple, then it uses the matching element closest to the last one it interacted with. Much like a human.
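
Here is a simplified sketch of the heuristic described above. For each label whose text matches, take the nearest field lying to its right or below it; if several labels match, prefer the resulting field closest to the element used last. The `Box` type, the coordinates and the distance measure (between box centres) are assumptions for illustration; Helium's actual rules are likely more involved.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """An element's bounding box in page coordinates (y grows downward)."""

    name: str
    x: float
    y: float
    w: float = 100.0
    h: float = 20.0

    @property
    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def distance(a: Box, b: Box) -> float:
    """Euclidean distance between box centres."""
    (ax, ay), (bx, by) = a.center, b.center
    return math.hypot(ax - bx, ay - by)


def field_for_label(label: Box, fields: Sequence[Box]) -> Optional[Box]:
    """Nearest field that starts to the right of the label or below it."""
    candidates = [
        f for f in fields
        if f.x >= label.x + label.w or f.y >= label.y + label.h
    ]
    return min(candidates, key=lambda f: distance(label, f), default=None)


def resolve(labels: Sequence[Box], fields: Sequence[Box],
            last_used: Optional[Box] = None) -> Optional[Box]:
    """Pick one field when one or more labels match the requested text.

    With several matches, prefer the field closest to the last element used.
    """
    matches = [m for m in (field_for_label(lbl, fields) for lbl in labels) if m is not None]
    if not matches:
        return None
    if last_used is None:
        return matches[0]
    return min(matches, key=lambda m: distance(last_used, m))


# Demo: a simple login form with labels to the left of their fields.
username_label = Box("Username label", 0, 0, w=80)
username_field = Box("username field", 90, 0)
password_field = Box("password field", 90, 40)
chosen = resolve([username_label], [username_field, password_field])  # username_field
```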

bg24 · 1y ago

Nice work! I looked at the cheatsheet, and it is not obvious to me how to get through two-factor authentication during login.

mherrmann (OP) · 1y ago

Thanks! Helium only automates browsers. If the 2FA is happening in the browser, then you can use Helium to automate the flow. If it's outside, then that part cannot be handled by Helium.
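
Suppose the in-browser second factor is a TOTP code (the six digits an authenticator app shows) and you control the shared secret, for example on a dedicated test account. Then the code can be computed with the standard library per RFC 6238 and typed like any other text. A minimal sketch follows; the secret used is the RFC's published test key, not a real one.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226), using HMAC-SHA1."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits pick a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(base32_secret: str, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP of floor(unix_time / step)."""
    secret = base32_secret.replace(" ", "").upper()
    secret += "=" * (-len(secret) % 8)  # authenticator secrets often omit padding
    key = base64.b32decode(secret)
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)


# RFC 6238's SHA-1 test key is the ASCII string "12345678901234567890".
TEST_SECRET = base64.b32encode(b"12345678901234567890").decode()
code_now = totp(TEST_SECRET)
```

With Helium, entering the code would be roughly `write(totp(secret), into="Verification code")`, where the field label is hypothetical and depends on the site.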

crazymoka · 1y ago

Can it be headless?

mherrmann (OP) · 1y ago

Yes: start_chrome(headless=True)

nkrisc · 1y ago

Having done some ad-hoc, temporary automation with Selenium in the past (to help fellow, less technically-inclined designers) I wish I had this at the time.

Looks like a nice, almost natural language-like API around what is otherwise a quite cumbersome API.

fermigier · 1y ago

"We shut down the company at the end of 2019 and I felt it would be a shame if Helium simply disappeared from the face of the earth."

I appreciate the effort. Thank you, M. Herrmann.

bilater · 1y ago

Nice - I can see some cool agentic flows created using this. A thing I want to look into is creating a sandbox instance (Ubuntu?) and letting an agent do its thing. Could be collecting data or answering questions and I can pull up the window to check in from time to time. It'll be like having an assistant.

edm0nd · 1y ago

Very neat!

Rolling in a captcha solving service like DeathByCaptcha or AntiCaptcha and you got yourself a quick and easy script that can do anything on any website regardless of captchas.

slt2021 · 1y ago

Thank you for sharing this project, this is really good

Byte64 · 1y ago

This is so cool
