We fixed f-string typos in popular Python repos

(highertier.com)

139 points | rikatee | 4y ago | 145 comments

mhils4y ago

We were one of the 666 repos, and I'm not too happy about having our repo used as advertising space. Some thoughts:

- I'm happy to receive fix-a-typo PRs from human users. In this case the other side demonstrated that they care by putting in a bit of manual effort, and a small PR often paves the way towards larger contributions. I also know that open source beginners get really excited about their first small contributions, and I'm honestly happy to support that.

- In contrast, the marginal effort for bot PRs is ~0. It's very easy to generate a small amount of work for a lot of people, and the nice side effect is that the bot's platform is advertised everywhere. As a maintainer, I have never given consent to this and I have no choice to opt out.

We are very happy users of some GitHub bots, but I feel it needs to be an active adoption decision by the maintainer. If you want to pitch me your service you may send me an unsolicited email, but don't use our public space to advertise your product without asking.

Edit: I don't want to be too harsh to OP here - at least they pointed out a small but valid issue in our case. I very much appreciate their apology at https://news.ycombinator.com/item?id=31210245

TameAntelope4y ago

I just... think you should reconsider your stance on this. If you made a mistake in a public repo and someone else caught it (via scan of your repo or otherwise), it's a pretty bad look to be anything but grateful at that point, PR benefits for the bot aside.

Sebb7674y ago

The problem with scanners is that they usually have a pretty high false positive rate. When automatically opening the PR, they are basically putting the human review part on the maintainer (burdening them with additional and possibly useless work) while also using their repo as advertising space without consent. When the scan goes wrong and has a lot of false positives or it looks like they just got lucky, it's easy for a maintainer to feel like most of the cost was handed to them, while most of the upsides (like QA and brand recognition) are reaped by the bot. When a human opens the PR, you at least know that they valued your time and checked the changes beforehand, even if it's based on the results of the bot and contains the same errors.

Now, if the bot catches an actual error and improves the software, the result is obviously net good and the tad of free advertising is deserved. But it can easily feel like a PR campaign paid for with carelessly annexed maintainer time and in quite a few cases, it simply is.

jhardy544y ago

> The problem with scanners is that they usually have a pretty high false positive rate.

Did that happen in the example being discussed in this thread?

charlieyu14y ago

How high is the false positive rate? I would say even at 80%, the bots have at least found enough possible bugs worth attention that wouldn’t be found by human review alone.

sieabahlpark4y ago

The issue they had is being part of the advertisement, not that the bot did the work.

Everyone is out for notoriety and street cred instead of just doing good for the community.

omegalulw4y ago

I understand the sentiment but you should be judging the PR, not the source. Ask yourself: would you have happily accepted the same PR that the bot sent if it came from a human?

By all means, I am not against having bots identify themselves properly, my point is that "effort from bot PR is ~0", "it advertises their platform" are simply not the right reasons to judge this situation by.

dogecoinbase4y ago

Ask yourself: would you treat a PR differently if it came from a regular, trusted contributor, or some random person (or bot)?

bostonsre4y ago

Sure you would probably treat it differently but isn't it being elitist and harmful to the open source community in general to outright shoot down or discourage any PR from a lesser known or unknown source if it is a good PR? I think we should encourage novices to contribute and we shouldn't be hostile to them so that they can get past the novice phase and become trusted contributors. If a bot produces valid helpful and well formed PRs, why would you discriminate against them completely when they improve your codebase?

netheril964y ago

I would if the content of the PR were complicated. Not in this case.

Mo34y ago

> I have never given consent to this and I have no choice to opt out.

You have a public repository on GitHub. You are free to switch it to private, but otherwise this is absolutely illogical. No one needs your consent to submit PRs to and highlight a public repository.

This is equivalent to having a website and then getting angry about linking to it, or putting your artwork up for public viewing and then getting angry at someone pointing out a small tear in the fabric.

Actually, now that I think of it, it’s better comparable to someone bringing in a handheld scanner with a company name on it, scanning the artwork and then pointing out the tear.

Which is still totally fine. You’ve given implied consent by making it available to the public. You have decided to make it possible for the public to view it, criticize it and link to it.

Quote Wikipedia,

> Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, *study*, change, and distribute the software and its source code to anyone *and for any purpose*

Case closed.

throwaway2904y ago

> Actually, now that I think of it, it’s better comparable to someone bringing in a handheld scanner with a company name on it, scanning the artwork and then pointing out the tear.

No, it's more like somebody sending to your lab, uninvited, an impersonal inspection bot with another company's branding on it, which doesn't only disclose potential issues to you but advertises them across the whole cyberspace.

And in case of OSS this lab may be my tiny garage where me and friends tinker on stuff.

Choosing to make the results of our passion or work free for all to study and use should not come with a liability of having to deal with hordes of such bots.

Mo34y ago

Only if you establish your lab in a tent on the street and put a sign that says "for public display" on it.

> And in case of OSS this lab may be my tiny garage where me and friends tinker on stuff.

That's not OSS. OSS would be leaving the garage door open, putting your garage on Google Maps and freely allowing anyone to walk in and see what you're doing. That's OSS.

Then getting angry about it is what you and OP are doing.

grey_earthling4y ago

> No, it's more like somebody sending to your lab, uninvited, an impersonal inspection bot with another company's branding on it, which doesn't only disclose potential issues to you but advertises them across the whole cyberspace.

GitHub isn't your lab. It's Microsoft's lab. (They just rent out space free of charge.)

dataflow4y ago

Sounds like we need a robots.txt for GitHub repos.

tus6664y ago

Isn't having a public github repo consent?

dataflow4y ago

Is having a publicly reachable email address consent to receiving unsolicited emails? Is having a public postal address consent to receiving mailed advertisements? Legally, yeah; morally, less so.

tus6664y ago

Technically they are forms of abuse, a PR for a valid bug less so.

IshKebab4y ago

I feel the same way about those bots that tell you about insignificant security vulnerabilities in some project you abandoned. It's basically spam.

That said, this does seem like it is a bit more useful. As long as they actually read the changes and make sure they aren't false positives. Which I'm guessing they didn't do for 666 repos.

b6dybuyv4y ago

> As long as they actually read the changes and make sure they aren't false positives. Which I'm guessing they didn't do for 666 repos.

In the article they say that "really a bot found the problem and made the PR, but really a human developer at Code Review Doctor did triage the issue before the PR was raised".

colejohnson664y ago

> I feel the same way about those bots that tell you about insignificant security vulnerabilities in some project you abandoned. It's basically spam.

If you "archive" your repos, dependabot and friends won’t bother you.

Or, you could just disable security alerts in your repo's settings.

oefrha4y ago

Dependabot isn’t the only source of vulnerability fatigue, there are plenty of “researchers” who would spam your active projects about pointless “vulnerabilities”. For instance, I recently got one about a parsing issue in gmp from a human user, who probably found it by scanning PyPI. I’m not touching anything adjacent to the supposedly vulnerable codepath, and the fix isn’t even in a gmp release, meaning I would have to carry a patch if I were to “fix” it. I still responded amicably, but I was not happy.

scotty794y ago

It's your repo and your choice. You can reject the PR.

Your repo is public so you can't prevent people and bots from looking into it and having opinions on it (even public ones).

HL33tibCe74y ago

> > We may be looking too deep into this but it seems like many developers think when string concatenation occurs it’s enough to declare the first string as an f-string and the other strings are turned into f-strings by osmosis. It doesn’t. We’re not suggesting this is the case for all developers that accidentally did this error, but interesting nonetheless.

I highly doubt that people believed that f-strings worked this way. Far more likely is that, for example, the expression started as one line, then got split onto two, or some such similar scenario.
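For readers who have not hit this bug, here is a minimal sketch of the split-line scenario described above. The variable names are made up for illustration; the point is only that the `f` prefix applies to each literal separately, so a second literal without the prefix keeps its braces.

```python
value = 42

# The expression started as one f-string and was later split across two
# lines. Only the first literal kept the f prefix.
broken = (
    f"first part {value}, "
    "second part {value}"
)

# Each literal that contains a placeholder needs its own f prefix.
fixed = (
    f"first part {value}, "
    f"second part {value}"
)

print(broken)
print(fixed)
```

Adjacent string literals are joined at compile time, but the join does not spread the prefix from one literal to the next.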

johnny_castaway4y ago

This may be also caused by confusing syntax highlighting in some editors, for example in VSCode [1]

The variable in the second string gets highlighted (with slightly different color, but still) because it would still work with `str.format()`. GitHub doesn't seem to do this.

[1] https://imgur.com/a/9KGWVG0

totony4y ago

Yeah I got bitten by the same-color syntax highlighting a few times.

_8j504y ago

You'd be surprised, people who expect python to be "smart" and "figure it out" might think that way.

groestl4y ago

Well, it's not completely illogical.

'a' is str

b'a' is bytes

f'something' might be a separate f-str-type too?

1 is an int

1.2 is a float

(1.2 + 1) is a float

GeorgeTirebiter4y ago

Indeed, if "{value} is bad" can be turned into an f-string automatically by an external program, then why can't Python do this itself, so we can get rid of the f prefix as a required explicit declaration? After all, we don't specifically add a type to a number like 42 or 3.14159; those are implicitly 'int' and 'float' types.

I would use such a feature, as I always use f-strings when formatting.

HL33tibCe74y ago

As a beginner, maybe. But this is code in some of the biggest open-source Python repos in existence. Probably written by someone with a reasonable level of Python expertise.

genewitch4y ago

... and yet?

groestl4y ago

I think what's happening is that people assume there's type coercion going on here.

jldugger4y ago

I wonder if an autoformatter like black is at play here.

saila4y ago

Black doesn't split strings, and I doubt they'd choose to use concatenation if they did.

hathawsh4y ago

I kind of wish Black would split and join literal strings. I've seen several times when Black converted code like this:

    raise SomeError(
        "Explanation... "
        "yet more explanation"
    )

To this:

    raise SomeError(
        "Explanation... " "yet more explanation"
    )

That just looks odd. Black is fantastic, but not perfect.

d0mine4y ago

Black does split strings if configured and it does use implicit concatenation of Python string literals.

xigoi4y ago

Black doesn't change the semantics of code.

gus_massa4y ago

I don't mind that it's a bot, and I really appreciate that apparently a human made the final review before sending the PR. But I don't like that the title of the commit is:

> Fix issue probably-meant-fstring found at https://codereview.doctor

I expect a more neutral title for a commit, something like

> Fix fstring in <name-of-file>

Each maintainer/project has their own (weird) rules about titles, about whether any other files must log the changes, about regression tests, and whatever else they like. But I think no maintainer/project expects to see the name of the author in the commit title.

bvinc4y ago

So they checked 666 python repositories and fixed bugs in 69 of them. Interesting choice of numbers.

defterGoose4y ago

...doing the devil's work.

InfiniteRand4y ago

Fixing F-strings

racl1014y ago

Nice!

saagarjha4y ago

It’s a nice coincidence.

memco4y ago

The article links to some docs for the logging module here: https://docs.python.org/3/howto/logging.html#optimization asserting that f-strings are less optimal, but the docs do not say that f-string expression evaluation is not optimized out, only that the logging module tries to perform evaluation as late as possible. Where is the f-string described as suboptimal?

Relatedly, the logging optimization section suggests setting raiseExceptions to False for production logging code. Where is that set? On the logger, the handler, or something else?

bobbiechen4y ago

I was also confused by the expression evaluation thing. Reading between the lines, it seems like

    logger.debug("hello %s", foo)

may be better than

    logger.debug(f"hello {foo}")

in the case when loglevel is higher than debug. In the first version, the final string does not have to be computed, while in the second version, we might construct the string and then do nothing since the loglevel is excluded.
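A rough way to see the difference is to count how often an object gets converted to a string. This is only a sketch: the `Expensive` class and the logger name are hypothetical, and the logger is set to INFO so that debug calls are dropped.

```python
import logging


class Expensive:
    """Hypothetical object that counts how often it is turned into a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "expensive"


logger = logging.getLogger("lazy_demo")
logger.setLevel(logging.INFO)          # debug records are filtered out
logger.addHandler(logging.NullHandler())
logger.propagate = False

lazy = Expensive()
logger.debug("hello %s", lazy)         # no formatting happens: level too low

eager = Expensive()
logger.debug(f"hello {eager}")         # the f-string is built before the call

print(lazy.str_calls, eager.str_calls)
```

With the `%s` form the string conversion is skipped because the record is discarded first; with the f-string the conversion has already happened by the time `debug` is called.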

bbayles4y ago

The first is better also because you can do things with loggers other than print out their contents. For example, suppose you had a statement like:

    logger.debug('Database error: %s', error_message)

You probably have a logging handler that does the normal string formatting. But you can also have one that keeps a count of how many `Database error: %s` hits (as opposed to `Network error: %s`) there are over time. Doing the string substitution up front would break this aggregation.
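One possible shape for such a counting handler, as a sketch only: `TemplateCounter` and the logger name are invented for this example. It relies on `LogRecord.msg` holding the unformatted template, while `LogRecord.getMessage()` returns the merged text.

```python
import logging
from collections import Counter


class TemplateCounter(logging.Handler):
    """Hypothetical handler that counts records by their unformatted template."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def emit(self, record):
        # record.msg is the template as passed to the logging call;
        # record.getMessage() would return the text with arguments merged in.
        self.counts[record.msg] += 1


logger = logging.getLogger("template_counter_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False
counter = TemplateCounter()
logger.addHandler(counter)

logger.debug("Database error: %s", "connection refused")
logger.debug("Database error: %s", "timeout")
logger.debug("Network error: %s", "DNS failure")

print(counter.counts)
```

Had the calls used f-strings, every record would carry a different `msg` and the two database errors would land in separate buckets.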

masklinn4y ago

That's exactly it.

Although this becomes more complicated because printf-style string formatting is not free (though it's the cheapest of all methods save fstrings if I remember correctly), and because python does not support lazy parameters if `foo` is a non-trivial expression odds are good it will far outcost either formatting.
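To illustrate the point about non-lazy parameters: the argument expression itself still runs even when the record is dropped. A common workaround is an explicit level check with `Logger.isEnabledFor`. The function and logger names below are hypothetical.

```python
import logging

calls = {"count": 0}


def expensive_summary():
    """Hypothetical costly computation; counts how often it runs."""
    calls["count"] += 1
    return "summary"


logger = logging.getLogger("guard_demo")
logger.setLevel(logging.INFO)
logger.addHandler(logging.NullHandler())
logger.propagate = False

# The argument is evaluated before debug() is entered, so the cost is paid
# even though the record is then discarded.
logger.debug("state: %s", expensive_summary())

# Guarding the call skips the expensive expression entirely.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("state: %s", expensive_summary())

print(calls["count"])
```

Only the unguarded call pays for `expensive_summary()`; the guard trades a little verbosity for skipping the work.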

jrootabega4y ago

You're also able to add additional log-specific processors to the log record in the first case.

aib4y ago

> where is the f-string described as suboptimal?

I guess it's implicit in that f-strings, as arguments, will be evaluated before the logging function can even run whereas `debug("...", heavy_obj)` will avoid a potentially expensive `str(heavy_obj)` (or whatever the string conversion warrants.)

As for raiseExceptions, I'm not sure it's for optimization. It looks like an old sanity check for bad logging configurations.
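On the "where is that set" question: `raiseExceptions` is a module-level attribute of `logging`, not something on a logger or handler. The sketch below is one way to observe its effect. The helper function and logger name are made up, and the bad call deliberately passes too few arguments so that formatting fails inside the handler.

```python
import contextlib
import io
import logging


def stderr_from_bad_log_call(raise_exceptions):
    """Log with mismatched arguments and return whatever reached stderr."""
    logger = logging.getLogger("raise_exceptions_demo")
    logger.handlers.clear()
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(io.StringIO()))

    previous = logging.raiseExceptions
    logging.raiseExceptions = raise_exceptions
    captured = io.StringIO()
    try:
        with contextlib.redirect_stderr(captured):
            # Two placeholders, one argument: formatting fails in the handler.
            logger.warning("%s and %s", "only one argument")
    finally:
        logging.raiseExceptions = previous  # restore the global setting
    return captured.getvalue()


noisy = stderr_from_bad_log_call(True)
quiet = stderr_from_bad_log_call(False)
print(bool(noisy), bool(quiet))
```

With the flag on, the handler's error report is written to stderr; with it off, the failure is swallowed silently, which fits the reading that it is a sanity check and not a performance switch.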

nimish4y ago

The "Use logging's interpolation" warning has always annoyed me. If logging is in your hot path, that might be an issue. Me f-string interpolating some info logs that run once for convenience is not.

Low value junk like this is not helpful.

readthenotes14y ago

I find it ironic that the article points out that relying on humans to find errors is something of a hit-or-miss proposition, and then suggests automating error finding as the appropriate course instead of making the error less likely in the first place.

For example, I wonder how many errors would have been found if the definition of a format string was the default? That is, how many times would people have written something like "hello {previously-defined-variable}" and not meant to substitute the value of that previously defined variable at runtime?

MauranKilom4y ago

I don't think this makes sense. Plain strings and format strings are not interchangeable, and using one where the other was meant is probably a bug.

Would you expect that a user input like "{secret} please" is interpolated? If so, we hopefully agree that this would blow major security holes into any python script processing untrusted user input. And if not... Why not?
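A small sketch of the distinction, using made-up values: f-string interpolation happens only for literals in source code, so a string that arrives at runtime is never re-scanned for placeholders. Calling `str.format` on an untrusted string is what a "format by default" world would amount to.

```python
secret = "hunter2"                 # hypothetical sensitive value
user_input = "{secret} please"     # untrusted text arriving at runtime

# Embedding the input in an f-string does not interpolate its contents.
echoed = f"{user_input}"

# Treating the input itself as a template does interpolate, and leaks.
leaked = user_input.format(secret=secret)

print(echoed)
print(leaked)
```

The first result keeps the braces verbatim, while the second exposes the secret.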

kgeist4y ago

>Would you expect that a user input like "{secret} please" is interpolated?

That's basically what the recent log4j security vulnerability was all about. "Helpfully" interpolating logs by default.

boxed4y ago

Look up how this works in Swift. They only have one string. No raw strings or f strings. Yet they have all the power of all three python string types and less syntax. It's very nice.

minitech4y ago

Swift does have raw strings (the #"extended delimiter"# syntax).

asvitkine4y ago

If you only make it work with string literals (e.g. generate the underlying formatting logic at parse time), it wouldn't allow arbitrary inputs to be treated as f strings.

noobermin4y ago

I think what they mean is to make formatting the default and unformatted strings the exception, similar to how "raw" strings are treated: escaped characters are replaced with the ASCII code by default unless the string is raw, signified by an 'r' prefix.

Sohcahtoa824y ago

Adding that behavior would break existing code that uses str.format, and Python tries to avoid breaking code between minor releases.

HL33tibCe74y ago

That’s not really a feasible solution in Python because that change would break a load of existing code.

mschuster914y ago

So what? Raise a deprecation notice, treat it as a fatal error in two or three years and that's it. PHP has been doing this for years now.

swatcoder4y ago

As someone who would like to be working on new, interesting things in 2-3 years rather than bringing old code into conformance with breaking changes, this attitude captures a worrisome trend in development.

On the one hand, it's great that we have platforms that innovate and improve and harden over time, but we're also facing a development culture where more and more time is spent servicing package/platform/language/OS changes that have no material impact on our own otherwise-mature projects.

It's worth being judicious about where breaking changes are applied, right?

HL33tibCe74y ago

We’re not talking about deprecating a feature here, we’re talking about the addition of behaviour that will break existing code, potentially in non-trivial and hard to debug ways, and in ways that could easily introduce security vulnerabilities.

capitalsigma4y ago

Just bump the major version number from 3 to 4, right? How long could that migration take?

noobermin4y ago

We've done this too many times and we've had enough pain, let's please proceed at a pace where we can worry about delivering our product and not updating formatted strings, thank you.

krisoft4y ago

> treat it as a fatal error

Did you think this through? What would you treat as a fatal error? How would the compiler know if a particular string is old style code wanting to print some characters between curly braces or new style code wanting to string interpolate a variable?

bornfreddy4y ago

Also see: Python 2 => 3 hell. Nobody wants to repeat that.

Too4y ago

Changing something as deeply rooted as the string type?

Python already went through exactly that disaster once before, when they changed the default string type from b""-strings to u""-strings. It took about 20 years for this transition to finally complete.

rglullis4y ago

PHP has also been responsible for the majority of exploited servers and misconfigured applications. Whatever they are doing, I take it as a strong negative signal.

dekhn4y ago

Python does not do this. A change like that would require a major version number increment and the community would revolt.

Too bad we can't go back in time to 1996 or so.

swatcoder4y ago

To be fair, your suggestion might make for a more resilient default, but it's also a great way to leak data and add overhead for the default case. There are tradeoffs.

Someone4y ago

Not much overhead, I would think. We’re talking about literal strings in source code, not strings in general. It’s not much work to check those.

One thing that it would break is that strings read from files would be treated differently from those in source code, even those read from files that logically “belong” to the application (say config file)

I don’t think that’s an issue, though.

Also, in Swift "\(foo)" does string interpolation. I haven’t seen people complain it leaks data or makes Swift slow (but then, it’s not fast at compiling at all because of its rather complicated type inference)

JadeNB4y ago

> Also, in Swift "\(foo)" does string interpolation. I haven’t seen people complain it leaks data or makes Swift slow (but then, it’s not fast at compiling at all because of its rather complicated type inference)

I think that the claim is not that this leaks data in an absolute sense, but rather that changing the behaviour after people have come to rely on it will leak data from currently well behaving applications.

macspoofing4y ago

>...and suggests that automating error finding is an appropriate course instead of making it less likely to make the error in the first place.

You can't fix the syntax and standard lib of the language. It is what it is. Similarly, how many bugs would you prevent if Python had compiler support to catch those types of syntax (and type) errors?

Too4y ago

This is how bash works. Any string with a $ in it will be interpolated unless you double escape it. Also depending on if you use double or single quoted strings.

Spaces as list separators could also fall into this philosophical question of what makes most sense as a separator. Sometimes it is super convenient, until you have actual spaces in your string and it becomes a pita.

See also the yaml Norway problem for what happens when implicit goes over explicit.

It generates about the same amount of bugs, if not more, and would also end up with a code-review-doctor suggesting you use \$ over $. In the end, regardless of syntax, a human always has to make the final call on whether interpolation is wanted or not.

boxed4y ago

This is how strings work in swift. It's a much superior system imo.

rikateeOP4y ago

what's also ironic is I left an easter egg in the code sample for how we downloaded the list of repositories and no one has noticed it yet.

dewey4y ago

> For science you can see the reactions here.

That link seems to be broken: https://github.com/issues?q=is%3Aissue+author%3Acode-review-...

I was actually surprised to read that people would ignore or be annoyed by a bot raising a valid PR that can be easily merged after a quick glance. What would be the reason for that?

TrickardRixx4y ago

Automated checking of potential bugs in f-strings is hard. There are lots of false positives. You can see some discussion around this kind of rule in pylint [0]. At the end of the day, the choice to run automated linting tools on a repo is up to the maintainers. Autogenerating PRs like this is incredibly noisy and comes off to me as a blatant advertisement for their "code review doctor" product.

[0] https://github.com/PyCQA/pylint/issues/5039
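To give a feel for why such a check is noisy, here is a deliberately naive heuristic built on the standard `ast` module. It is not how pylint or Code Review Doctor work; it is an assumption-laden sketch for Python 3.8+ that flags f-strings whose literal text still contains something shaped like `{name}`.

```python
import ast
import re

# Something that looks like a simple placeholder, e.g. {path} or {obj.attr}.
PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_.]*\}")


def suspicious_fstring_lines(source):
    """Return line numbers of f-strings whose literal parts contain '{name}'."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.JoinedStr):
            continue
        for part in node.values:
            if (
                isinstance(part, ast.Constant)
                and isinstance(part.value, str)
                and PLACEHOLDER.search(part.value)
            ):
                flagged.append(node.lineno)
                break
    return sorted(flagged)


SAMPLE = '''
path = "/tmp/x"
user = "hn"
a = f"user {user} " "wrote to {path}"
b = f"user {user} " f"wrote to {path}"
c = f"literal braces: {{path}} for {user}"
'''

flagged = suspicious_fstring_lines(SAMPLE)
print(flagged)
```

Line `a` is the real bug and line `b` is fine. Line `c` is a likely false positive, because `{{path}}` ends up in the AST as the literal text `{path}`, which this heuristic cannot tell apart from a forgotten prefix. The sketch also misses plain strings that are not adjacent to any f-string at all.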

wbobeirne4y ago

> Autogenerating PRs like this

The article specifically mentions that they were not auto-generated,

"It was also interesting to see the reaction from open source developers to unsolicited pull requests from what looks like a bot (really a bot found the problem and made the PR, but really a human developer at Code Review Doctor did triage the issue before the PR was raised)"

zamadatix4y ago

In reactions they conveniently left out "false positives we still hadn't weeded out". On top of that it can be annoying to have bots making trivial PRs in their own format when you've got a well defined process for it. Lastly it was basically spamming an ad link for the service at the end of the PR comment - even if the other issues didn't come up it's not always well received to do that.

Looking at one bot, it doesn't sound bad, but when you have everyone's bot doing this kind of stuff it can quickly become more of a nuisance than a help.

jjoonathan4y ago

You're assuming that the PR is valid, but a maintainer can't make that assumption. They have to do the thankless work to figure out the context and handle the fallout if they get it wrong. Let's look at who wins:

    * Small benefit to bot creator
    * Tiny benefit to project
    * Modest cost to maintainer

Waves of low-effort resume-padding commits are already a thing. Not a big problem, but bots clearly have the potential to multiply the small problem into a big problem.

I'm still open to the idea that bots could be a net win, because most projects really do have heaps of small simple mistakes lying around. I'm sympathetic to the maintainers though. They always seem to get the short end of the stick.

cinntaile4y ago

https://github.com/Qiskit/qiskit-terra/pull/7982

That guy was not happy. I do agree that it's basically advertising and that's annoying.

llbeansandrice4y ago

I would never tolerate ads in my commit history. That's ridiculous.

It's basically using open source repos as an advertising platform for their static-analysis bot.

If they want to offer services, they can reach out to the maintainers. This is different from a human opening a valid PR on an OSS repo, since the commit message includes an ad and now they're advertising on HN.

mhils4y ago

In our case OP's bot did not open a PR which could have been merged quickly, but filed an issue instead.

Forge364y ago

What I've found from doing similar types of changes.

1. It's hard to explain the impact to the application of the current problem. Thus it looks like a theoretical issue

2. Sometimes people rely on the bug for their code to work

3. Surprise work can be poorly received (ie: not the current priority)

brandonbloom4y ago

In addition to what others have already said, my own random sampling now shows quite a high false positive rate.

Waterluvian4y ago

I expect to see the entire gamut of possible reactions with a sufficient number of bot PRs. But in looking at 10 of them at random, I didn't find a single "negative response."

(I don't think ignoring it is invalid or wrong by any means, given there's so many reasons one might not engage in a timely manner, or at all, in the issues section or PRs. I don't monitor my repos issues because I just don't feel interested in supporting my code. Feel free to fork or ignore!)

vitus4y ago

Some negative reactions:

https://github.com/mitmproxy/mitmproxy/issues/5285

https://github.com/Qiskit/qiskit-terra/issues/7981

https://github.com/beetbox/beets/issues/4340

I do think those concerns are legitimate. (I also think more tooling is a good thing!)

Waterluvian4y ago

Thank you for sharing these links!

dekhn4y ago

I looked through all three. The first isn't really a complaint because the bot acted in good faith and found an error. In the second one they complained about a missing unsubscribe link (reasonable) and in the third one, the author should update their code so it doesn't create a variable named path, then a non-f-string that includes "{path}". I had to stare at the author's comment that it was a false positive for quite a bit to convince myself they were right.

VWWHFSfQ4y ago

Because this is basically just PR spam

dekhn4y ago

To me, well-intentioned systems with a high true positive rate and low false positive rate are welcome so long as they follow reasonable etiquette and norms, which this group seems to do.

mjs72314y ago

This was posted on Reddit earlier this week with similar negative responses: https://www.reddit.com/r/Python/comments/ubkvrd/10_of_the_66...

rikateeOP4y ago

FWIW, HN is much more positive (while also raising valid points that will be taken into account going forward)

snapetom4y ago

I'd like to add better technical discussion, too.

f7fg_u-_h4y ago

> After creating 69 pull requests the reaction ranged from:

> Annoyance that a bot with no context on their codebase was raising pull requests. A few accepted the bugs were simple enough for a bot to fix and merged the pull request, but a few closed the pull requests and issues without fixing. Fair enough. Open source developers are busy people and are not being paid to interact with what they think is just a bot. We’re not entitled to their limited time.

> Neutral silence. They just merged the PR.

> Gratitude. “thanks” or “good bot”.

I appreciate their self awareness about responses from maintainers.

malcolmgreaves4y ago

You can also use flake8 to find this, and even more, errors in Python code.

rikateeOP4y ago

flake8 does not currently support this check, as they are concerned about the false positives from "what if the string is later used in .format(...)"

However, Code Review Doctor is more of a "this MIGHT be a problem. have you considered..." rather than "it wrong"
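The false-positive case mentioned here looks roughly like the following. The template text and function name are invented; the pattern is a plain string that intentionally carries placeholders and is filled in later with `str.format`.

```python
# A plain string with braces, stored for later use. It looks like a
# forgotten f-string, but the missing prefix is intentional.
TEMPLATE = "Hello {name}, you have {count} new messages"


def render(name, count):
    """Fill the template at call time with str.format."""
    return TEMPLATE.format(name=name, count=count)


print(render("hn", 3))
```

A checker that only sees the assignment cannot know whether `.format` is called somewhere else, which is why a "have you considered" tone fits better than a hard error.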

fareesh4y ago

I like python although I don't use it too often. Would it be unfairly critical of me to say that this is the outcome of a bad design choice? Ideally languages should be designed in a way that a bug like this which is so widespread and easy to create, should be caught via some mechanism, either linting or some part of the process.

noobermin4y ago

The f-strings are a recent (may be not so recent now) addition to the language, so all the errors stem from it being "new" where people's reflexes / carefulness hasn't adjusted to them yet.

I think in addition to the suggestion for linters, updating IDE/editors to incorporate them would help. Syntax highlighting is the primary reason not terminating strings isn't that common of an error anymore, coloring it differently than a normal string might help (or may be it would make things ugly, I don't know).

Too4y ago

Compare to Typescript having a very small but significant difference. They use backticks instead of an f-prefix.

I think this minor difference eliminates all confusion about whether concatenating f-strings and normal strings propagates. Same when you split an existing f-string in two because it became too long: there is no risk of forgetting a backtick on the second part, the way you would with an f-prefix, because if you do, the closing tick doesn’t match the opening.

Linting the existing f-strings is, as shown by this bot, unfortunately very difficult.

nomel4y ago

To play devil's advocate, that's a matter of perspective. There's nothing special about "{text}". It's just characters. One could claim that the default analysis should be "that's fine", with the option to make it special available with the "f". ;)

But, perhaps you're right, and the total number of bugs would be reduced with f-strings, but that would require making them default back in python 1.0.

The linter I use has warnings for things-that-look-like-f-strings on by default. But, some of my projects have f-string like text, so special text to tell the linter to ignore them are required all over the place.

polio4y ago

Most developers will require an arsenal of static analysis tools to achieve maximum productivity. Linters are an example of such a tool, but they don't exist as part of the language spec itself, AFAIK.

pabs34y ago

It's a shame Code Review Doctor isn't open source, then everyone could install it and use it on any code they write.

zikohh4y ago

Can I use code review doctor with gitlab? If not what options do I have?

ggm4y ago

Maybe catenation of an fString and a string should yield an fString by type promotion? String is morally "any" so it feels to me like a contextual narrowing of type.

carl_dr4y ago

What if you wanted to append a string with braces, could you? (I don’t know Python!)

Plus it would lead to subtle security vulns and other bugs. A contrived example:

f"{bot} spammed my repo saying" + "touch('{path}') was wrong lol."

Now my path var has been disclosed.

bikingbismuth4y ago

If you want braces in your f strings you double them up.

Ex:

n = "hn"

print(f"hello {n} your path is {{path}}")

Out: hello hn your path is {path}

SnowflakeOnIce4y ago

Strings and f-strings are not separate types in Python, sadly.
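
A minimal sketch of that point (the variable names are made up for illustration): an f-string is evaluated on the spot and yields an ordinary `str`, so by the time two values are concatenated nothing records which of them came from an f-string.

```python
name = "hn"

plain = "hello {name}"       # ordinary literal: the braces are just characters
formatted = f"hello {name}"  # f-string: evaluated immediately into a str

# Both values are plain `str`; there is no separate f-string type to promote to.
print(type(plain) is type(formatted))  # True
print(plain + " / " + formatted)       # hello {name} / hello hn
```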

baisq4y ago

The comments on this submission are overwhelmingly negative. Why are the comments on PVS-Studio submissions, on the other hand, generally positive?

safwan4y ago

f-string does not work with GNU/gettext! It is also a common mistake that people make!
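
A rough sketch of why, assuming the standard-library `gettext` module with no translation catalog installed (so lookups simply return the message id). The `user` variable is hypothetical. The catalog is keyed on the constant template, so the f-string version hands gettext an already-formatted string it cannot match.

```python
import gettext

_ = gettext.gettext  # with no catalog installed this returns the msgid unchanged

user = "hn"

# Works with gettext: the msgid is the constant string "Hello {user}",
# which is what gets looked up; formatting happens after the lookup.
ok = _("Hello {user}").format(user=user)

# Does not work as intended: the f-string is evaluated first, so gettext
# is asked to translate "Hello hn", which no catalog will contain.
looked_up = f"Hello {user}"
bad = _(looked_up)

print(ok)         # Hello hn
print(looked_up)  # Hello hn  (the string gettext actually receives)
```

Without a catalog both calls print the same text, which is part of why the mistake goes unnoticed until translations are added.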

jhardy544y ago

> The problem with scanners is that they usually have a pretty high false positive rate.

Did that happen in the example being discussed in this thread?

charlieyu14y ago

How high is the false positive rate? I would say even at 80%, the bots have at least found enough possible bugs worth attention that wouldn't be found by human review alone.

sieabahlpark4y ago

The issue they had is being part of the advertisement, not that the bot did the work.

Everyone is out for notoriety and street cred instead of just doing good for the community.

omegalulw4y ago

I understand the sentiment but you should be judging the PR, not the source. Ask yourself: would you have happily accepted the same PR that the bot sent if it came from a human?

dogecoinbase4y ago

Ask yourself: would you treat a PR differently if it came from a regular, trusted contributor, or some random person (or bot)?

bostonsre4y ago

netheril964y ago

I would if the content of the PR were complicated. Not in this case.

Mo34y ago

> I have never given consent to this and I have no choice to opt out.

You have a public repository on GitHub. You are free to switch it to private, but otherwise this is absolutely illogical. No one needs your consent to submit PRs to, or highlight, a public repository.

Actually, now that I think of it, it’s better comparable to someone bringing in a handheld scanner with a company name on it, scanning the artwork and then pointing out the tear.

Which is still totally fine. You’ve given implied consent by making it available to the public. You have decided to make it possible for the public to view it, criticize it and link to it.

Quote Wikipedia,

Case closed.

throwaway2904y ago

> Actually, now that I think of it, it’s better comparable to someone bringing in a handheld scanner with a company name on it, scanning the artwork and then pointing out the tear.

And in case of OSS this lab may be my tiny garage where me and friends tinker on stuff.

Choosing to make the results of our passion or work free for all to study and use should not come with a liability of having to deal with hordes of such bots.

Mo34y ago

Only if you establish your lab in a tent on the street and put a sign that says "for public display" on it.

> And in case of OSS this lab may be my tiny garage where me and friends tinker on stuff.

That's not OSS. OSS would be leaving the garage door open, putting your garage on Google Maps and freely allowing anyone to walk in and see what you're doing. That's OSS.

Then getting angry about it is what you and OP are doing.

grey_earthling4y ago

GitHub isn't your lab. It's Microsoft's lab. (They just rent out space free of charge.)

dataflow4y ago

Sounds like we need a robots.txt for GitHub repos.

tus6664y ago

Isn't having a public github repo consent?

dataflow4y ago

Is having a publicly reachable email address consent to receiving unsolicited emails? Is having a public postal address consent to receiving mailed advertisements? Legally, yeah; morally, less so.

tus6664y ago

Technically they are forms of abuse, a PR for a valid bug less so.

IshKebab4y ago

I feel the same way about those bots that tell you about insignificant security vulnerabilities in some project you abandoned. It's basically spam.

That said, this does seem like it is a bit more useful. As long as they actually read the changes and make sure they aren't false positives. Which I'm guessing they didn't do for 666 repos.

b6dybuyv4y ago

> As long as they actually read the changes and make sure they aren't false positives. Which I'm guessing they didn't do for 666 repos.

In the article they say that "really a bot found the problem and made the PR, but really a human developer at Code Review Doctor did triage the issue before the PR was raised)".

colejohnson664y ago

> I feel the same way about those bots that tell you about insignificant security vulnerabilities in some project you abandoned. It's basically spam.

If you "archive" your repos, dependabot and friends won’t bother you.

Or, you could just disable security alerts in your repo's settings.

oefrha4y ago

scotty794y ago

It's your repo and your choice. You can reject the PR.

Your repo is public so you can't prevent people and bots from looking into it and having opinions on it (even public ones).

HL33tibCe74y ago

I highly doubt that people believed that f-strings worked this way. Far more likely is that, for example, the expression started as one line, then got split onto two, or some such similar scenario.
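
A sketch of that scenario, with made-up variable names: the line starts out as a single f-string, gets wrapped, and the second literal silently loses its prefix.

```python
user = "hn"
count = 3

# Original one-liner: a single f-string, everything interpolates.
one_line = f"User {user} has {count} open pull requests"

# After wrapping the line, the second literal lost its f prefix.
# Adjacent literals are still concatenated, but only the first is formatted.
split_buggy = (
    f"User {user} has "
    "{count} open pull requests"
)

# Each piece that contains a placeholder needs its own prefix.
split_fixed = (
    f"User {user} has "
    f"{count} open pull requests"
)

print(split_buggy)              # User hn has {count} open pull requests
print(split_fixed == one_line)  # True
```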

johnny_castaway4y ago

This may be also caused by confusing syntax highlighting in some editors, for example in VSCode [1]

The variable in the second string gets highlighted (with slightly different color, but still) because it would still work with `str.format()`. GitHub doesn't seem to do this.

[1] https://imgur.com/a/9KGWVG0

totony4y ago

Yeah I got bitten by the same-color syntax highlighting a few times.

_8j504y ago

You'd be surprised, people who expect python to be "smart" and "figure it out" might think that way.

groestl4y ago

Well, it's not completely illogical.

'a' is str

b'a' is bytes

f'something' might be a separate f-str-type too?

1 is an int

1.2 is a float

(1.2 + 1) is a float

GeorgeTirebiter4y ago

I would use such a feature, as I always use f-strings when formatting.

HL33tibCe74y ago

As a beginner, maybe. But this is code in some of the biggest open-source Python repos in existence. Probably written by someone with a reasonable level of Python expertise.

genewitch4y ago

... and yet?

groestl4y ago

I think what's happening is that people assume there's type coercion going on here.

jldugger4y ago

I wonder if an autoformatter like black is at play here.

saila4y ago

Black doesn't split strings, and I doubt they'd choose to use concatenation if they did.

hathawsh4y ago

I kind of wish Black would split and join literal strings. I've seen several times when Black converted code like this:

    raise SomeError(
        "Explanation... "
        "yet more explanation"
    )

To this:

    raise SomeError(
        "Explanation... " "yet more explanation"
    )

That just looks odd. Black is fantastic, but not perfect.

d0mine4y ago

Black does split strings if configured and it does use implicit concatenation of Python string literals.

xigoi4y ago

Black doesn't change the semantics of code.

gus_massa4y ago

I don't mind that it's a bot, and I really appreciate that apparently a human made the final review before sending the PR. But I don't like that the title of the commit is:

> Fix issue probably-meant-fstring found at https://codereview.doctor

I expect a more neutral title for a commit, something like

> Fix fstring in <name-of-file>

bvinc4y ago

So they checked 666 python repositories and fixed bugs in 69 of them. Interesting choice of numbers.

defterGoose4y ago

...doing the devil's work.

InfiniteRand4y ago

Fixing F-strings

racl1014y ago

Nice!

saagarjha4y ago

It’s a nice coincidence.

memco4y ago

Relatedly, the logging optimization suggests setting raiseExceptions to false for production logging code. Where is that set? On the logger, the handler, or something else?

bobbiechen4y ago

I was also confused by the expression evaluation thing. Reading between the lines, it seems like

    logger.debug("hello %s", foo)

may be better than

    logger.debug(f"hello {foo}")
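
One way to see the difference, as a sketch rather than a benchmark: the `Expensive` class and the logger name below are invented for the demonstration. With the level set above DEBUG, the `%s` form never converts its argument to a string, while the f-string is built before `debug()` is even called.

```python
import logging


class Expensive:
    """Counts how often it is converted to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "expensive value"


logger = logging.getLogger("fstring_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)               # DEBUG records are discarded
logger.addHandler(logging.NullHandler())
logger.propagate = False

lazy = Expensive()
logger.debug("hello %s", lazy)   # never formatted: the record is dropped first

eager = Expensive()
logger.debug(f"hello {eager}")   # formatted before debug() is called

print(lazy.str_calls, eager.str_calls)  # 0 1
```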

bbayles4y ago

The first is better also because you can do things with loggers other than print out their contents. For example, suppose you had a statement like:

    logger.debug('Database error: %s', error_message)

masklinn4y ago

That's exactly it.

jrootabega4y ago

You're also able to add additional log-specific processors to the log record in the first case.

aib4y ago

> where is the f-string described as suboptimal?

As for raiseExceptions, I'm not sure it's for optimization. It looks like an old sanity check for bad logging configurations.
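
For reference, `raiseExceptions` is a module-level attribute of the standard `logging` module, not something set per logger or per handler. The sketch below (helper and logger names are made up) logs a record whose arguments don't match its format string and captures what the module reports on stderr in each mode.

```python
import contextlib
import io
import logging


def log_with_bad_args(raise_exceptions: bool) -> str:
    """Emit a record with mismatched arguments and return whatever
    the logging module wrote to stderr about the failure."""
    logging.raiseExceptions = raise_exceptions  # module-level attribute
    logger = logging.getLogger(f"raise_demo_{raise_exceptions}")  # hypothetical name
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(io.StringIO()))
    captured = io.StringIO()
    with contextlib.redirect_stderr(captured):
        logger.warning("two placeholders: %s %s", "only one argument")
    return captured.getvalue()


noisy = log_with_bad_args(True)    # a "--- Logging error ---" report is printed
silent = log_with_bad_args(False)  # the formatting error is swallowed
logging.raiseExceptions = True     # restore the default

print(bool(noisy), bool(silent))  # True False
```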

nimish4y ago

The "Use logging's interpolation" warning has always annoyed me. If logging is in your hot path, that might be an issue. Me f-string interpolating some info logs that run once for convenience is not.

Low value junk like this is not helpful.

readthenotes14y ago

MauranKilom4y ago

I don't think this makes sense. Plain strings and format strings are not interchangeable, and using one where the other was meant is probably a bug.

kgeist4y ago

>Would you expect that a user input like "{secret} please" is interpolated?

That's basically what the recent log4j security vulnerability was all about. "Helpfully" interpolating logs by default.

boxed4y ago

Look up how this works in Swift. They only have one string. No raw strings or f-strings. Yet they have all the power of all three Python string types and less syntax. It's very nice.

minitech4y ago

Swift does have raw strings (the #"extended delimiter"# syntax).

asvitkine4y ago

If you only make it work with string literals (e.g. generate the underlying formatting logic at parse time), it wouldn't allow arbitrary inputs to be treated as f strings.
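
This is already how f-strings behave today, which a short sketch can show (the `secret` value and the input string are invented): placeholders are only recognised in the source literal, so runtime data containing braces is inserted untouched. The leak only appears when runtime data is itself used as a template.

```python
secret = "hunter2"              # hypothetical sensitive value
user_input = "{secret} please"  # arrives at runtime, e.g. from a request

# The f-string only looks at placeholders written in the source literal.
# The contents of user_input are inserted as-is and never re-scanned.
message = f"user said: {user_input}"
print(message)  # user said: {secret} please

# By contrast, treating runtime data as a template does interpolate it.
leaked = user_input.format(secret=secret)
print(leaked)   # hunter2 please
```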

noobermin4y ago

Sohcahtoa824y ago

Adding that behavior would break existing code that uses str.format, and Python tries to avoid breaking code between minor releases.
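
An example of the kind of existing code that would break, with hypothetical names: a template defined once and filled in later. If plain literals were interpolated where they are written, `GREETING` would fail at definition time because `name` and `count` don't exist yet.

```python
# A template defined once and filled in later; the braces must stay literal
# at definition time because the values are only known at call time.
GREETING = "Hello {name}, you have {count} new messages"


def greet(name: str, count: int) -> str:
    return GREETING.format(name=name, count=count)


print(greet("hn", 2))  # Hello hn, you have 2 new messages
```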

HL33tibCe74y ago

That’s not really a feasible solution in Python because that change would break a load of existing code.

mschuster914y ago

So what? Raise a deprecation notice, treat it as a fatal error in two or three years and that's it. PHP has been doing this for years now.

swatcoder4y ago

It's worth being judicious about where breaking changes are applied, right?

HL33tibCe74y ago

capitalsigma4y ago

Just bump the major version number from 3 to 4, right? How long could that migration take?

noobermin4y ago

We've done this too many times and we've had enough pain, let's please proceed at a pace where we can worry about delivering our product and not updating formatted strings, thank you.

krisoft4y ago

> treat it as a fatal error

bornfreddy4y ago

Also see: Python 2 => 3 hell. Nobody wants to repeat that.

Too4y ago

Changing something as deeply rooted as the string type?

rglullis4y ago

PHP has also been responsible for the majority of exploited servers and misconfigured applications. Whatever they are doing, I take it as a strong negative signal.

dekhn4y ago

Python does not do this. A change like that would require a major version number increment and the community would revolt.

Too bad we can't go back in time to 1996 or so.

swatcoder4y ago

To be fair, your suggestion might make for a more resilient default, but it's also a great way to leak data and add overhead for the default case. There are tradeoffs.

Someone4y ago

Not much overhead, I would think. We’re talking about literal strings in source code, not strings in general. It’s not much work to check those.

I don’t think that’s an issue, though.

JadeNB4y ago

macspoofing4y ago

>...and suggests that automating error finding is an appropriate course instead of making it less likely to make the error in the first place.

Too4y ago

This is how bash works. Any string with a $ in it will be interpolated unless you double escape it. Also depending on if you use double or single quoted strings.

See also the YAML Norway problem for what happens when implicit wins over explicit.

boxed4y ago

This is how strings work in Swift. It's a much superior system imo.

rikateeOP4y ago

What's also ironic is I left an easter egg in the code sample for how we downloaded the list of repositories and no one has noticed it yet.

dewey4y ago

> For science you can see the reactions here.

That link seems to be broken: https://github.com/issues?q=is%3Aissue+author%3Acode-review-...

I was actually surprised to read that people would ignore or be annoyed by a bot raising a valid PR that can be easily merged after a quick glance. What would be the reason for that?

TrickardRixx4y ago

[0] https://github.com/PyCQA/pylint/issues/5039

wbobeirne4y ago

> Autogenerating PRs like this

The article specifically mentions that they were not auto-generated,

zamadatix4y ago

Looking at one bot, it doesn't sound bad, but when you have everyone's bot doing this kind of stuff it can quickly become more of a nuisance than a help.

jjoonathan4y ago

    * Small benefit to bot creator
    * Tiny benefit to project
    * Modest cost to maintainer

Waves of low-effort resume-padding commits are already a thing. Not a big problem, but bots clearly have the potential to multiply the small problem into a big problem.

cinntaile4y ago

https://github.com/Qiskit/qiskit-terra/pull/7982

That guy was not happy. I do agree that it's basically advertising and that's annoying.

llbeansandrice4y ago

I would never tolerate ads in my commit history. That's ridiculous.

It's basically using open source repos as an advertising platform for their static-analysis bot.

mhils4y ago

In our case OP's bot did not open a PR which could have been merged quickly, but filed an issue instead.

Forge364y ago

What I've found from doing similar types of changes.

1. It's hard to explain the impact to the application of the current problem. Thus it looks like a theoretical issue

2. Sometimes people rely on the bug for their code to work

3. Surprise work can be poorly received (ie: not the current priority)

brandonbloom4y ago

In addition to what others have already said, my own random sampling now shows quite a high false positive rate.

Waterluvian4y ago

I expect to see the entire gamut of possible reactions with a sufficient number of bot PRs. But in looking at 10 of them at random, I didn't find a single "negative response."

vitus4y ago

Some negative reactions:

https://github.com/mitmproxy/mitmproxy/issues/5285

https://github.com/Qiskit/qiskit-terra/issues/7981

https://github.com/beetbox/beets/issues/4340

I do think those concerns are legitimate. (I also think more tooling is a good thing!)

Waterluvian4y ago

Thank you for sharing these links!

dekhn4y ago

VWWHFSfQ4y ago

Because this is basically just PR spam

dekhn4y ago

To me, well-intentioned systems with a high true positive rate and low false positive rate are welcome so long as they follow reasonable etiquette and norms, which this group seems to do.

mjs72314y ago

This was posted on Reddit earlier this week with similar negative responses: https://www.reddit.com/r/Python/comments/ubkvrd/10_of_the_66...

rikateeOP4y ago

FWIW, HN is much more positive (while also raising valid points that will be taken into account going forward)

snapetom4y ago

I'd like to add better technical discussion, too.

f7fg_u-_h4y ago

> After creating 69 pull requests the reaction ranged from:

> Neutral silence. They just merged the PR.

> Gratitude. “thanks” or “good bot”.

I appreciate their self awareness about responses from maintainers.

malcolmgreaves4y ago

You can also use flake8 to find this error, and many more, in Python code.

rikateeOP4y ago

flake8 does not currently support this check, as they are concerned about the false positives from "what if the string is later used in .format(...)"

However, Code Review Doctor is more of a "this MIGHT be a problem. have you considered..." rather than "it wrong"
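
To illustrate the trade-off, here is a toy checker built on the standard `ast` module. It is only a sketch and not how Code Review Doctor or any real linter works: the class name, the regex, and the sample source are all made up. It flags plain string literals containing `{identifier}`, skips literals that are the direct receiver of `.format(...)`, and ignores f-strings entirely.

```python
import ast
import re

# Hypothetical heuristic: a brace pair wrapping a bare identifier.
PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")


class MaybeFString(ast.NodeVisitor):
    def __init__(self):
        self.hits = []

    def visit_JoinedStr(self, node):
        # Already an f-string; this sketch does not look inside it.
        pass

    def visit_Call(self, node):
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == "format"
                and isinstance(func.value, ast.Constant)):
            # "...{x}...".format(...) is deliberate; only check the arguments.
            for arg in node.args:
                self.visit(arg)
            for kw in node.keywords:
                self.visit(kw.value)
            return
        self.generic_visit(node)

    def visit_Constant(self, node):
        if isinstance(node.value, str) and PLACEHOLDER.search(node.value):
            self.hits.append((node.lineno, node.value))


def find_suspects(source):
    checker = MaybeFString()
    checker.visit(ast.parse(source))
    return checker.hits


SAMPLE = '''
user = "hn"
a = f"hi {user}, " + "welcome back {user}"
b = "hi {user}".format(user=user)
c = f"literal braces: {{user}}"
template = "hi {user}"
d = template.format(user=user)
'''

print(find_suspects(SAMPLE))
# [(3, 'welcome back {user}'), (6, 'hi {user}')]
```

Line 3 is the real bug, while line 6 is exactly the false positive described above: the template is formatted later through a variable, which this naive check cannot see. The sketch also has a blind spot. Adjacent literals such as `f"a {x} " "b {y}"` are merged by the parser into a single f-string node, so the missing prefix in that form is invisible at the AST level and would need a token-based check instead.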

fareesh4y ago

noobermin4y ago

The f-strings are a recent (maybe not so recent now) addition to the language, so all the errors stem from it being "new", where people's reflexes / carefulness haven't adjusted to them yet.
