5% of 666 Python repos had comma typo bugs (inc V8, TensorFlow and PyTorch)

(codereviewdoctor.medium.com)

360 points · rikatee · 4y ago · 327 comments

153 comments · 36 top-level

usrbinbash4y ago· 20 in thread

Literally the second item in the "Zen of Python" (https://www.python.org/dev/peps/pep-0020/):

Explicit is better than implicit.

And yet, s = ["one", "two" "three"] will implicitly and silently do something that is probably wrong most of the time.
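
A minimal sketch of the behaviour described above (the variable names are illustrative and not from the thread): two adjacent string literals are merged into one, so a missing comma shortens the list without any error.

```python
# Adjacent string literals are joined at compile time, so a missing comma
# shrinks the list instead of raising an error.
with_typo = ["one", "two" "three"]   # comma missing after "two"
intended = ["one", "two", "three"]

print(with_typo)        # ['one', 'twothree']
print(len(with_typo))   # 2
print(len(intended))    # 3
```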

jstx14y ago

I mean the zen being wrong is kind of a meme at this point. The whole “only one obvious way to do it” isn’t just false but the exact opposite is true. Python is one of the most flexible languages with many many ways to do the same thing; more than any other language I can think of.

solox34y ago

Notice that, in the original quote,

    There should be one-- and preferably only one --obvious way to do it.

the author used two different ways of hyphenating (three, if you count the whole PEP 20). PEP 20 is clearly not meant to be taken as law. Nor PEP 8. Nor PEP 257.

People frequently mistake "one obvious way" for "one way". There are lots of ways to iterate through something, for example, but there is really one obvious way. And the philosophy here still applies: when you read anyone else's Python code, the obvious way is probably doing the obvious thing. I think that is the more appropriate takeaway from PEP 20.

webmaven4y ago

> I mean the zen being wrong is kind of a meme at this point. The whole “only one obvious way to do it” isn’t just false but the exact opposite is true. Python is one of the most flexible languages with many many ways to do the same thing; more than any other language I can think of.

Not in comparison to Perl, which usually has multiple ways to do anything, each 'obvious' to different sets of people (each Perl codebase therefore seems to have a distinct dialect based on which 'obvious' alternatives are chosen).

The other direction a language can take, which is also being contrasted here, is having one non-obvious way to do something.

Python's 'most obvious way' isn't necessarily the fastest/most concise/most efficient/scalable/etc. way to do something in Python, but it will usually be obvious to most Python developers. And although broad styles have certainly developed over time (imperative, functional, OO) as Python has gained power and flexibility, the dictum still largely holds true.

pmarreck4y ago

Except exit.

I knew Python wasn't for me in my first foray into it when I fired up its REPL and then went to exit it with control-C or whatever and it literally printed out the right way to do it but then didn't do it. Python was more interested in having me do things a certain way even when it knew what I intended to do, just to be a twit.

oblvious-earth4y ago

It was a meme when Zen was written, the spaces around the em dash are handled 3 different ways. Twice in the line you abbreviated, removing the joke.

lenkite4y ago

Python finally ended up following Perl's TMTOWTDI motto! https://en.wikipedia.org/wiki/There%27s_more_than_one_way_to...

fault14y ago

the zen of python was written in the 90s.

from that context it makes sense, because the only goal of python in the 1990s was to be more popular than perl, which was notorious in having many ways of doing the same thing.

but yeah, python has had significant feature creep over the years, it's nowhere near the small clear lang it used to be.

savant_penguin4y ago

Matplotlib is an example of a library with at least two "correct" ways of plotting

Quekid54y ago

It's sort of like the Unix Philosophy. It sounds good and is probably a good thing to strive for generally, but it's ultimately pointless when it comes to actually evaluating whether approach A is better than approach B.

egeozcan4y ago

> Complex is better than complicated

What? Something being complex is artificial, we try to avoid it. Problems can be complicated, we try to simplify them, and the more complicated the problem is, the more complex the solutions we tend to develop. So comparing them does not make sense?

Or did I always know them wrong?

dpedu4y ago

Hmmm, it sounds like you're expecting "two" and "three" to be separate list elements because of some sort of implicit behavior due to being written in a list context. This is the opposite of what "Explicit is better than implicit" means.

This is a list and you must explicitly place a comma when you want to start a new element in the list. Is there ever a time a new element follows a previous one and is NOT separated by a comma? No, this is explicit.

Whereas, strings also always concatenate in this manner be it in a list context or not. It seems like you're assuming behaviors from other languages would be the same in another.

matsemann4y ago

No, we don't want it to implicitly be a list item. We want it to fail as invalid syntax. If I wanted the two and three strings to be combined, I would have /explicitly/ used an operator for that. It's the implicit behavior of that which is the problem.

aylmao4y ago

Ah yes, why would anyone expect lists' main purpose to be listing?

Sarcasm aside, I'd assume people primarily list things in between [ and ], and sometimes concatenate things in there too. The language should err on the side of doing what people expect, unless explicitly told not to.

> It seems like you're assuming behaviors from other languages would be the same in another.

Rather, I think people expect a language, especially one this big and important, to work for them, and not to be designed with unergonomic features instead.

ReleaseCandidat4y ago

> it sounds like you're expecting "two" and "three" to be separate list elements

I'd expect that to be an error.

bokchoi4y ago

I'm not a python programmer, but the implicit string concatenation seems surprising to me.

doubleunplussed4y ago

Your sarcasm is misplaced. I would prefer a SyntaxError to either of the implicit behaviours.

twobitshifter4y ago

I could see lisp programmers missing the commas out of muscle memory

kazinator4y ago

> Is there ever a time a new element follows a previous one and is NOT separated by a comma?

Yes:

  [ "one, two", "three" ]

The comma is not an absolute context-free indicator of element separation.

rat99884y ago

This is not what implicit is about.

ianbicking4y ago

Implicit concatenation sure seems implicit to me

kazinator4y ago· 16 in thread

Not in Lisp! ("foo" "bar") and ("foobar") are lists of length 2 and 1, respectively.

(Python copies some bad ideas from C. Another one is having to import everything you use. It seems that since Python is written in C, its designer took it for granted that there will be something analogous to #include for using libraries, even standard ones that come with the language.)

Implicit string literal catenation is tempting to implement because it solves problems like:

   printf("long %s string"
          "nicely breaks up"
          "with indentation and all",
          arg, arg, ...)

and if you're working in a language which has comma separation everywhere, you can get away with it easily.

There are other ways to solve it. In TXR Lisp, I allow string literals to go across multiple lines with a backslash newline sequence. All contiguous unescaped whitespace adjacent to the backslash is eaten:

  This is the TXR Lisp interactive listener of TXR 273.
  Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
  TXR needs money, so even abnormal exits now go through the gift shop.
  1> "abcd \
      efg"
  "abcdefg"

If you want a significant space, you can backslash escape it; the exact placement is up to you:

  2> "abcd\ \
      efg"
  "abcd efg"
  3> "abcd    \
     \ efg"
  "abcd efg"
  4> "abcd    \ \
               efg"
  "abcd     efg"
  5> "abcd    \ \
     \         efg"
  "abcd              efg"

rileymat24y ago

I like imports, it tells me what files symbols are coming from, even for built in libraries.

Maybe it is that through my work I use a half dozen languages, where it is hard to remember each in detail.

I have also worked on a JavaScript project where there were no imports/requires and the build process created one file. So you had to inspect the confusing build script to even know what was what.

stevesimmons4y ago

I like the explicit nature of Python's imports.

And especially how I can choose the best way to indicate the sources of names in my code:

   import time
   t = time.perf_counter()

   import time, my_module
   t1 = time.perf_counter()
   t2 = my_module.perf_counter()

   from time import perf_counter as std_counter
   from my_module import perf_counter as my_counter
   t1 = std_counter()
   t2 = my_counter()

   try:
       from my_module import perf_counter
   except ImportError:
       # Fall back to standard implementation
       from time import perf_counter
   t = perf_counter()

   # import time as m  
   import my_module as m
   t = m.perf_counter()

kazinator4y ago

You could fairly easily work with a bunch of .js files that get catenated together by using an editor that can jump to a definition.

Build processes creating one file is the seven decade norm in computing.

Even if you literally don't catenate the .js files into one, they get loaded into one running image one way or another.

Someone4y ago

You mean

  "long %s stringnicely breaks upwith indentation and all"

? In my experience, this always gets ugly when you want to insert spaces (= about always). Do you put them at the end or at the start of each string (apart from the first or last string)?

I think Scala’s mkString (https://superruzafa.github.io/visual-scala-reference/mkStrin...) is the best solution, visually, for such things, but unfortunately, it would require hacks in the parser to do the concatenation at compile time, where possible.

Scala’s multiline strings look nice, too, if you want to insert newlines, except for the stripMargin thing (https://docs.scala-lang.org/overviews/scala-book/two-notes-a...)

kazinator4y ago

The spaces aren't the point of the comment; rather that we can break the literal into pieces and indent those pieces without affecting the contents. In a non-strawman real example with real data, of course we include all the necessary spaces in the literals. However, this bug is easy to make in C; I've seen it numerous times.

NoahTheDuke4y ago

> Another one is having to import everything you use.

The alternative is what exactly? Have the entire standard library exposed at once? Make all modules create non-conflicting names for exported objects, so that the json parse function has to be called json_parse and the csv parse function has to be called csv_parse?

Seems less than ideal to me.

kazinator4y ago

That's one way.

If these things are classes in a plain old single-dispatch OOP system, you can have a json-parser and csv-parser which have parse methods.

There could be packages/namespaces. So csv:parse and json:parse. These packages are standard and so they just exist; nothing to import.

In Python, you cannot use anything without an import! The top-level modules (which serve as de facto namespaces) themselves are not visible.

Say there is a csv module with a parse. You cannot just do:

  csv.parse(...)

you have to first say

  import csv

This is jaw-droppingly moronic.

justsomehnguy4y ago

   @"
  here strings in PS are fine for this purpose and 
   even allows whitespace anywhere            
    but because of the latter you can't indent it    
     with your other code   
 "@ -split "`r`n" | % {'<SOL>{0}<EOL>' -f $_ }
 <SOL>    here strings in PS are fine for this purpose and <EOL>
 <SOL>     even allows whitespace anywhere            <EOL>
 <SOL>      but because of the latter you can't indent it    <EOL>
 <SOL>       with your other code   <EOL>

kazinator4y ago

I posted a Unix StackExchange answer with some tricks for doing this in shell programming, very similar to your <SOL> trick.

https://unix.stackexchange.com/questions/76481/cant-indent-h...

lanstin4y ago

Having everything be imported is what makes the language be useable. Especially if you never import * you can easily find the definition and meaning of everything you read on the screen. A prime example of explicit is better than implicit.

And backslash doesn’t let you have the literal obey the proper indenting. Might as well use """

kazinator4y ago

> you can easily find the definition and meaning of everything you read on the screen

I don't want to be finding definitions of things that the language provides in the code.

Languages that don't work this way have IDEs, editor plug-ins or other tools for easily finding the definitions of things that are in the language, without hunting for them through intermediate definition steps in the same file.

"I've spent all my life in and out of jails, so I expect bars on doors and windows ..."

Spivak4y ago

I'm gonna disagree on the import thing. Compared to Ruby where requires are magic bags of metaprogramming bullshit, Python is much much easier to reason about. It takes some getting used to that require 'json' actually adds methods to existing classes.

kazinator4y ago

"require 'json'" is just another #include in disguise, and if it monkey patches existing classes, it ... probably should not exist in any form.

If the language supports json, it should just do that.

  1> #J[1,2,3]
  #(1.0 2.0 3.0)
  2> (get-json "[1,2,3,{\"foo\":true}]")
  #(1.0 2.0 3.0 #H(() ("foo" t)))
  3> (put-json #(1.0 2.0 t))
  [1,2,true]t

tgv4y ago

The difference is: in C, it's pretty unlikely someone wants to add strings. I suppose it's even illegal in the later C versions.

kazinator4y ago

It is positively not illegal in any standard version of C since ANSI C 89.

It's an essential feature used in all sorts of everyday code.

C99 added printf conversion specifiers that are hidden behind macros, and idiomatic usage of them relies on string catenation.

  uint32_t x = 0;

  printf("x = %" PRIx32 "\n", x);

where PRIx32 might expand to "lx" (if uint32_t is the same as unsigned long in that compiler).

All sorts of C macrology relies on string catenation. Kernel print messages:

  printk(KERN_EMERG "%s: temperature sensor indicates fire!", dev->name);
                   ^ must not have comma here

edflsafoiewq4y ago

The Python certainly looks nicer though.

oaiey4y ago· 15 in thread

I am a bit in shock. Accidental string concatenation. Python just lost a lot of reputation in my brain.

ErikCorry4y ago

Misspelling a variable on the lhs of an assignment just causes a new variable to be created with the new name. That's a lot worse in my book.
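
A small sketch of the misspelling hazard described above, using made-up names (`total`, `totl`): the typo on the left-hand side binds a fresh variable, so the intended one is never updated and nothing fails at runtime.

```python
total = 0
for value in [1, 2, 3]:
    totl = total + value   # typo: binds a new name "totl" instead of updating "total"

print(total)  # 0
print(totl)   # 3
```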

version_five4y ago

I don't think that's the same kind of thing. Your example is a tradeoff that anyone who uses a language that doesn't require explicit variable declaration faces, and it's pretty tough to argue such languages really shouldn't exist.

A missing operator resulting in implicit behavior is much more subtle, and the behavior isn't even obvious. For those who use Python, it is worse.

ReleaseCandidat4y ago

I'd say unexpected behavior is always worse than expected one.

Yes, you'll certainly find somebody who doesn't know what 'not statically typed' means, but ... And yes, there are also C(++) users, that expect strings to be concatenated like that.

oaiey4y ago

While I agree, this is somehow something I expect. Implicit string concatenation without operator or function around it sounds just like a terrible idea. It breaks the basic syntax concept of `foo X bar`. On the other hand it is probably very handy with DSLs and things like that.

voltagedivider4y ago

Isn't that common for all/most languages that don't require explicit typing?

jstx14y ago

That’s a complaint against the entire type system, nothing to do with misspelling.

colpabar4y ago

I was going to comment something like "who would even use this?" and then I remembered that I have in fact used that feature :) It's a somewhat "nice" way to write long strings and keep the code from getting too wide. I never did it inside an array, but I found breaking up a long string into smaller ones and wrapping them in parens without a comma was convenient, for things like error messages.

But that's just what comes with a hyper flexible language like python. You can do lots of things in lots of different ways, but you can also screw things up just as easily, and your IDE won't tell you because technically it's valid code.
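
For readers who have not seen the pattern, here is a rough sketch of the "long string in parens" usage described above. The function name `describe_failure` and the message text are invented for illustration.

```python
def describe_failure(path: str) -> str:
    # The parentheses only group the expression; the three literals are
    # joined into one string because nothing separates them.
    message = (
        f"could not read {path}: "        # trailing spaces must be written by hand
        "the file is missing or "
        "the process lacks permission"
    )
    return message


print(describe_failure("a.txt"))
# could not read a.txt: the file is missing or the process lacks permission
```

Adding a comma after any of the pieces would silently turn `message` into a tuple, which is the mirror image of the bug the article is about.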

oaiey4y ago

I completely get that. That is a very nice feature for building DSL or libraries with special needs. But it makes the overall language very dangerous.

Is this "operator" overloadable on each type in Python?

And that scares me a lot. I think I have to reevaluate my position towards Python.

silisili4y ago

Why not just use plusses? Or perhaps a join func, which would accomplish the same.

I get the use case as you described it, but it just seems like minimal effort to accomplish and have some semblance of explicit/safety.

BeetleB4y ago

Heh. I use it all the time the way you do and didn't realize this is alien to many developers (no one in my team ever complained about it).

It's common in some languages and used the way you use it. I looked in PEP8 and it seems they don't discuss this.

I think it's a perfectly valid use case, but clearly there are two camps to this. If this is so contentious, I would recommend PEP8 be revised to either explicitly endorse it as a way to split long lines or to explicitly discourage it and recommend the + operator instead.

shultays4y ago

You could have the same behavior by enforcing + operation in between

  mylongstring = "hello" +
    "world"

No idea if python's way of indentations allows this but sounds like it should
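
As a sketch of how this plays out in Python: a line that ends in a bare `+` is rejected by the compiler, but the same expression is accepted once it is wrapped in parentheses. The `compile` call below is only used to show the rejection without crashing the example.

```python
# A line that ends with a dangling "+" does not compile on its own.
broken = 'mylongstring = "hello" +\n    "world"\n'
try:
    compile(broken, "<example>", "exec")
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)  # False

# Inside parentheses, the expression may continue on the next line.
mylongstring = ("hello" +
                "world")
print(mylongstring)  # helloworld
```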

atleta4y ago

Not sure if it's irony or not. After all, this is not really accidental string concatenation but an easy to make type error which can go undetected due to the dynamic typing (and the lack of thorough type annotation in most code).

The string concatenation in itself should not be a problem as it's really just string constants. (But again, it might be irony exactly because of this :) )

oaiey4y ago

Unfortunately no irony.

I come from a programming platform (C#) where productivity is a key element of language design. I highly doubt that Anders Hejlsberg would have accepted such an error-prone concept like a literal free implicit operator on a key type like strings.

ErikCorry4y ago

In most languages an array with 3 elements has the same type as an array with 2 elements so the type system isn't going to warn you about the difference between

("foo" "bar", "baz")

and

("foo", "bar", "baz")

ehsankia4y ago

C/C++ has the exact same thing, no?

aeturnum4y ago· 11 in thread

The high-level goals of python end up creating these little syntactic landmines that can get even experienced coders. My personal nomination for the worst one of these is that having a comma after a single value often (depending on the surrounding syntax) creates a tuple. It's easy to miss and creates maddening errors where nothing works how you expect.

I've moved away from working in Python in general, but I think the #1 feature I want in the core of the language is the ability to make violating type hints an exception[1]. The core team has been slowly integrating type information, but it feels like they have really struggled to articulate a vision about what type information is "for" in the core ecosystem. I think a little more opinion from them would go a long way to ecosystem health.

[1] I know there are libraries that do this, I am not seeking recommendations.
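
A minimal sketch of the trailing-comma landmine mentioned above, with an invented variable name (`timeout`): the stray comma turns an integer into a one-element tuple, and the failure only shows up later, where the value is used.

```python
timeout = 30,                     # stray trailing comma: this is the tuple (30,)
print(type(timeout).__name__)     # tuple

try:
    deadline = timeout + 5        # fails far from the line with the typo
except TypeError as exc:
    error = str(exc)
    print(error)
```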

ehsankia4y ago

A lot of people in this thread are using this to make fun of Python, but the exact same issue exists in something like c++, here's some I fixed recently:

https://github.com/UWQuickstep/quickstep/pull/9

https://github.com/tensorflow/tensorflow/pull/51578

https://github.com/mono/mono/pull/21197

https://github.com/llvm/llvm-project/pull/335

aeturnum4y ago

I didn't understand anyone to be saying that Python is the only language to have this flaw.

Also, I personally don't mind this approach to string concatenation. I think it's a fine compromise between easy formatting and clarity. I was whining about a corner case of tuple construction - which as far as I know is not a feature of any other language.

rbonvall4y ago

I think automatic string concatenation and singleton tuples were not introduced according to some high-level goal. They are just historical baggage. Automatic string concatenation comes from C, and the singleton tuple syntax probably just seemed like a good idea at first.

In hindsight, singleton tuples are not common or useful enough to deserve their own syntax. If the way to create them was something like this:

    t = tuple.single("hello")

we'd think it's ugly or inconsistent, but definitely not confusing or bug-prone.

hddqsb4y ago

One place where singleton tuples used to be common is with the old "%"-formatting, specifically in the case where there is a single argument and its value might be a tuple:

    x = (1,2,3)
    #print("the value of x is %s" % x)   # breaks if x is a tuple
    print("the value of x is %s" % (x,)) # works even if x is a tuple

There is a readable way to create singleton tuples, without the sneaky trailing comma or a new function like tuple.single:

    tuple(["hello"])

The square brackets can be slightly annoying. I recall writing the following function to omit them:

    def tup(*args):
        return tuple(args)

This basically lets you use the usual tuple syntax, just prefixed with the word "tup". The advantages are that you don't need a trailing comma for singleton tuples, and it's more obvious that a tuple is being created (it can be difficult to distinguish between tuple literals and parentheses used for grouping in a complex expression).

I am reminded of a somewhat similar issue with empty set literals: {1,2} is a set, {1} is a set, but {} is a dict. The way to create empty sets is using set().
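
The set/dict asymmetry above is easy to check directly; a short sketch:

```python
print(type({1, 2}).__name__)   # set
print(type({1}).__name__)      # set
print(type({}).__name__)       # dict

empty = set()                  # empty braces are taken by dict, so use set()
empty.add("x")
print(empty)                   # {'x'}
```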

macNchz4y ago

I’ve been writing Python professionally full time for 8 years and still occasionally make the trailing-comma-tuple mistake. These days at least I’ll recognize and be able to find it quickly rather than wasting time. Can be caught with a linter, but not every codebase is readily linted.

aylmao4y ago

The lack of a static type-system is IMO what makes these one-character mistakes very annoying. The compiler can't tell you something is wrong, so you're just left to figure out why things are broken, just to realize it was the smallest of typos.

aeturnum4y ago

I love how simple and forgiving Python is for small projects. The "trailing comma creates a tuple" situation comes out of, as far as I can tell, a desire to create maximally convenient syntax in the scenarios where tuples are intended. I think that's great for small code!

I just wish that the core team would take that same zeal for a "pythonic" experience with small code and use it to develop more scaled-up systems for dealing with larger code bases. My idea is to enforce strong pre-conditions on function calls using type hints, but I am sure there are other ways to do it.

tyingq4y ago

C lets me do this, and doesn't say much about it.

  char ch_arr[3][10] = {
      "uno",
      "dos" 
      "tres"
  };

anyfoo4y ago

Fully agreed. If python had a proper static type system, those typos would hardly matter, and you'd have the best of both worlds: Convenient, concise syntax, but still confidence in your code.

I say "had a proper type system", but actually it turns out that it does have something like that: When I use python for anything else than a most tiny script now, I use "mypy"[1] which implements static typing according to some existing Python standard (whether that came about because of mypy or the other way around, I don't know).

It is so, so good to have mypy telling me where I messed up my code instead of receiving a cryptic, weird runtime error, or worse, no error and erratic runtime behavior. Because not knowing that a particular type is unexpected and wrong, values often get passed along and even manipulated until the resulting failure is not very indicative of the actual problem anymore.

[1] http://mypy-lang.org
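
A rough sketch of the kind of mistake a static checker can catch before the code runs. The function `connect` and its parameters are hypothetical. Python itself runs this without complaint, while a checker such as mypy would be expected to flag the call, because a tuple is passed where an `int` is annotated.

```python
def connect(host: str, timeout: int) -> str:
    return f"connecting to {host} with timeout {timeout}"


timeout = 30,   # stray comma: the value is actually the tuple (30,)

# At runtime nothing fails here; the tuple simply leaks into the output.
result = connect("example.org", timeout)
print(result)   # connecting to example.org with timeout (30,)
```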

luhn4y ago

I feel like it's been pretty clear from day one that type hints are meant for static analysis with tools like mypy. It's not exclusive to that use and has a lot of other possible applications, but the primary goal has always been static analysis.

hsbauauvhabzb4y ago

I’d rather have a compile-time error over an exception (or both), which in many cases can occur. I know mypy does this, maybe I should alias python="mypy&&python"

karolkozub4y ago· 8 in thread

I really like the idea of automated code review tools that point out unusual or suspicious solutions and code patterns. Kind of like an advanced linter that looks deeper into the code structure. With emerging AI tools like Github Copilot, it seems like the inevitable future. Programming is very pattern-oriented and even though these kinds of tools might not necessarily be able to point out architectural flaws in a codebase, there might be lots of low-hanging fruits in this area and opportunities to add automated value.

lumost4y ago

Consider that you may be describing a compiler. Typos are not generally a problem in statically typed languages with notable exceptions such as dictionary key lookups etc.

Even without static typing, argument length verification etc. can be done with a suitable compiler. In python we are left chasing 100% code coverage in unit tests as it's the only way to be certain that the code doesn't include a silly mistake.

samhw4y ago

I think 100% code coverage is folly. Spreading tests so widely near-inevitably means they're also going to be thin. In any codebase I'm working on, I would focus my attention on testing functions which are either (a) crucially important or (b) significantly complex (and I mean real complexity, not just the cyclomatic complexity of the control flow inside the function itself).

joatmon-snoo4y ago

I actually recently joined a startup working on this problem!

One of our products is a universal linter, which wraps the standard open-source tools available for different ecosystems, simplifies the setup/installation process for all of them, and a bunch of other usability things (suppressing existing issues so that you can introduce new linters with minimal pain, CI integration, and more): you can read more about it at http://trunk.io/products/check or try out the VSCode extension[0] :)

[0] https://marketplace.visualstudio.com/items?itemName=Trunk.io

rikateeOP4y ago

cool product :) is it just linting or do any of the tools do code transformation to offer the fix for the lint failure? (code review doctor also offers the fix if you add the github PR integration)

atleta4y ago

This is basically linting, i.e. code analysis. The techniques used might be more current (as they have been evolving, as you say, for pattern matching) but linting is just that: a code review tool to find usual bugs. (This is what did happen in this blog post. It wasn't looking for unusual solutions but usual mistakes.) The packaging, form of the feedback seems also different and that in itself may make a lot of difference in ease of use and thus adoption.

joatmon-snoo4y ago

Admittedly, the difference here is that codereview.doctor spent time tuning a custom lint on a variety of repos. In an org with a sufficiently large monorepo (or enough repos, but I don't really know how the tooling scales there) it's possible to justify spending time doing that, but for most companies it's one of those "one day we'll get around to it" issues.

rikateeOP4y ago

yeah something like sonarqube or https://codereview.doctor (if you use GitHub)

rak15074y ago

Or people could just write it correctly in the first place! Controversial I know! Seems like people would rather half-ass things and then let some AI autocorrect fix it up for whatever reason rather than doing it properly.

micimize4y ago· 7 in thread

For those looking to avoid this specific problem, there is a flake8 rule: https://pypi.org/project/flake8-no-implicit-concat.
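
To give a feel for what such a rule checks, here is a minimal sketch built on the standard-library `tokenize` module. It is not the implementation of the linked plugin; the function name `find_implicit_concat` is made up, and it only looks at plain `STRING` tokens, so it may miss f-strings on newer Python versions.

```python
import io
import tokenize


def find_implicit_concat(source: str) -> list[int]:
    """Return line numbers where a string literal directly follows another."""
    hits = []
    prev_was_string = False
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NL, tokenize.COMMENT):
            # Line breaks and comments inside brackets do not separate literals.
            continue
        if tok.type == tokenize.STRING:
            if prev_was_string:
                hits.append(tok.start[0])
            prev_was_string = True
        else:
            prev_was_string = False
    return hits


print(find_implicit_concat('s = ["one", "two" "three"]\n'))   # [1]
print(find_implicit_concat('s = ["one", "two", "three"]\n'))  # []
```

A checker like this flags the deliberate multi-line usage too, so a project has to decide whether to ban the pattern outright or only inside list and tuple literals.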

More broadly, the https://codereview.doctors makers are making the point that their tool caught an easy-to-miss issue that most wouldn't think to add a rule for. A bit of an open question to me how many of those there really are at the language level, but still seems like a neat project.

oblvious-earth4y ago

Also all but 1 of the issues they found relates to test code, it seems people are a little less careful compared to functional code.

Also in terms of mistakes codereviewdoctor twice linked to the same issue in their blog https://github.com/tensorflow/tensorflow/issues/53636 and raised the PR to the wrong project https://github.com/tensorflow/tensorflow/pull/53637 (I guess Tensorflow vendors Keras, easy mistake)

thrdbndndn4y ago

https://github.com/tensorflow/tensorflow/tree/0d8705c82c64df...

    STOP!
    This folder contains the legacy Keras code which is stale and about to be deleted. The current Keras code lives in github/keras-team/keras.

    Please do not use the code from this folder.

Yeah, not the most obvious notice.

The fact they didn't find the same mistake(s) in keras-team/keras (I assume they scanned it, it's one of the most popular Python repos) makes me believe these issues have been fixed/removed in the up-to-date keras repo.

sundarurfriend4y ago

> all but 1 of the issues they found relates to test code, it seems people are a little less careful compared to functional code.

Also a factor that bugs in functional code are more visible, both during development and to users once shipped. So there may have been an equal number or more such bugs in the non-test code, that just didn't remain in the code base for this long.

pfisherman4y ago

Ime, Black will add parentheses to clearly and explicitly indicate a tuple where there is a trailing comma. Figured this out when I made the trailing comma mistake and wondered why Black kept reformatting my code.

trulyme4y ago

Black rules. I love it that I don't need to have a discussion about style with anyone when Black is used on the project.

tedmiston4y ago

The URL in this comment has an incorrect TLD: it should be `doctor` (singular).

https://codereview.doctor/

rikateeOP4y ago

there is also https://pypi.org/project/flake8-tuple/

typo in the url (or in HN's markup) btw: it's https://codereview.doctor

routerl4y ago· 6 in thread

tl;dr: Python concatenates space separated strings, so ['foo' 'bar'] becomes ['foobar'], leading to silent bugs due to typos.

I've been bitten by this one at work, and can't help but think it is an insane behaviour, given that ['foo' + 'bar'] explicitly concatenates the strings, and ['foo', 'bar'] is the much more common desired result.

edit: This also applies to un-separated strings, so ['foo''bar'] also becomes ['foobar']

Palomides4y ago

I assume it's based on the C behavior, where it can be handy with macros

I don't think it fits well in python

pmontra4y ago

Maybe. We must remember that Python was designed at the very end of the 80s so what was normal for developers back then could be unexpected nowadays. An example: the self in Python's OO is a C pointer to a struct of data and function pointers. It should be perfectly clear to anybody writing OO code in plain C at the time (raising hand). Five years later new OO languages (Java, Ruby) kept self inside the classes but hid it in method definitions.

1 more reply

pletnes4y ago

I assumed it was borrowed from shell, where everything can just be put next to each other since it’s all text.

idealmedtech4y ago

It's a holdover from C, where implicit string literal concatenation is very useful in the preprocessor.

thrdbndndn4y ago

I luckily never accidentally used this space-concatenation thing, but in my early days learning Python I was bitten multiple times by the fact that a=(1) doesn't create a 1-element tuple.
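For anyone who hasn't run into it, a short sketch of the tuple rule: the comma makes the tuple, the parentheses only group. The names are illustrative:

```python
a = (1)                  # just the int 1; the parentheses only group
b = (1,)                 # the comma makes the tuple
c = 1,                   # also a tuple: a stray trailing comma is enough
title = 'Hello world',   # easy to write by accident at the end of a line

print(type(a).__name__, type(b).__name__, type(c).__name__, type(title).__name__)
```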

onphonenow4y ago

I still don't understand why it doesn't! So I still get bit from time to time.

2 more replies

prepend4y ago· 6 in thread

This seems like not a big deal. It’s a common mistake and is in 5% of repos but it’s not causing major damage.

And there’s no evaluation of importance as to whether these instances are in test files or non-critical code. Packages are big and can have hundreds or thousands of files.

It could be that if these mattered, they would have been detected and fixed.

A good example for unit tests and perhaps checking to see if these bugs are covered or not covered.

I like these kinds of analyses but don’t like them presented as if it’s some significant failure.

jollybean4y ago

5% of 'released' software is quite a lot; more importantly, it's a class of errors that definitely should not exist. This is effectively a 'bug' in the language; there just isn't any real upside.

Python has a few of these things, which is really sad.

bcrl4y ago

It's a class of error that would be caught by even the most basic testing. A better title for the article would be that 5% of 666 Python repos have typos demonstrating that the code around them is completely untested. It doesn't matter which language it is: untested code is untested code.

4 more replies

jve4y ago

I checked those 11 links to issues for major software. 10 bugs were actually in tests...

3 more replies

onphonenow4y ago

There were proposals to fix some of these but the unicode zeal beat out some of the more boring (but I'd say as important) cleanups.

rikateeOP4y ago

yeah the impact varies. the sentry one seems pretty big: https://codereviewdoctor.medium.com/5-of-666-python-repos-ha...

test did not work but did not fail either, imagine being that dev maintaining the code that the test professes to cover. Imagine being the user relying on the feature that test was meant to check (if the feature under test actually broke).

enchiridion4y ago

I mean, if you’re ultimately going to combine the list into a string anyway it’s no big deal.

Along those lines. I wonder how many of these come from ad-hoc file path handling instead of using pathlib.

tyingq4y ago· 5 in thread

Seems expected, as linters can't be sure when it's not intentional. Like this request to pylint:

https://github.com/PyCQA/pylint/issues/1589

Is there usually enough context for a linter to make an educated guess?

mikepurvis4y ago

I would have thought it would be a no-brainer to just ban it and insist on an explicit + operator. I'm pretty surprised that issue was so flippantly closed.

thaumasiotes4y ago

> I would have thought it would be a no-brainer to just ban it and insist on an explicit + operator.

Maybe as a matter of linting. As a matter of language design, I think + for string concatenation is a big mistake; using different symbols for numeric addition and string concatenation is something Perl got right.

1 more reply

ReleaseCandidat4y ago

The PR has been merged (for lists and tuples and sets only).

https://github.com/PyCQA/pylint/pull/1655

rikateeOP4y ago

can do a good job at allowing long urls for example, but would be whack a mole trying to cater for "all" purposeful implicit string concatenations

chrismorgan4y ago

Splitting long URLs onto multiple lines because you have a hard line length limit is considerably more harmful than exceeding the length limit in such cases, because you break the URL up so that tooling (including language-unaware static analysers) can’t conveniently access it. (e.g. if you want to open the link, you can’t just copy it or click on it or whatever, but must first join the lines, removing the quotation marks.) Any tool that forcibly splits up such lines when there is no fundamental hard technical reason why it must is, I categorically state, a bad tool.

arusahni4y ago· 4 in thread

The removal of implicit string concatenation was proposed for Py3k[1], but was rejected.

[1] https://www.python.org/dev/peps/pep-3126/

wodenokoto4y ago

The rejection notice seems completely counter intuitive to me. How is adding a plus "harder" compared to removing a foot gun?

> This PEP is rejected. There wasn't enough support in favor, the feature to be removed isn't all that harmful, and there are some use cases that would become harder.

oa20224y ago

This change would break a lot of legacy code for no good reason

The most common way to split a string in lines is using this concatenation formula.

4 more replies

Wowfunhappy4y ago

Does Python support the concept of allowing code to opt in to new safety features? I can understand rejecting something like this for the sake of legacy compatibility (something Python has abandoned too readily in the past), but it seems like an option—or maybe even a default—might be nice.

I suppose this is also something you could catch with a linter?

cpeterso4y ago

Yes: from __future__ import ...

https://docs.python.org/3/library/__future__.html

1 more reply

bilalq4y ago· 4 in thread

The whole "666" thing really threw me off. I thought it was some Python specific term or something at first glance. They open with a sentence that mentions "5% of the 666 Python open source GitHub repositories" as though there were only 666 total open source Python GH repos. Picking a number with other fun connotations or whatever to use as a sample is fine, but without setting that context, it was kind of distracting from their main content.

deathanatos4y ago

Did you figure out what the context is, and if you did, would you mind spelling it out for me? I still haven't figured out what correction to make to that sentence to get it to make sense.

rikateeOP4y ago

in a blog post about the evils of typos there was a typo! classic https://en.wikipedia.org/wiki/Muphry%27s_law ;)

1 more reply

bilalq4y ago

They ran their static analyzer over a sample of GH repos. They chose 666 as the number for their sample size. That's all.

dudeinjapan4y ago

It's further evidence that the Illuminati intentionally put these typo bugs there to destabilize the global order.

dannymi4y ago· 4 in thread

I can see the value of a lint (if there's a newline without a comma, warn), but concatenating strings by multiplication is the correct thing to do (since it's also used this way in mathematics of parsers).

Using the plus operator to concatenate strings is just weird.

Think of the usual algebraic properties these operators are supposed to have.

"+" always is supposed to be commutative--so "a"+"b" = "b"+"a", if those mean alternatives (they usually do mean that in mathematics), is just fine.

On the other hand, multiplication is often not commutative--also not here. "a" "b" != "b" "a".

So string concatenation should be the latter. And indeed that's how it's in regular expression mathematics for example.

int_19h4y ago

Juxtaposition is not multiplication in this context - you can't write (2 3), for example, it has to be (2 * 3).

Furthermore, Python already uses * for strings to indicate repetition: ("foo" * 2 == "foofoo").

String concatenation really just needs its own separate operator. & is an obvious candidate, if only it wasn't so commonly appropriated for bitwise AND - which is a very poor use of a single-char operator as it's not something that you need often, especially in a language like Python.

On the other hand, D uses binary ~ for concatenation. That has a neat mnemonic: it's a "rope" that "ties strings together".

housecarpenter4y ago

& is also used for set intersection in Python. I think + for string concatenation isn't too bad, really. It fits in with the fact that length(s + t) = length(s) + length(t), the same way we write A × B for Cartesian product (since |A × B| = |A| × |B|, even though this operation is neither commutative nor associative) or B^A for a function space (since |B^A| = |B|^|A|).

hayd4y ago

Which invertible commutative string operation would you choose for + ?

This might be nice from a math point of view, but I think users are going to be confused using "string"^3 for repetitions (instead of "string"*3). + and * make too much sense to the unwashed masses.

At any rate, explicit is better than implicit.

dragonwriter4y ago

There is no reason you couldn't use str * str for concatenation and str * integer (or even string * real) repetition.

Well, except if you wanted to support user classes that could duck type as both strings and numbers, which it would make awkward.

shoyer4y ago· 2 in thread

Most of the "bugs" caught here (including in TensorFlow and in my own project, Xarray) seem to actually be typos in the test suite. This is certainly a good catch (and yes, linters should check for this!), but seems a little oversold to me.

chillee4y ago

Same :P I'm actually responsible for one of these (https://github.com/pytorch/pytorch/issues/70607), but it's a typo in a list of tests to skip.

hvdijk4y ago

A typo in a list of tests to skip means tests are run that are not intended to be run. This can lead to unexpected failures, so in my opinion is not the same as the errors in test suites where tests run with other test data than intended but should still pass.

pmontra4y ago· 2 in thread

As a comparison, in Ruby

  puts "a" "b" == "ab" # true

and

  puts "a"
    "b" == "ab"

prints "a" with "b" == "ab" evaluated to false and discarded. This could create bugs as with Python. However

  ["a"
     "b"] == ["ab"]

is a syntax error at the beginning of the second line. The parser expects a ]. It would evaluate to true if it were on one line.

grey-area4y ago

In Ruby one too many commas can also cause problems:

  # list
  list = "a","b",

  # function
  def foobar
  end

  => ["a", "b", :foobar]

int_19h4y ago

I actually prefer Python approach here in that within () [] {} newlines are simply whitespace with no special meaning - this allows for very flexible formatting of expressions which is still unambiguous.

The implicit concat of string literals is the culprit here. It really should require "+".
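A rough sketch of what that looks like in practice, with made-up strings. Inside brackets an explicit "+" continues a long string across lines just as well, while the implicit form is what lets a forgotten comma slip through:

```python
# Inside (), [] or {} a newline is plain whitespace, so an explicit "+"
# can continue a long string across lines without a backslash.
message = (
    "long string, "
    + "continued on the next line, "
    + "and the next"
)

# With implicit concatenation, a forgotten comma silently merges two items.
items = [
    "alpha",
    "beta"      # missing comma
    "gamma",
]

print(message)
print(items)
```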

titzer4y ago· 2 in thread

Just to be clear, the V8 "bug" was in the test runner code and caused mis-parsing of command line options for testing for non-SSE hardware. Not exactly a critical bug.

jeffbee4y ago

The way the bug arrived in that test runner is interesting. It sneaked in mid-review. Possibly bugs added in the middle of code reviews are more likely to get through.

https://chromium-review.googlesource.com/c/v8/v8/+/2629465/3...

Personally, I prefer uniform lists with leading commas, because it's easier to add and remove lines for later, inevitable refactoring. For example, I prefer:

  things = [
    'foo'
  , 'bar'
  , 'baz'
  ]

This drives some people crazy, but I think it's the One True Way.

aflag4y ago

Isn't

  things = [
    'foo',
    'bar',
    'baz',
  ]

even better? In your case, if you want to add something to the beginning of the list you'll have to modify two lines.

1 more reply

LAC-Tech4y ago· 1 in thread

A lot of people are criticising dynamic typing for this.

It doesn't seem to have anything to do with typing discipline.

    words = (
        'yes',
        'correct',
        'affirmative'
        'agreed',
     )

Would be a tuple (immutable list) of strings, while

    words = (
        'yes',
        'correct',
        'affirmative',
        'agreed',
     )

would also be a tuple of strings.

If haskell had for some reason decided to have the same syntax sugar, it also would have caused an issue.

motles4y ago

You got me for a second there.

wartijn_4y ago· 1 in thread

I like this. It's clearly meant as marketing for their product, but imo the best kind of marketing. They don't just run their tool and automatically make tickets, but check for false positives and (offer to) make PRs.

It's both good for those projects and for the company that does the marketing since they reach their exact target group. Plus it gets them on the front page of HN.

ehsankia4y ago

A great addition to prune a ton of false-positives is to check the length of the strings. Almost always, the intentional implicit concats will have a very long string that reaches the max line length, whereas the accidental ones are almost always very short strings.
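A minimal sketch of that length heuristic, assuming the standard-library tokenize module is used to find adjacent string literals. The function name suspicious_concats and the cutoff of 20 characters are invented for illustration, not taken from any real linter:

```python
import io
import tokenize

# Tokens that may sit between two literals that still get concatenated.
SKIP = {tokenize.NL, tokenize.COMMENT}

def suspicious_concats(source, max_len=20):
    """Yield (line, first, second) for adjacent *short* string literals."""
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.STRING:
            # Flag the pair only when both literals are short; long ones
            # are assumed to be deliberate line-length wrapping.
            if (prev is not None
                    and len(prev.string) <= max_len
                    and len(tok.string) <= max_len):
                yield tok.start[0], prev.string, tok.string
            prev = tok
        else:
            prev = None

sample = '''words = [
    "yes",
    "affirmative"
    "agreed",
]
url = ("https://example.com/a/very/long/path/that/fills/the/line/"
       "and/continues/here")
'''
hits = list(suspicious_concats(sample))
print(hits)
```

On the sample, only the short "affirmative"/"agreed" pair is reported; the wrapped URL is skipped because its first half is long. A real tool would probably want a smarter cutoff than a fixed number.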

tus6664y ago· 1 in thread

Alternative title: 5% of Python repos have inadequate test coverage.

_dain_4y ago

Most of the errors were in the tests themselves.

Pensacola4y ago· 1 in thread

Why 666?

aflag4y ago

It's a biblical number. No deeper meaning.

xvilka4y ago· 1 in thread

Python was never supposed to be a language for anything more complex than basic scripting and prototyping. Use proper languages with static typing (and better speed) for anything serious. And no, JavaScript isn't a good language either.

hesdeadjim4y ago

Yea, but TDD! :eyeroll:

wirthjason4y ago

Ironic to see this today. I spent an hour debugging this very same issue this morning.

I was just doing some simple refactoring, changing a hard-coded string into a parameterized list of f-strings that’s filtered and joined back into a string.

I’m glad that I had unit tests that caught the problem! I couldn’t figure out why it was breaking, that comma is very devilish to spot with the naked eye. I’m surprised my linters didn’t catch it either. Maybe time to revisit them.

ehsankia4y ago

Nice! Internally we have PCRE support on our code search and I regularly run a regex to find and fix these. I've also found a ton in open-source projects, which I've been trying to fix:

https://github.com/YosysHQ/prjtrellis/pull/176

https://github.com/UWQuickstep/quickstep/pull/9

https://github.com/tensorflow/tensorflow/pull/51578

https://github.com/mono/mono/pull/21197

https://github.com/llvm/llvm-project/pull/335

https://github.com/PyCQA/baron/pull/156

https://github.com/dagwieers/pygments/pull/1

https://github.com/zhuyifei1999/guppy3/pull/12

https://github.com/pyusb/pyusb/pull/277

https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull...

It is indeed a very common mistake in Python, and can be very hard to debug. It bit me once and wasted a whole day for me, so I've been finding/fixing them ever since trying to save others the same pain I went through.

EDIT: I will point out that I've found this error in other non-Python code too, such as C++ (see the 2nd PR for example).

Here's the regex for anyone curious:

[([{]\s*\n?(\s*['"](\w)+['"],\n)+(\s*['"]\w+['"]\n)(\s*['"]\w+['"],\n)*
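A quick way to try that pattern locally, as a sketch: the regex is copied unchanged, and Python's re module happens to accept the same syntax. The two sample snippets are made up:

```python
import re

# The pattern quoted above, unchanged, in a raw triple-quoted string
# so that both quote characters can appear inside it.
PATTERN = re.compile(
    r"""[([{]\s*\n?(\s*['"](\w)+['"],\n)+(\s*['"]\w+['"]\n)(\s*['"]\w+['"],\n)*"""
)

buggy = "things = [\n    'foo',\n    'bar'\n    'baz',\n]\n"
clean = "things = [\n    'foo',\n    'bar',\n    'baz',\n]\n"

found_in_buggy = bool(PATTERN.search(buggy))
found_in_clean = bool(PATTERN.search(clean))
print(found_in_buggy, found_in_clean)
```

Note that as written it only matches single-word strings (\w+), one per line, and needs at least one correctly terminated item before the one missing its comma, so it will miss some cases.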

pxeger14y ago

The first one, the implicit concatenation, I can see. But the rest of the things seem like most of the time they're intentional.

    {
        'key': (
            'long string long string long string'
        )
    }

Using parentheses like this to put long strings on their own line is standard practice.

    title = 'Hello world',

I, for one, have often used this deliberately.

the_gigi4y ago

I often use split().

Instead of:

  s = ['a', 'b', 'c']

I'll type:

  s = 'a b c'.split()

For multiline lists where I want to get rid of leading whitespace I'll add lstrip():

  lines = """line 1
             line 2
             line 3
  """.split('\n')
  lines = [line.lstrip() for line in lines]
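One possible variation on the multiline case, offered as a sketch and not what the commenter uses: textwrap.dedent plus splitlines() avoids the trailing empty entry that split('\n') leaves after the final newline.

```python
import textwrap

s = 'a b c'.split()

# The backslash after the opening quotes suppresses the first newline;
# dedent() strips the common leading whitespace from every line.
text = """\
    line 1
    line 2
    line 3
"""
lines = textwrap.dedent(text).splitlines()

print(s)
print(lines)
```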

anonymousiam4y ago

Heh. "cromulent" again.

..."there are perfectly cromulent reasons a developer would do implicit string concatenation spanning multiple lines"...

https://www.merriam-webster.com/words-at-play/what-does-crom...

Subsentient4y ago

Interesting. I've hit this bug before, but not often in Python as far as I can remember. I guess if I need a huge list of something, I'm more likely to look to a dict than use a list with normal indexes.

einpoklum4y ago

I wonder how many of those 666 have syntax bugs which are _difficult_ to locate using code analysis tools, because they are legit in themselves and you need to know what the author meant to make the call.

delgaudm4y ago

When I used to write code, especially SQL statements I would:

    "put"
  , "Commas"
  , "first"
  , "to"

avoid these kinds of things.

gumby4y ago

Clearly bugs by programmers who don’t adhere to the Oxford comma.

timzaman4y ago

Haha the bug in Tensorflow is in "tensorflow/tensorflow/python/keras/engine/training_generator_test.py". clickbait.

codeptualize4y ago

And then people make fun of JavaScript! (Just joking, I like Python, also JS, I guess everything has its quirks, it's a good thing we have linters)

ficklepickle4y ago

Ironically there are a variety of typos in the article.

A paragraph is repeated and the markdown links at the end are broken because there is a space between ] and (.

asow924y ago

I'm sure the devil is in the details on this bug.

dragonwriter4y ago

v8 may be a repo that includes some Python, but there is no reasonable standard by which it is a “Python repo”.

Forge364y ago

I wonder if any of the found issues will turn out to be important issues.

jiveturkey4y ago

nice ad!



pmarreck4y ago

Except exit.

2 more replies

oblvious-earth4y ago

It was a meme when Zen was written, the spaces around the em dash are handled 3 different ways. Twice in the line you abbreviated, removing the joke.

lenkite4y ago

Python finally ended up following Perl's TMTOWTDI motto! https://en.wikipedia.org/wiki/There%27s_more_than_one_way_to...

fault14y ago

the zen of python was written in the 90s.

from that context it makes sense, because the only goal of python in the 1990s was to be more popular than perl, which was notorious in having many ways of doing the same thing.

but yeah, python has had significant feature creep over the years, it's nowhere near the small clear lang it used to be.

2 more replies

savant_penguin4y ago

Matplotlib is an example of a library with at least two "correct" ways of plotting

2 more replies

Quekid54y ago

egeozcan4y ago

> Complex is better than complicated

Or did I always know them wrong?

3 more replies

dpedu4y ago

Whereas, strings also always concatenate in this manner be it in a list context or not. It seems like you're assuming behaviors from other languages would be the same in another.

matsemann4y ago

1 more reply

aylmao4y ago

Ah yes, why would anyone expect lists' main purpose to be listing?

> It seems like you're assuming behaviors from other languages would be the same in another.

Rather, I think people expect a language, especially one this big and important, to work for them, and not to be designed with unergonomic features instead.

ReleaseCandidat4y ago

> it sounds like you're expecting "two" and "three" to be separate list elements

I'd expect that to be an error.

1 more reply

bokchoi4y ago

I'm not a python programmer, but the implicit string concatenation seems surprising to me.

2 more replies

doubleunplussed4y ago

Your sarcasm is misplaced. I would prefer a SyntaxError to either of the implicit behaviours.

twobitshifter4y ago

I could see lisp programmers missing the commas out of muscle memory

kazinator4y ago

> Is there ever a time a new element follows a previous one and is NOT separated by a comma?

Yes:

  [ "one, two", "three" ]

The comma is not an absolute context-free indicator of element separation.

rat99884y ago

This is not what implicit is about.

ianbicking4y ago

Implicit concatenation sure seems implicit to me

1 more reply

kazinator4y ago· 16 in thread

Not in Lisp! ("foo" "bar") and ("foobar") are lists of length 2 and 1, respectively.

Implicit string literal catenation is tempting to implement because it solves problems like:

   printf("long %s string"
          "nicely breaks up"
          "with indentation and all",
          arg, arg, ...)

and if you're working in a language which has comma separation everywhere, you can get away with it easily.

  This is the TXR Lisp interactive listener of TXR 273.
  Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
  TXR needs money, so even abnormal exits now go through the gift shop.
  1> "abcd \
      efg"
  "abcdefg"

If you want a significant space, you can backslash escape it; the exact placement is up to you:

  2> "abcd\ \
      efg"
  "abcd efg"
  3> "abcd    \
     \ efg"
  "abcd efg"
  4> "abcd    \ \
               efg"
  "abcd     efg"
  5> "abcd    \ \
     \         efg"
  "abcd              efg"

rileymat24y ago

I like imports, it tells me what files symbols are coming from, even for built in libraries.

Maybe it is that through my work I use a half dozen languages, where it is hard to remember each in detail.

I have also worked on a javascript project where there were no imports/requires and the build process created one file. So you had to inspect the confusing build script to even know what was what.

stevesimmons4y ago

I like the explicit nature of Python's imports.

And especially how I can choose the best way to indicate the sources of names in my code:

   import time
   t = time.perf_counter()

   import time, my_module
   t1 = time.perf_counter()
   t2 = my_module.perf_counter()

   from time import perf_counter as std_counter
   from my_module import perf_counter as my_counter
   t1 = std_counter()
   t2 = my_counter()

   try:
       from my_module import perf_counter
   except ImportError:
       # Fall back to standard implementation
       from time import perf_counter
   t = perf_counter()

   # import time as m  
   import my_module as m
   t = m.perf_counter()

1 more reply

kazinator4y ago

You could fairly easily work with a bunch of .js files that get catenated together by using an editor that can jump to a definition.

Build processes creating one file is the seven decade norm in computing.

Even if you literally don't catenate the .js files into one, they get loaded into one running image one way or another.

Someone4y ago

You mean

  "long %s stringnicely breaks upwith indentation and all"

? In my experience, this always gets ugly when you want to insert spaces (= about always). Do you put them at the end or at the start of each string (apart from the first or last string)

Scala’s multiline strings look nice, too, if you want to insert newlines, except for the stripMargin thing (https://docs.scala-lang.org/overviews/scala-book/two-notes-a...)

kazinator4y ago

1 more reply

NoahTheDuke4y ago

> Another one is having to import everything you use.

Seems less than ideal to me.

kazinator4y ago

That's one way.

If these things are classes in a plain old single-dispatch oop system, you can have a json-parser and csv-parser which have parse methods.

There could be packages/namespaces. So csv:parse and json:parse. These packages are standard and so they just exist; nothing to import.

In Python, you cannot use anything without an import! The top-level modules (which serve as de facto namespaces) themselves are not visible.

Say there is a csv module with a parse. You cannot just do:

  csv.parse(...)

you have to first say

  import csv

This is jaw-droppingly moronic.

3 more replies

justsomehnguy4y ago

   @"
  here strings in PS are fine for this purpose and 
   even allows whitespace anywhere            
    but because of the latter you can't indent it    
     with your other code   
 "@ -split "`r`n" | % {'<SOL>{0}<EOL>' -f $_ }
 <SOL>    here strings in PS are fine for this purpose and <EOL>
 <SOL>     even allows whitespace anywhere            <EOL>
 <SOL>      but because of the latter you can't indent it    <EOL>
 <SOL>       with your other code   <EOL>

kazinator4y ago

I posted a Unix StackExchange answer with some tricks for doing this in shell programming, very similar to your <SOL> trick.

https://unix.stackexchange.com/questions/76481/cant-indent-h...

lanstin4y ago

And backslash doesn’t let you have the literal obey the proper indenting. Might as well use “””

kazinator4y ago

> you can easily find the definition and meaning of everything you read on the screen

I don't want to be finding definitions of things that the language provides in the code.

"I've spent all my life in and out of jails, so I expect bars on doors and windows ..."

Spivak4y ago

kazinator4y ago

"require 'json'" is just another #include in disguise, and if it monkey patches existing classes, it ... probably should not exist in any form.

If the language supports json, it should just do that.

  1> #J[1,2,3]
  #(1.0 2.0 3.0)
  2> (get-json "[1,2,3,{\"foo\":true}]")
  #(1.0 2.0 3.0 #H(() ("foo" t)))
  3> (put-json #(1.0 2.0 t))
  [1,2,true]t

1 more reply

tgv4y ago

The difference is: in C, it's pretty unlikely someone wants to add strings. I suppose it's even illegal in the later C versions.

kazinator4y ago

It is positively not illegal in any standard version of C since ANSI C 89.

It's an essential feature used in all sorts of everyday code.

C99 added printf conversion specifiers that are hidden behind macros, and idiomatic usage of them relies on string catenation.

  uint32_t x = 0;

  printf("x = %" PRIx32 "\n", x);

where PRIx32 might expand to "lx" (if uint32_t is the same as unsigned long in that compiler).

All sorts of C macrology relies on string catenation. Kernel print messages:

  printk(KERN_EMERG "%s: temperature sensor indicates fire!", dev->name);
                   ^ must not have comma here

2 more replies

edflsafoiewq4y ago

The Python certainly looks nicer though.

1 more reply

oaiey4y ago· 15 in thread

I am a bit in shock. Accidental string concatenation. Python just lost a lot of reputation in my brain.

ErikCorry4y ago

Misspelling a variable on the lhs of an assignment just causes a new variable to be created with the new name. That's a lot worse in my book.
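A small illustration of that complaint, with invented names. The misspelled name on the left-hand side is simply bound as a new variable, and nothing fails at runtime:

```python
total = 0
for n in [1, 2, 3]:
    totl = total + n   # typo: binds a brand-new name instead of updating total

print(total)   # still 0
print(totl)
```

Linters that report unused variables tend to catch this sort of thing, but the interpreter itself stays silent.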

version_five4y ago

Missing an operator resulting in explicit behavior is much more subtle and not even obvious behavior. For those who use python, it is worse.

3 more replies

ReleaseCandidat4y ago

I'd say unexpected behavior is always worse than expected one.

Yes, you'll certainly find somebody who doesn't know what 'not statically typed' means, but ... And yes, there are also C(++) users, that expect strings to be concatenated like that.

1 more reply

oaiey4y ago

1 more reply

voltagedivider4y ago

Isn't that common for all/most languages that don't require explicit typing?

5 more replies

jstx14y ago

That’s a complaint against the entire type system, nothing to do with misspelling.

1 more reply

colpabar4y ago

oaiey4y ago

I completely get that. That is a very nice feature for building DSL or libraries with special needs. But it makes the overall language very dangerous.

Is this "operator" overloadable on each type in Python?

And that scares me a lot. I think I have to reevaluate my position towards Python.

1 more reply

silisili4y ago

Why not just use plusses? Or perhaps a join func, which would accomplish the same.

I get the use case as you described it, but it just seems like minimal effort to accomplish and have some semblance of explicit/safety.

1 more reply

BeetleB4y ago

Heh. I use it all the time the way you do and didn't realize this is alien to many developers (no one in my team ever complained about it).

It's common in some languages and used the way you use it. I looked in PEP8 and it seems they don't discuss this.

shultays4y ago

You could have the same behavior by enforcing + operation in between

  mylongstring = "hello" +
    "world"

No idea if python's way of indentations allows this but sounds like it should

2 more replies

atleta4y ago

The string concatenation in itself should not be a problem as it's really just string constants. (But again, it might be irony exactly because of this :) )

oaiey4y ago

Unfortunately no irony.

1 more reply

ErikCorry4y ago

In most languages an array with 3 elements has the same type as an array with 2 elements so the type system isn't going to warn you about the difference between

("foo" "bar", "baz")

and

("foo", "bar", "baz")

2 more replies

ehsankia4y ago

C/C++ has the exact same thing, no?

aeturnum4y ago· 11 in thread

[1] I know there are libraries that do this, I am not seeking recommendations.

ehsankia4y ago

A lot of people in this thread are using this to make fun of Python, but the exact same issue exists in something like c++, here's some I fixed recently:

https://github.com/UWQuickstep/quickstep/pull/9

https://github.com/tensorflow/tensorflow/pull/51578

https://github.com/mono/mono/pull/21197

https://github.com/llvm/llvm-project/pull/335

aeturnum4y ago

I didn't understand anyone to be saying that Python is the only language to have this flaw.

1 more reply

rbonvall4y ago

In hindsight, singleton tuples are not common or useful enough to deserve their own syntax. If the way to create them was something like this:

    t = tuple.single("hello")

we'd think it's ugly or inconsistent, but definitely not confusing or bug-prone.

hddqsb4y ago

One place where singleton tuples used to be common is with the old "%"-formatting, specifically in the case where there is a single argument and its value might be a tuple:

    x = (1,2,3)
    #print("the value of x is %s" % x)   # breaks if x is a tuple
    print("the value of x is %s" % (x,)) # works even if x is a tuple

There is a readable way to create singleton tuples, without the sneaky trailing comma or a new function like tuple.single:

    tuple(["hello"])

The square brackets can be slightly annoying. I recall writing the following function to omit them:

    def tup(*args):
        return tuple(args)

I am reminded of a somewhat similar issue with empty set literals: {1,2} is a set, {1} is a set, but {} is a dict. The way to create empty sets is using set().
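The points above can be checked with a few lines; this is just a sketch that exercises the snippets already shown, with illustrative variable names:

```python
x = (1, 2, 3)

# "%" treats a tuple on the right as the argument list, so a bare
# three-item tuple does not fit a single %s.
try:
    "the value of x is %s" % x
except TypeError as exc:
    error = str(exc)

wrapped = "the value of x is %s" % (x,)   # singleton tuple wraps the value
single = tuple(["hello"])                 # readable singleton, no trailing comma

print(error)
print(wrapped)
print(type({1}).__name__, type({}).__name__, type(set()).__name__)
```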

1 more reply

macNchz4y ago

aylmao4y ago

aeturnum4y ago

3 more replies

tyingq4y ago

C lets me do this, and doesn't say much about it.

  char ch_arr[3][10] = {
      "uno",
      "dos" 
      "tres"
  };

3 more replies

anyfoo4y ago

Fully agreed. If python had a proper static type system, those typos would hardly matter, and you'd have the best of both worlds: Convenient, concise syntax, but still confidence in your code.

[1] http://mypy-lang.org

1 more reply

luhn4y ago

hsbauauvhabzb4y ago

I’d rather a compile time error over an exception (or both), which in many cases can occur. I know mypy does this, maybe I should alias python=“mypy&&python”

karolkozub4y ago· 8 in thread

lumost4y ago

Consider that you may be describing a compiler. Typos are not generally a problem in statically typed languages with notable exceptions such as dictionary key lookups etc.

samhw4y ago

1 more reply

joatmon-snoo4y ago

I actually recently joined a startup working on this problem!

[0] https://marketplace.visualstudio.com/items?itemName=Trunk.io

rikateeOP4y ago

cool product :) is it just linting, or do any of the tools do code transformation to offer the fix for the lint failure? (code review doctor also offers the fix if you add the github PR integration)

atleta4y ago

joatmon-snoo4y ago

rikateeOP4y ago

yeah something like sonarqube or https://codereview.doctor (if you use GitHub)

rak15074y ago

micimize4y ago· 7 in thread

For those looking to avoid this specific problem, there is a flake8 rule: https://pypi.org/project/flake8-no-implicit-concat.

oblvious-earth4y ago

Also all but 1 of the issues they found relates to test code, it seems people are a little less careful compared to functional code.

thrdbndndn4y ago

https://github.com/tensorflow/tensorflow/tree/0d8705c82c64df...

    STOP!
    This folder contains the legacy Keras code which is stale and about to be deleted. The current Keras code lives in github/keras-team/keras.

    Please do not use the code from this folder.

Yeah, not the most obvious notice.

sundarurfriend4y ago

> all but 1 of the issues they found relates to test code, it seems people are a little less careful compared to functional code.

pfisherman4y ago

trulyme4y ago

Black rules. I love it that I don't need to have a discussion about style with anyone when Black is used on the project.

tedmiston4y ago

The URL in this comment has an incorrect TLD: it should be `doctor` (singular).

https://codereview.doctor/

rikateeOP4y ago

there is also https://pypi.org/project/flake8-tuple/

typo in the url (or in HN's markup) btw: it's https://codereview.doctor

routerl4y ago· 6 in thread

tl;dr: Python concatenates space separated strings, so ['foo' 'bar'] becomes ['foobar'], leading to silent bugs due to typos.

edit: This also applies to un-separated strings, so ['foo''bar'] also becomes ['foobar']
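
A minimal sketch of what that looks like in practice (the list names are made up for illustration):

```python
with_typo = ['foo' 'bar']      # missing comma: adjacent literals are merged
without_typo = ['foo', 'bar']

print(with_typo)                  # ['foobar']
print(len(with_typo))             # 1
print(len(without_typo))          # 2
print(['foo''bar'] == with_typo)  # True
```

No error or warning is raised at any point, which is why the typo tends to survive until something checks the length or the contents of the list.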

Palomides4y ago

I assume it's based on the C behavior, where it can be handy with macros

I don't think it fits well in python

pmontra4y ago

pletnes4y ago

I assumed it was borrowed from shell, where everything can just be put next to each other since it’s all text.

idealmedtech4y ago

It's a holdover from C, where implicit string literal concatenation is very useful in the preprocessor.

thrdbndndn4y ago

Luckily I have never accidentally used this space-concatenation thing, but in my early days learning Python I was bitten multiple times by the fact that a=(1) doesn't create a 1-element tuple.
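
For reference, a small sketch of the rule as I understand it: parentheses only group an expression, and it is the comma that makes the tuple.

```python
a = (1)     # just the int 1 inside grouping parentheses
b = (1,)    # one-element tuple
c = 1,      # also a one-element tuple; the parentheses are optional

print(type(a).__name__)  # int
print(type(b).__name__)  # tuple
print(b == c)            # True
```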

onphonenow4y ago

I still don't understand why it doesn't! So I still get bit from time to time.

prepend4y ago· 6 in thread

This seems like not a big deal. It’s a common mistake and is in 5% of repos but it’s not causing major damage.

And there’s no evaluation of importance as to whether these instances are in test files or non-critical code. Packages are big and can have hundreds or thousands of files.

It could be that if these mattered, they would have been detected and fixed.

A good example for unit tests and perhaps checking to see if these bugs are covered or not covered.

I like these kinds of analyses but don’t like them presented as if this were some significant failure.

jollybean4y ago

5% of 'released' software is quite a lot; more importantly, it's a class of errors that definitely should not exist. This is effectively a 'bug' in the language: there just isn't any real upside.

Python has a few of these things, which is really sad.

bcrl4y ago

jve4y ago

I checked those 11 links to issues for major software. 10 bugs were actually in tests...

onphonenow4y ago

There were proposals to fix some of these but the unicode zeal beat out some of the more boring (but I'd say as important) cleanups.

rikateeOP4y ago

yeah the impact varies. the sentry one seems pretty big: https://codereviewdoctor.medium.com/5-of-666-python-repos-ha...

enchiridion4y ago

I mean, if you’re ultimately going to combine the list into a string anyway it’s no big deal.

Along those lines, I wonder how many of these come from ad-hoc file path handling instead of using pathlib.

tyingq4y ago· 5 in thread

Seems expected, as linters can't be sure when it's not intentional. Like this request to pylint:

https://github.com/PyCQA/pylint/issues/1589

Is there usually enough context for a linter to make an educated guess?

mikepurvis4y ago

I would have thought it would be a no-brainer to just ban it and insist on an explicit + operator. I'm pretty surprised that issue was so flippantly closed.

thaumasiotes4y ago

> I would have thought it would be a no-brainer to just ban it and insist on an explicit + operator.

ReleaseCandidat4y ago

The PR has been merged (for lists and tuples and sets only).

https://github.com/PyCQA/pylint/pull/1655

rikateeOP4y ago

a linter can do a good job at allowing long urls for example, but it would be whack-a-mole trying to cater for "all" purposeful implicit string concatenations

chrismorgan4y ago

arusahni4y ago· 4 in thread

The removal of implicit string concatenation was proposed for Py3k[1], but was rejected.

[1] https://www.python.org/dev/peps/pep-3126/

wodenokoto4y ago

The rejection notice seems completely counterintuitive to me. How is adding a plus "harder" compared to removing a foot gun?

> This PEP is rejected. There wasn't enough support in favor, the feature to be removed isn't all that harmful, and there are some use cases that would become harder.

oa20224y ago

This change would break a lot of legacy code for no good reason

The most common way to split a string in lines is using this concatenation formula.
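
A minimal sketch of that deliberate use, next to the explicit form with +. The message text is just a placeholder taken from this comment:

```python
# Intentional implicit concatenation: one long message split across lines.
message = (
    "This change would break a lot of legacy code "
    "for no good reason"
)

# The explicit alternative using the + operator.
explicit = (
    "This change would break a lot of legacy code "
    + "for no good reason"
)

print(message == explicit)  # True
```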

Wowfunhappy4y ago

I suppose this is also something you could catch with a linter?

cpeterso4y ago

Yes: import from __future__

https://docs.python.org/3/library/__future__.html

bilalq4y ago· 4 in thread

deathanatos4y ago

Did you figure out what the context is, and if you did, would you mind spelling it out for me? I still haven't figured out what correction to make to that sentence to get it to make sense.

rikateeOP4y ago

in a blog post about the evils of typos there was a typo! classic https://en.wikipedia.org/wiki/Muphry%27s_law ;)

bilalq4y ago

They ran their static analyzer over a sample of GH repos. They chose 666 as the number for their sample size. That's all.

dudeinjapan4y ago

It's further evidence that the Illuminati intentionally put these typo bugs there to destabilize the global order.

dannymi4y ago· 4 in thread

Using the plus operator to concatenate strings is just weird.

Think of the usual algebraic properties these operators are supposed to have.

"+" always is supposed to be commutative--so "a"+"b" = "b"+"a", if those mean alternatives (they usually do mean that in mathematics), is just fine.

On the other hand, multiplication is often not commutative--also not here. "a" "b" != "b" "a".

So string concatenation should be the latter. And indeed that's how it is in regular expression mathematics, for example.

int_19h4y ago

Juxtaposition is not multiplication in this context - you can't write (2 3), for example, it has to be (2 * 3).

Furthermore, Python already uses * for strings to indicate repetition: ("foo" * 2 == "foofoo").

On the other hand, D uses binary ~ for concatenation. That has a neat mnemonic: it's a "rope" that "ties strings together".

housecarpenter4y ago

hayd4y ago

Which invertible commutative string operation would you choose for + ?

This might be nice from a math point of view, but I think users are going to be confused using "string"^3 for repetitions (instead of "string"*3). + and * make too much sense to the unwashed masses.

At any rate, explicit is better than implicit.

dragonwriter4y ago

There is no reason you couldn't use str * str for concatenation and str * integer (or even string * real) repetition.

Well, except if you wanted to support user classes that could duck type as both strings and numbers, which it would make awkward.

shoyer4y ago· 2 in thread

chillee4y ago

Same :P I'm actually responsible for one of these (https://github.com/pytorch/pytorch/issues/70607), but it's a typo in a list of tests to skip.

hvdijk4y ago

pmontra4y ago· 2 in thread

As a comparison, in Ruby

  puts "a" "b" == "ab" # true

and

  puts "a"
    "b" == "ab"

prints "a" with "b" == "ab" evaluated to false and discarded. This could create bugs as with Python. However

  ["a"
     "b"] == ["ab"]

is a syntax error at the beginning of the second line: the parser expects a ]. It would evaluate to true if it were on one line.

grey-area4y ago

In Ruby one too many commas can also cause problems:

  # list
  list = "a","b",
  # function
  def foobar
  end

  => ["a", "b", :foobar]

int_19h4y ago

The implicit concat of string literals is the culprit here. It really should require "+".

titzer4y ago· 2 in thread

Just to be clear, the V8 "bug" was in the test runner code and caused mis-parsing of command line options for testing for non-SSE hardware. Not exactly a critical bug.

jeffbee4y ago

The way the bug arrived in that test runner is interesting. It sneaked in mid-review. Possibly bugs added in the middle of code reviews are more likely to get through.

https://chromium-review.googlesource.com/c/v8/v8/+/2629465/3...

Personally, I prefer uniform lists with leading commas, because it's easier to add and remove lines for later, inevitable refactoring. For example, I prefer:

  things = [
    'foo'
  , 'bar'
  , 'baz'
  ]

This drives some people crazy, but I think it's the One True Way.

aflag4y ago

Isn't

  things = [
    'foo',
    'bar',
    'baz',
  ]

even better? In your case, if you want to add something to the beginning of the list you'll have to modify two lines.

LAC-Tech4y ago· 1 in thread

A lot of people are criticising dynamic typing for this.

It doesn't seem to have anything to do with typing discipline.

    words = (
        'yes',
        'correct',
        'affirmative'
        'agreed',
     )

Would be a tuple (immutable list) of strings, while

    words = (
        'yes',
        'correct',
        'affirmative',
        'agreed',
     )

would also be a tuple of strings.

If Haskell had for some reason decided to have the same syntax sugar, it also would have caused an issue.
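
To make that concrete, here is a small sketch (the names missing_comma and with_comma are made up): every element is a str in both tuples, so a check on element types passes either way and only the length and contents differ.

```python
missing_comma = (
    'yes',
    'correct',
    'affirmative'
    'agreed',
)
with_comma = (
    'yes',
    'correct',
    'affirmative',
    'agreed',
)

# Both are tuples of str, so an element-type check cannot tell them apart.
print(all(isinstance(w, str) for w in missing_comma))  # True
print(len(missing_comma), len(with_comma))             # 3 4
print(missing_comma[2])                                # affirmativeagreed
```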

motles4y ago

You got me for a second there.

wartijn_4y ago· 1 in thread

It's good both for those projects and for the company that does the marketing, since they reach their exact target group. Plus it gets them on the front page of HN.

ehsankia4y ago

tus6664y ago· 1 in thread

Alternative title: 5% of Python repos have inadequate test coverage.

_dain_4y ago

Most of the errors were in the tests themselves.

Pensacola4y ago· 1 in thread

Why 666?

aflag4y ago

It's a biblical number. No deeper meaning.

xvilka4y ago· 1 in thread

hesdeadjim4y ago

Yea, but TDD! :eyeroll:

wirthjason4y ago

Ironic to see this today. I spent an hour debugging this very same issue this morning.

I was just doing some simple refactoring, changing a hard-coded string into a parameterized list of f-strings that’s filtered and joined back into a string.

ehsankia4y ago

Nice! Internally we have PCRE support in our code search and I regularly run a regex to find and fix these. I've also found a ton in open-source projects, which I've been trying to fix:

https://github.com/YosysHQ/prjtrellis/pull/176

https://github.com/UWQuickstep/quickstep/pull/9

https://github.com/tensorflow/tensorflow/pull/51578

https://github.com/mono/mono/pull/21197

https://github.com/llvm/llvm-project/pull/335

https://github.com/PyCQA/baron/pull/156

https://github.com/dagwieers/pygments/pull/1

https://github.com/zhuyifei1999/guppy3/pull/12

https://github.com/pyusb/pyusb/pull/277

https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull...

EDIT: I will point out that I've found this error in other non-Python code too, such as c++ (see the 2nd PR for example).

Here's the regex for anyone curious:

[([{]\s*\n?(\s*['"](\w)+['"],\n)+(\s*['"]\w+['"]\n)(\s*['"]\w+['"],\n)*
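
For Python sources specifically, a rough alternative to the regex is to look for adjacent string tokens with the standard tokenize module. This is only a minimal sketch: the function name find_implicit_concat and the sample source are made up for illustration, it flags every pair of adjacent string literals (including intentional ones), and it ignores f-strings on Python 3.12+, which are tokenized differently.

```python
import io
import tokenize


def find_implicit_concat(source):
    """Return (line, text) for each string literal that directly follows another."""
    # Line breaks inside brackets (NL) and comments do not separate literals.
    skip = {tokenize.NL, tokenize.COMMENT}
    hits = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if (tok.type == tokenize.STRING
                and prev is not None
                and prev.type == tokenize.STRING):
            hits.append((tok.start[0], tok.string))
        prev = tok
    return hits


buggy = "words = [\n    'yes',\n    'affirmative'\n    'agreed',\n]\n"
fixed = "words = [\n    'yes',\n    'affirmative',\n    'agreed',\n]\n"

print(find_implicit_concat(buggy))  # [(4, "'agreed'")]
print(find_implicit_concat(fixed))  # []
```

Because it works on tokens rather than text, it is not thrown off by comments or blank lines between the two literals, but a human still has to decide whether each hit is a typo.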

pxeger14y ago

The first one, the implicit concatenation, I can see. But the rest of the things seem like most of the time they're intentional.

    {
        'key': (
            'long string long string long string'
        )
    }

Using parentheses like this to put long strings on their own line is standard practice.

    title = 'Hello world',

I, for one, have often used this deliberately.

the_gigi4y ago

I often use split().

Instead of:

  s = ['a', 'b', 'c']

I'll type:

  s = 'a b c'.split()

For multiline lists where I want to get rid of leading whitespace I'll add lstrip():

  lines = """line 1
             line 2
             line 3
  """.split('\n')
  lines = [line.lstrip() for line in lines]

anonymousiam4y ago

Heh. "cromulent" again.

..."there are perfectly cromulent reasons a developer would do implicit string concatenation spanning multiple lines"...

https://www.merriam-webster.com/words-at-play/what-does-crom...

Subsentient4y ago

einpoklum4y ago

delgaudm4y ago

When I used to write code, especially SQL statements I would:

    "put"
  , "Commas"
  , "first"
  , "to"

avoid these kinds of things.

gumby4y ago

Clearly bugs by programmers who don’t adhere to the Oxford comma.

timzaman4y ago

Haha the bug in Tensorflow is in "tensorflow/tensorflow/python/keras/engine/training_generator_test.py". clickbait.

codeptualize4y ago

And then people make fun of JavaScript! (Just joking, I like Python, also JS, I guess everything has its quirks, it's a good thing we have linters)

ficklepickle4y ago

Ironically there are a variety of typos in the article.

A paragraph is repeated and the markdown links at the end are broken because there is a space between ] and (.

asow924y ago

I'm sure the devil is in the details on this bug.

dragonwriter4y ago

v8 may be a repo that includes some Python, but there is no reasonable standard by which it is a “Python repo”.

Forge364y ago

I wonder if any of the found issues will turn out to be important issues.

jiveturkey4y ago

nice ad!
