Joe Armstrong: "I think the lack of reusability comes in object-oriented languages, not in functional languages. Because the problem with object-oriented languages is they've got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle. If you have referentially transparent code, if you have pure functions-all the data comes in its input arguments and everything goes out and leaves no state behind-it's incredibly reusable. You can just reuse it here, there, and everywhere. When you want to use it in a different project, you just cut and paste this code into your new project. Programmers have been conned into using all these different programming languages and they've been conned into not using easy ways to connect programs together. The Unix pipe mechanism-A pipe B pipe C-is trivially easy to connect things together. Is that how programmers connect things together? No. They use APIs and they link them into the same memory space, which is appallingly difficult and isn't cross-language. If the language is in the same family it's OK-if they're imperative languages, that's fine. But suppose one is Prolog and the other is C. They have a completely different view of the world, how you handle memory. So you can't just link them together like that. You can't reuse things. There must be big commercial interests for whom it is very desirable that stuff won't work together."
- Peter Seibel, Coders at Work: Reflections on the Craft of Programming
Something else I want to mention: what's written above will most likely result in some sort of warmed up discussion in regards to object oriented programming versus something else. Or inheritance versus strategies. Or virtual methods versus method passing. Or whatever else hackernews finds worthy of a discussion this time around.
All of that is entirely irrelevant to the point I'm making, which is that monolithic pieces of code are a bad idea. And our solution to monolithic code in Python is classes. If your hammer of choice is Haskell, then use whatever the equivalent in Haskell looks like. Just don't force me to fork your library because you decided a layered API is not something you want to expose to your user.
1. When OO was on the brink of becoming mainstream, let's say early 90's, there was an awful lot of the most idiotic rubbish talked about reusability. For example I recall one magazine article that, with a completely straight face, told readers that you would be able to buy an "Aeroplane" class that you could slot into pretty much any application that dealt with aeroplanes. So Joe has a point here.
2. This is not an argument for not using classes within a single codebase. And indeed when I see Erlang functions pushing all the relevant state around as parameters in a big mess of tuples and lists like a crazy baglady with all her possessions in a shopping cart, the benefits of encapsulating it all become clear.
in java, c++, or c#, to add a variable to a class, you have to repeat its name 4 times. once to declare, once in the constructor parameters, and once on each side of the assignment. why am i writing the same thing 4 times for what should be a core part of the language?
in haskell, you write it once (i'm not saying haskell's records are nice, but they got that part right). same with rust.
and with a function, you write it once.
public class Foo
{
    public string Bar { get; set; }
}
var foo = new Foo { Bar = "Hey there" };
case class Foo(bar: String, foo: String = "hello")
new Foo(bar = "Hey there")
new Foo(foo = "Hey there", bar = "Hello")
new Foo("Hey there")
Providing a getter at least is sometimes practically unavoidable, so there's 1 repetition, but with a good IDE that's just a quick key combination over the variable name.
I'm curious, could you please elaborate on this point? If a class has any dependency, i usually find it better to require that dependency to be passed in the constructor, so the constructed instance can always be in a valid initialized state. Why do you find nullary constructors to be good OO?
- the attribute has a different name
- an attribute is set whose value is a function of more than one argument
- an attribute is set to a constant value independent of constructor arguments
In languages requiring attribute declarations the minimum number of occurrences is 2 (declaration and initialization in a constructor). When declarations aren't required then it's just initialization.
How often such circumstances occur is another issue. The form of initialization that you mentioned is probably the most common so languages can provide shortcuts in this case.
Because Java, C++, and C# suck at this. I'm trying to see the relevance to the OP, which is about Python, which rather decidedly lacks this problem. To add a new instance variable in a class, you need to mention it once, or twice if you're using a __slots__-based class for compact memory footprint (because then you have to add it to __slots__ and then actually assign it a value somewhere).
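To make the "once, or twice with __slots__" count concrete, here is a minimal sketch. The class names `Point` and `SlottedPoint` are made up for illustration.

```python
class Point:
    def __init__(self, x, y):
        # Each attribute is introduced by a single assignment; no declaration.
        self.x = x
        self.y = y


class SlottedPoint:
    # With __slots__ every attribute name appears twice:
    # once here, and once where it is assigned.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p.z = 3  # fine: ordinary instances accept new attributes at any time

s = SlottedPoint(1, 2)
try:
    s.z = 3  # not listed in __slots__, so this is rejected
except AttributeError:
    slotted_rejects_new_attribute = True
else:
    slotted_rejects_new_attribute = False
```

The trade is a smaller per-instance footprint in exchange for naming each attribute up front.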
I think this is a good point. I think that it's hard to get from there to "more classes are always better" though. More classes don't always make a design more flexible. You have to consciously design for flexibility and reusability, and even then it takes a lot of deep thought and hard work to get your interfaces right, such that they are truly reusable in practice.
So, I guess the question I ask as someone who doesn't write a ton of python code is: Is that true? And if so, why? Is it not possible to compose a layered API that doesn't rely on inheritance / classes in Python?
I don't think it is. His objections to the json library API can be addressed without introducing a single class.
He seems to want access to the internal implementation of converting bytes into a single Python object. That is probably just an internal function that can just be exposed. If he wants to override it from the top level, the library could permit that by just adding the function as a default parameter.
He wants it to deal with streaming. Python generators can help with this. The library could be refactored into implementing a json.load_stream function instead, which takes a string generator and returns some kind of object stream generator. The original json.loads function could be replaced with a wrapper around it for backwards compatibility and for API users who don't want the additional complexity.
None of this involves writing classes. A generator could be seen as a type of class instance, and one can implement a generator with classes, but generators have a well known and simple API which eliminates the need to define a new API, as the traditional writing of classes would do.
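For what it's worth, here is a rough sketch of what such a generator could look like using only the standard library. `load_stream` and the `loads` wrapper are hypothetical names, not functions in the json module; the only real API used is `json.JSONDecoder.raw_decode`. It assumes the stream is a sequence of top-level objects or arrays. A bare number split across two chunks would be yielded too early, and malformed input is only reported once the stream ends, since it can't be told apart from an incomplete value.

```python
import json


def load_stream(chunks):
    """Yield each complete top-level JSON value found in an iterable of str chunks."""
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()  # raw_decode does not skip leading whitespace
            if not buffer:
                break
            try:
                value, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # probably incomplete: wait for the next chunk
            yield value
            buffer = buffer[end:]
    if buffer.strip():
        raise ValueError("stream ended inside an incomplete or invalid JSON value")


def loads(text):
    """Backwards-compatible wrapper: exactly one value from one string."""
    values = list(load_stream([text]))
    if len(values) != 1:
        raise ValueError("expected exactly one JSON value")
    return values[0]


chunks = ['{"a": 1}{"b"', ': [1, 2]} ', '{"c": null}']
results = list(load_stream(chunks))
```

This still builds each top-level value in full, so it helps with a stream of many objects but not with a single huge one.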
It seems to me that the writer has little experience of not using classes, so wants to solve every problem he has with a class. The API problems he wants to address are genuine, but he's blind to any solution that doesn't use classes.
Sometimes this is a little inconvenient. I think the sign you really need a class is when you are sending in something called "object_type", and a dict of "object_type -> function".
Anyone else had the feeling that although Python allows functional style, it somehow pushes your mind away from it?
Just from the JSON example:
- Why do I need a class for streaming JSON - Python's got a perfectly good `yield` for returning tokens in such situations.
- Why would I ever design the JSON library to be extendable at the tokenizer level? If you need serialiser / deserialiser, why not just provide a map of types to callbacks / callback as a parameter? Do you really want to extend JSON format itself?
- The 2GB JSON example is just weird. If you care about such use cases, you a) most likely have a limit on data size at webserver level, b) use proper formats for handling that size of data (I really doubt there's no better data representation once you get to GB sizes).
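On the callbacks point: the stdlib json module already accepts plain functions for this, via the `default` argument of `json.dumps` and the `object_hook` argument of `json.loads`. A small sketch follows; the `__date__` tagging convention and the two function names are my own invention, not anything standard.

```python
import json
from datetime import date


def encode_extra(obj):
    """Called by json.dumps for any object it cannot serialise itself."""
    if isinstance(obj, date):
        return {"__date__": obj.isoformat()}
    raise TypeError(f"cannot serialise {type(obj).__name__}")


def decode_extra(mapping):
    """Called by json.loads for every decoded JSON object (dict)."""
    if "__date__" in mapping:
        return date.fromisoformat(mapping["__date__"])
    return mapping  # leave unrelated objects untouched


text = json.dumps({"due": date(2013, 2, 26)}, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)
```

No subclassing involved, and the JSON format itself is not extended.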
I see his point of view, but he's arguing for one single "hammer" solution, rather than arguing against the monolithic design. His story seems to present some weird back-story really: "I needed to make my data easier to handle, so I started automatically serialising objects into JSON, then they became huge so I have to start streaming them otherwise just parsing of them takes way too long".
See the msgpack-cli example at the bottom. Say you have a function that returns a generator for tokens in Python. You would need another function that builds objects out of them. How do you customize how objects are being built? A class makes that simpler because each of the methods are extension points you can override.
But yeah, a token stream would be much appreciated.
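A sketch of what "methods as extension points" could mean on top of a token generator. Everything here is hypothetical: the token format (tuples like `("start_map",)`, `("key", name)`, `("value", x)`) and the `ObjectBuilder` class are invented for illustration and do not correspond to any existing library.

```python
class ObjectBuilder:
    """Builds Python objects from a token stream; override make_* to customise."""

    def make_map(self, pairs):
        return dict(pairs)

    def make_array(self, items):
        return list(items)

    def make_scalar(self, value):
        return value

    def build(self, tokens):
        tokens = iter(tokens)
        return self._build_value(next(tokens), tokens)

    def _build_value(self, token, tokens):
        kind = token[0]
        if kind == "value":
            return self.make_scalar(token[1])
        if kind == "start_map":
            pairs = []
            for token in tokens:
                if token[0] == "end_map":
                    return self.make_map(pairs)
                key = token[1]  # a ("key", name) token
                pairs.append((key, self._build_value(next(tokens), tokens)))
            raise ValueError("token stream ended inside a map")
        if kind == "start_array":
            items = []
            for token in tokens:
                if token[0] == "end_array":
                    return self.make_array(items)
                items.append(self._build_value(token, tokens))
            raise ValueError("token stream ended inside an array")
        raise ValueError(f"unexpected token: {token!r}")


class TupleBuilder(ObjectBuilder):
    # Only one hook is overridden; everything else is inherited.
    def make_array(self, items):
        return tuple(items)


TOKENS = [
    ("start_map",), ("key", "a"), ("value", 1),
    ("key", "b"), ("start_array",), ("value", 2), ("value", 3), ("end_array",),
    ("end_map",),
]
plain = ObjectBuilder().build(TOKENS)
with_tuples = TupleBuilder().build(TOKENS)
```

The same hooks could be passed in as three callables instead; the class mostly saves you from threading them through every recursive call.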
> - Why would I ever design the JSON library to be extendable at the tokenizer level?
For instance if you want to skip past objects. That's what we're doing for instance for unwanted input data. That implicitly also prevents hash collision DOS attacks because we never build hash tables for things that we don't want. It also gets rid of the suspension of execution when a garbage collector runs and cleans up a nested structure. I can make you tons of examples where the "build a tree, reject a tree" approach destroys an evented Python server.
> - The 2GB JSON example is just weird. If you care about such use cases, you a) most likely have a limit on data size at webserver level, b) use proper formats for handling that size of data (I really doubt there's no better data representation once you get to GB sizes).
Try using a streaming JSON API like Twitter's firehose. Most people just give up and use XML because there is SAX or if they go with JSON they newline delimit it because most libraries are horrible or useless for parsing streamed JSON.
I.e. you don't bundle your serialiser with your json parser. I think that's a good idea. How do you customise? most likely a callback that builds an object or returns the json fragment unmodified if it can't. It can be streamed too in order to accumulate / rebuild fragments of the tree.
Whether object is simpler here (builder / factory style), or another generator, that's a matter of taste mostly.
> For instance if you want to skip past objects.
You can't skip past objects at a tokenizer level, unless you implement logic of skipping whole structure. But that's what the parser does. Why don't you just skip the objects based on the streaming API? You don't have to construct them first - it's up to your deserialiser implementation if you want to ignore parts of the structure.
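The "logic of skipping a whole structure" is small if a token stream is available. A minimal sketch, assuming a made-up token format of tuples such as `("start_map",)`, `("key", name)`, `("value", x)`, `("end_map",)`; `skip_value` is a hypothetical helper.

```python
def skip_value(tokens):
    """Consume exactly one value from a token iterator without building anything."""
    depth = 0
    for token in tokens:
        kind = token[0]
        if kind in ("start_map", "start_array"):
            depth += 1
        elif kind in ("end_map", "end_array"):
            depth -= 1
        if depth == 0:
            return  # a scalar, or the close of the outermost container
    raise ValueError("token stream ended in the middle of a value")


stream = iter([
    ("start_map",), ("key", "unwanted"),
    ("start_array",), ("value", 1), ("value", 2), ("end_array",),
    ("end_map",),
    ("value", "next"),
])
skip_value(stream)            # discards the whole map, nested array included
after_skip = next(stream)
```

Only a depth counter is kept, so no dicts or lists are allocated for the skipped data.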
> Try using a streaming JSON API like Twitter's firehose.
From the documentation (unless I'm reading the wrong format description):
"""The body of a streaming API response consists of a series of newline-delimited messages, where "newline" is considered to be \r\n (in hex, 0x0D 0x0A) and "message" is a JSON encoded data structure or a blank line."""
That's not a huge JSON document. They use the newline delimiter as you described later. I don't think that's because "most libraries are horrible or useless for parsing streamed JSON". Why would you ever want to stream JSON which is an infinite and never complete object? What's wrong about splitting by newline? By making newlines the message separators, you'll never have to worry that your stream becomes broken due to some parsing error: bad message? ignore it skip to newline, continue with next one.
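The "bad message? skip to newline, continue" strategy is only a few lines. A sketch, assuming the lines have already been decoded to str; `iter_messages` is a made-up name.

```python
import json


def iter_messages(lines):
    """Yield one decoded value per non-blank line, ignoring lines that fail to parse."""
    for line in lines:
        line = line.strip()  # drops the trailing \r\n
        if not line:
            continue  # blank line, nothing to decode
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # a broken message does not poison the rest of the stream


incoming = ['{"id": 1}\r\n', '\r\n', '{"id": \r\n', '{"id": 2}\r\n']
messages = list(iter_messages(incoming))
```

A real consumer would probably want to log or count the dropped lines rather than silently ignore them.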
> A class makes that simpler because each of the methods are extension points you can override.
is a strong argument in favour of classes. They're more extensible even if the author doesn't consider it. However - as soon as your class has an implementation like
def to_json(str)
JSONParser.parse(str) # JSONParser is not streamed
end
then you're in trouble. Unless your language supports dynamic lookup of constants, class names feel very much like a global variable that's a pain to change. In Ruby, as of 1.9.3, lookup is lexical so you can't simply define a class-local value for the JSONParser constant.
I don't know the story in other languages - I assume Java has it as you see a great emphasis on dependency injection. If dynamic lookup of constants was present I think classes would be more unintentionally extensible however they were written - as it is, you have to be as careful writing classes for extension as for functional code where you have to manually provide extension points.
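In Python, one common way to provide that extension point by hand is to look the collaborator up through a class attribute instead of naming it inside the method. A sketch with invented names (`Document`, `decoder_class`, `PairsDecoder`); only `json.JSONDecoder` and its `object_pairs_hook` argument are real.

```python
import json


class Document:
    # Extension point: subclasses can swap the decoder without touching from_json.
    decoder_class = json.JSONDecoder

    def from_json(self, text):
        return self.decoder_class().decode(text)


class PairsDecoder(json.JSONDecoder):
    """Decodes JSON objects as lists of (key, value) pairs instead of dicts."""

    def __init__(self):
        super().__init__(object_pairs_hook=list)


class PairsDocument(Document):
    decoder_class = PairsDecoder


as_dict = Document().from_json('{"a": 1}')
as_pairs = PairsDocument().from_json('{"a": 1}')
```

Had `from_json` called `json.JSONDecoder()` directly, the subclass would have had to override the whole method.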
I use msgpack in Python (twisted, tornado) precisely because it can consume byte buffers which are not token-aligned.
First, write nothing. If you can solve your problem without writing code (or even better, by deleting code), that is the best solution.
Next, write code which fits your architecture. Sometimes functional composition is the best system for representing your computational structure. Tree operations, for example, are especially amenable to recursive function-based computation.
Sometimes, when you are writing a state machine, for example, an object is the best possible representation. Objects are entirely focused around hidden-implementation finite state machines, and thus mirror your computation almost exactly.
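A toy illustration of both halves of that claim. The nested-list tree representation and the `Turnstile` class are invented examples, not taken from the article.

```python
def depth(tree):
    """Recursive function over a tree given as nested lists; non-lists are leaves."""
    if not isinstance(tree, list):
        return 0
    return 1 + max((depth(child) for child in tree), default=0)


class Turnstile:
    """A two-state machine; callers never see the state, only the transitions."""

    def __init__(self):
        self._locked = True

    def coin(self):
        self._locked = False

    def push(self):
        """Return True if the push lets someone through."""
        if self._locked:
            return False
        self._locked = True  # lock again after one passage
        return True


gate = Turnstile()
first_push = gate.push()   # locked, nobody gets through
gate.coin()
second_push = gate.push()  # unlocked by the coin
third_push = gate.push()   # locked again
```

The function needs no state between calls; the turnstile is nothing but state between calls.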
Funny how most people who have only practical training in a handful of languages and no programming language theory at all tend to advocate for the one style of programming that they know well. When all you have is a hammer...
Note: The small paragraph at the end of this article seems to hedge by agreeing with me and essentially calling the reader to disregard what he previously wrote. If he had followed my step one he could have avoided writing the entire article. Think of the complexity saved!
What annoyed me most was his constant apparent conflation between bytes and characters. But I didn't really see much evidence of treating all the world like a nail.
What he should have said is that you should pick an interface that suits your computation. The problem was not that the interface was too simple, but that it didn't fit the computation. One can easily imagine an object-oriented interface for the same wrapper computation that also hides the true nature of the computation.
There is no professional culture and new generations of developers successfully forget all the experience previous generations have accumulated.
Also, some piece of advice from a seasoned programmer to the web-programming-children: if you are not really a serious developer, if you write your freaking website on Django, or whatever a framework there is, you don't really need a methodology, because you are doing an easy task, you can write in whatever language/style/paradigm you like, even on Brainfuck. But please don't extrapolate your humble experience to the entire industry and don't tell people working on large complicated (real) projects how they must write code, because they have some experience you don't have.
Wait, what?
"Domain pissing match"? Or just inexperience in and misconceptions about writing web apps on your part?
Anyway, that's irrelevant - the advice here is not specific to web programming at all and is a sound one, no matter how experienced you are or what you work on at the moment. Writing modular, reusable and easily customisable code is just a good idea.
Heh, what am I doing. It's still early for me and I'm responding to an obvious trolling attempt from a frustrated Java programmer who uses the word "humble" without a shred of humility in his own writing. I should just stop and try porting this "freaking website on Django" that my team has been working on for the last three years to Brainfuck - I'd be more productive that way and I guess it would be more fun, too.
Don't take this the wrong way, I mostly agree with your points regarding the missing "professional culture" among us programmers, just yesterday I sent an email out to our internal list urging my coworkers not to use one-letter variables anymore and explaining to them why that is bad, but when it comes to web projects somehow not being "real" programming I'm afraid you're wrong.
It is true, some momentum was lost when the "open-data" mantra that was flying in the air around 2004-2005 gave way to today's walled gardens and one-page AJAXy apps, but there are still interesting things happening on the web.
The one API that is not powerful enough is Collection. There you have the top layer without the LEGO building blocks. Compare this to the Scala collection API which has the top layers and the building blocks.
For a good API you need both, building blocks to tailor to your specific need (20% of the time) and an easy top layer (80%) to prevent writing the same stuff all the time.
Try this:
import msgpack

packer = msgpack.Packer()
serialized = packer.pack('stuff you wanna pack')
unpacker = msgpack.Unpacker()
unpacker.feed(serialized)  # feed raw bytes into the unpacker's buffer
print(unpacker.unpack())   # read one complete object back out
I had originally used this, but then as my API scope extended to more than just using msgpack, having a common API interface for json (i.e. the .loads() and .dumps() method) was found to be more useful.
And while I agree with most of the article, I don't think writing more classes is a one-size-fits-all solution. Classes, IMO, only make sense from a heavily OOP point of view.
A better way to phrase the OP's salient point while sidestepping the polemic of OOP is "Standardize and use protocols".
You can't. The unpacker in msgpack for Python only reads full objects. The C# version lets me go into and out of streaming at will and even skip past objects without reading them into memory. The only thing the Python version of msgpack can do with the unpacker is buffering up bytes internally until an object is ready. That object however could still expand into a 10GB blob of data which I would then have to postprocess.
But yes, in the JSON cases, having more flexibility than a single 'loads' is clearly justified.
So Armin, if you read this, thank you, it was a joy spending time with your framework (and source code) over the weekend ;)!
The actual implementations do a bit more. I have pretty much overridden every single part of that at one point, if for no other reason than debugging. Some of those hooks were added later because people requested them.
(Although I have to admit that I skimmed most of them, as I'm in serious eyeroll territory whenever I see the old Java IO API used as a bad example again.)
Some libraries manage to skip the token part. (I'm looking at you simplejson, a library that even with the best intentions in mind is impossible to teach stream processing)
This is in contrast to something like an XML SAX parser, that allows you to register for events like "a foobar element was loaded". You get the foobar element while all the other tokens are thrown out the window as soon as they are parsed.
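For anyone who hasn't used SAX, this is roughly what "registering for foobar elements" looks like with Python's stdlib `xml.sax`. The `FoobarCollector` class and the sample document are made up for the example.

```python
import xml.sax


class FoobarCollector(xml.sax.ContentHandler):
    """Keeps the attributes of every <foobar> element and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        if name == "foobar":
            self.seen.append(dict(attrs.items()))


handler = FoobarCollector()
xml.sax.parseString(
    b'<root><other/><foobar id="1"/><foobar id="2"/></root>',
    handler,
)
```

The parser never hands you a tree; whatever the handler does not keep is gone once the event has fired.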
The complaint is that, somewhere, under the hood, simplejson is doing that token parsing, but because of their API, a user can't plug into it.
I read it exactly the other way around: simplejson skips an internal tokenization step, so even if you fork the library it is pretty much impossible to make it streaming, because there's no token stream to handle stream state.
Expanding on Armin's dichotomy, top-down designs like Python's open() or jquery plugins start with giving 70-80% of users APIs that are as simple as possible for their most frequent use cases while shielding them from the sausages underneath.
Bottom-up designs like Java's standard library or POSIX start with LEGO building blocks that solve the most fundamental pieces of largely academic computer science problems and just give them out to their end users and expect them to be able to assemble Tower Defense by solving this LEGO puzzle first.
The problem with sticking entirely to one of these 2 approaches is that you end up either ignoring power users or making healthy adults with normal IQs feel stupid. There is no reason you can't serve 100% of your user base by incrementally evolving your API approaches and provide 2 sets of APIs within the same library, with the top-down easy one shielding the bottom-up sausage factory that takes care of the meat grinding for you. Most API designers don't realize this and won't ever go there. Extremists and egoists with lots of opinions will spend hundreds of man years to promote their One True Way of doing things. They'll say things like "no leaky abstractions!" or "these enterprise people are just making things too complicated to create jobs!", when the simple truth is probably just that they don't understand how people think.
Make your libraries easy to do things that are easy, but make hard things possible too.
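A tiny sketch of the two-layer idea, using a made-up "key = value" config format. Both function names (`iter_pairs`, `load_config`) are hypothetical.

```python
def iter_pairs(lines):
    """Building block: lazily yield (key, value) pairs from 'key = value' lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        yield key.strip(), value.strip()


def load_config(text):
    """Easy top layer: the one call most users need, built on the block above."""
    return dict(iter_pairs(text.splitlines()))


config = load_config("host = example.org\n# comment\n\nport=8080")
first_pair = next(iter_pairs(["debug = yes", "ignored = later"]))
```

Most callers only ever touch `load_config`; someone with a huge or endless input can drop down to `iter_pairs` without forking anything.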
IME Ronacher follows this dictum very well. I'd suggest that anyone who wants to see this try his Flask and Werkzeug packages.
As for storing extra state, about which many here have complained, I've found it really helpful that I can set werkzeug.http._accept_re to my own RE when I want to do something weird with media types. That is state that the vast number of users won't need to touch, yet the fact that it exists makes life better for someone who does need it. I'm sure there are numerous other examples I haven't had to bother with yet. Would we really be better off if this RE had to be passed in every time we handled a request? (Although I would understand if you argued this value should be stored in an object not in a module. I haven't needed that yet.)
On the other hand, routing to and handling resources with these packages is typically done with functions and decorators only, although the route decorators are methods of the application object. So Ronacher is not any sort of hardass about classes; he just does what works.
Now you can bundle related functions together in a structure, but this structure is morally a module, not an object, let alone a class. Some languages will force you to encode those modules as classes / prototypes / singletons, but that's just a design pattern to circumvent a limitation of the language.
However, in certain cases, like where I need to write a public API, I have found that having classes as wrappers to the functionality helps it a bit. So really, the "stop-writing-classes-unless-you-absolutely-need-to" guideline still holds true for me.
So he asks for using abstractions instead of no abstraction (block code), but if you have other ways of abstracting code, please go ahead. This article is good for people with few or no tools, people who are still learning.
On the streaming versus resident working set argument... Most of what most programmers deal with doesn't have to scale to deal with huge streaming datasets, so it doesn't get the attention.
I think Armin's post is somewhat in line with the following post from the google testing blog. http://googletesting.blogspot.com/2008/12/static-methods-are...
For example, Python's lack of variable declarations sometimes leads to bugs involving scoping. (I've run into this a couple of times myself). Does this quirk become more of an issue in heavily functional code? Are there other language quirks that become troublesome in heavily OO code? In what circumstances might one style be preferred over another?
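One well-known instance of that scoping quirk, which bites closure-heavy code in particular: assigning to a name anywhere in a function makes it local to that function. A sketch with invented function names.

```python
def make_counter_broken():
    count = 0

    def increment():
        # The assignment makes `count` local to increment(), so reading it
        # first raises UnboundLocalError when the function is called.
        count += 1
        return count

    return increment


def make_counter():
    count = 0

    def increment():
        nonlocal count  # explicitly rebind the enclosing function's variable
        count += 1
        return count

    return increment


try:
    make_counter_broken()()
except UnboundLocalError:
    broken_raises = True
else:
    broken_raises = False

counter = make_counter()
values = [counter(), counter(), counter()]
```

With a class, the state would live on `self` and the question of which scope a bare name belongs to would not come up.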
It's tasked with returning records from parsed zones and every single record is a class. I like it and I don't understand the first sentence of this blog post.
Try not to put too much weight on what others tell you, make up your own mind. It's a classic human issue.
It looks the same as XML SAX vs DOM: you feed NN-Mb to DOM-parser (SQLServer xml-datatype for example) and you have problems. No matter: classes or functions.
static String readFirstLine(String filename) { try { BufferedReader br = new BufferedReader(new FileReader(path)); ....
So people write this every day, yet still fail to do it correctly...
Also, this article was written 1 day in the future. The future looks bleak to me...