- Load balancers
- Web servers
- Caches (eg. Redis, memcached)
- Databases (relational, non-relational, document)
- Search datastores (eg. Elasticsearch, Solr)
- Log/event/message processors (eg. Kafka)
- Task queues/task processing libraries
- Periodic jobs (eg. cron)
If you dig into any of these there's a ton to learn, especially around looking into the underlying technologies used to build these higher-level systems.
There are also more conceptual things that are part of building/maintaining backend systems. These are a bit fuzzier, but I would say they are just as important as the specific technologies used:
- Reliability
- Monitoring
- Observability
- Error/failure handling
- Migration strategies
- Data normalization/denormalization
- Horizontal vs. vertical scalability
This is by no means a complete list, but these terms are enough to get you in the right ballpark of ideas and start learning. I think highscalability.com is a great place to read about how other companies have built backend systems to solve specific problems. They have a massive list of quality articles written about various backend systems at scale.
Cron jobs are the definition of the anti-pattern of treating servers like pets and not cattle. You have to worry about that one non-redundant server running your cron jobs. There are other ways to skin the cat, but my favorite is HashiCorp’s Nomad. I like to call it “distributed cron”. Together with Consul for configuration it’s dead simple to schedule jobs across app servers - the jobs can be executables, Docker containers, shell scripts, anything.
But I agree and concede, a user with zero back-end experience will just google "cron" which will take them to a crontab example, so they will likely be misled into the anti-pattern, as you said.
I think it is worth noting for any of the systems above that there's a spectrum of possibilities around how much you automate/offload the management of them, as well as plenty of backend systems for managing those.
What's old is new again. Much of clean distributed systems development is now built on what are, essentially, scheduled periodic operations. They're pretty much the least complicated way to loosely couple domain logic in distributed systems that follow eventually-consistent semantics. They're also a good model for the functionality of many distributed scheduling systems like Kubernetes, AWS' ECS Scheduled Tasks, and more.
Kubernetes even goes so far as to have Jobs (batch operations) and CronJobs (scheduler that creates Jobs).
Cron jobs are the definition of "if it works don't fix it" and YAGNI.
Therefore, crontab is not an anti-pattern at all.
Also, what does observability mean in this context?
Something went wrong, and now your site is serving 500 server errors to everybody at the rate of 25,000 per minute. The ops team already tried "just reboot it" and it didn't help. How are you going to figure out what is going on and fix it?
It's (mostly) too late to add anything, so all you've got is the logs you already had, the metrics you already had, etc. That's the "observable" stuff in a system. There's an art to recording what it is you need to know, while at the same time not recording so much that you can't find what you need in the mess.
(The "mostly" is that if you have a good enough setup, you might be able to bring up a new system and route some very small fraction of traffic to it to examine it more intensely in real-time with a debugger or something, though in my experience, on those occasions I've had the opportunity to try this, it's never been a problem that would manifest on a new system receiving a vanishing fraction of a percent of the scale of a production box. But maybe you'll get lucky.)
You certainly want to do everything you can to not be in that mess in the first place, but it won't be enough. You need a system sufficiently observable that you can find the problem and find some sort of solution.
As for reliability, monitoring, and error handling I've heard good things about the Google SRE book: https://landing.google.com/sre/books/
I haven't read it personally, but I've heard good things from others and looking over it briefly the advice there lines up with what I've experienced in practice.
Edit: I decided to go ahead and buy it since it was pretty cheap on Amazon. The table of contents had a lot of the information from the other comments. Thanks for the recommendation!
It's probably too late for you to edit your comment, but the title is Designing Data-Intensive Applications (not Systems).
In the official sense, Security also includes availability (see the CIA triad), so a large portion of the bullets others have mentioned focuses on that.
I think a security mindset (including availability, which leads to thinking about disaster recovery, performance, redundancy, high availability, distributed systems as a mechanism to achieve it, etc.) is the first thing that I look for in a backend developer.
Other things are important, but security mindset in my opinion is the first layer in the foundation.
Others on my list that may be helpful to deep dive into:
1. HTTP, REST, know it well.
2. GraphQL as a complement/alternative to REST.
3. If using relational databases, learn what the N+1 selects issue is and how to solve it
4. If using NoSQL databases, learn about the CAP theorem and understand the tradeoffs etc
5. Learn to avoid premature optimization. Measure and profile before jumping to conclusions about theoretical bottlenecks that don't exist.
6. Unit test all the things, learn how to mock and what to mock, learn the difference between unit and integration tests.
7. Invest time in good design, read some other open source projects, see how they organize the code, what packages, what modules. Learn about dependency injection and inversion of control.
8. Learn some cloud patterns, such as exponential backoff, throttling
9. Know everything about cookies, localStorage, XSS, CSRF, JWTs, and session cookies, stateless vs stateful architecture etc.
10. DevOps: Look into containers and serverless, CI/CD
11. Multithreading if you are in a language that has them.
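To make item 8 concrete, here is a minimal sketch of retrying with exponential backoff and jitter. The function name `retry_with_backoff`, its parameters, and the `flaky_call` stand-in for a remote dependency are all made up for illustration; real code would usually retry only errors known to be transient.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is used up.

    The delay ceiling doubles after each failure and is capped at max_delay.
    Sleeping a random amount below the ceiling ("full jitter") keeps many
    clients from retrying in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))


# Hypothetical flaky dependency: fails twice, then succeeds.
calls = {"count": 0}


def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []  # record the delays instead of really sleeping
result = retry_with_backoff(flaky_call, sleep=delays.append)
print(result, calls["count"], len(delays))
```

Passing `sleep` as a parameter is only there so the example runs instantly; in production you would leave the default.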
One of the most important concepts is the principle of least privilege, which is rarely ever discussed. Every tutorial I’ve seen, even paid ones, (this isn’t to say all of them) gives its app master DB credentials. You could vastly improve your application's security just by leveraging your DB engine’s native access control.
For those using Postgres here’s a great resource for setting up RBAC: https://www.postgresql.org/docs/9.0/user-manag.html
Topics include different joins, relationship types, full text search, indexing, JSONB, amongst many others!
Your comment reminds me of a PyCon 2017 talk by Raymond Hettinger where he details the improvements in Python dictionaries from 2.7 to 3.6 which effectively amounted to rediscovering and re-implementing what was standard in databases decades ago.
"Modern Python Dictionaries A confluence of a dozen great ideas" https://www.youtube.com/watch?v=npw4s1QTmPg&ab_channel=PyCon...
Find out how to make your ORM log the SQL it's running, and try running that in your database directly. If you are having trouble, get the SQL working first, and then figure out how to make your ORM generate that SQL.
On the web, use View Source and/or Inspect Element to see what is really in the DOM.
Look at the Javascript you're serving. Is it what you think it should be? Maybe the problem is in your webpack config or caching or something.
If SCSS is giving you trouble, look at the CSS it's generating. Is it what you expect?
If there is a problem submitting a form, look at the Network tab to verify the right things are being sent. Understand how HTTP works here. Try putting the same request into `curl -v`. (Browsers even have a "Copy as cURL" command nowadays.) If that looks okay, see if Rails is turning your submission into the right params (or whatever your framework is).
One nice thing about checking the seams is you can debug via binary search: first see if the problem is in the browser or the server. Next see if it's in your app or the database. Next see if it's in your controller or your model. Etc.
This, specifically, is very good advice: using the ORM as a tool to implement a SQL-first solution. And not just good advice for beginners.
> Find out how to make your ORM log the SQL it's running, and try running that in your database directly. If you are having trouble, get the SQL working first, and then figure out how to make your ORM generate that SQL.
Time would be much better spent learning SQL properly. ORMs have their place, they can cut down a tonne of boilerplate, but anyone using an ORM should be somewhat competent in SQL already. If you learn SQL from ORM output then in many cases you'll be learning what not to do.
tl;dr: Keep digging, because everything's worth knowing at some point.
It's generally much easier to fix a bug in your own code or infrastructure if you're familiar with the innards of the things you depend upon directly, and learning your preferred platform beneath the current frameworks is a solid advantage because it helps you move sideways when the next New Framework comes along.
But there's also the conceptual side, the general concepts which have remained unchanged for decades, which a detailed knowledge of a specific platform (eg. Java, .NET, Node.js) doesn't necessarily aid with. These probably won't even feel relevant on a daily basis, but will definitely multiply your effectiveness over the long term.
Some things which might fall into that category:
* SQL and relational databases (especially if you're not using SQL, because you'll be reinventing it for any nontrivial amount of data)
* Consistency models and distributed systems (capabilities and limitations; some things can never be waved away by technology)
* The operating system, and principles of its design (not necessarily in any great detail, but knowing roughly how software talks to hardware is very useful, especially when assessing the marketing blurb for any cloud platform)
* Complexity theory (stringing three 'optimal' algorithms together might be textbook, but often you can apply some domain knowledge to cut out a step or two to reduce actual runtime and maintenance overhead)
I've found stuff like graph theory, principles of compiler design, and a working knowledge of assembly code to be useful too, but that's probably the far end of the bell curve for web server stuff :P
Computer Science is a subset of mathematics and it's amazing which bits of pure maths turn out to be useful in my day job. Shame I'm not very good at that stuff... but reading enough about it to recognise when it's applicable can mean the difference between 'picking a good library off the shelf' and 'hacking together a terrible solution myself'.
So:
- Set up some kind of logging, by default use rsyslog and set it up so your logs are available somewhere other than the machine where something is running. You will be surprised how often something breaks and takes the server down with it so you can't log in.
- Later on, when you have to debug stuff, READ THE LOGS. I can't explain it, but after years of working on production systems I have noticed that almost nobody actually goes back and just reads the logs when things go wrong. You will find the problem spelled out in the logs 99% of the time. YOU CAN'T JUST GUESS WHAT WENT WRONG.
- Use transactions properly.
- If you use an ORM, do whatever you need to do to keep track of how many actual queries are done. You are going to find that you do some order of magnitude more queries than you thought. Learn how to give the ORM hints to avoid this.
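A rough sketch of the query-counting point, using the standard library's `sqlite3` in place of a real ORM (the `authors`/`books` schema and the helper names are invented for the example). `set_trace_callback` lets you see every statement the connection runs, which is the same idea as turning on SQL logging in your ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (id INTEGER PRIMARY KEY,
                        author_id INTEGER NOT NULL REFERENCES authors(id),
                        title TEXT NOT NULL);
    INSERT INTO authors (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO books (author_id, title) VALUES
        (1, 'A1'), (1, 'A2'), (2, 'B1'), (3, 'C1');
""")

executed = []  # every statement SQLite runs from here on is appended
conn.set_trace_callback(executed.append)


def n_plus_one():
    # One query for the authors, then one more per author: N+1 in total.
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    return {
        name: [title for (title,) in conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,))]
        for author_id, name in authors
    }


def single_join():
    # The same data fetched in a single round trip.
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id").fetchall()
    grouped = {}
    for name, title in rows:
        grouped.setdefault(name, []).append(title)
    return grouped


def count_selects(fn):
    executed.clear()
    data = fn()
    selects = [s for s in executed if s.lstrip().upper().startswith("SELECT")]
    return data, len(selects)


slow, slow_count = count_selects(n_plus_one)
fast, fast_count = count_selects(single_join)
print(slow_count, fast_count, slow == fast)
```

With three authors the first version issues four SELECTs and the second issues one; with ten thousand authors the gap is what takes your database down. Most ORMs have an eager-loading or join hint that gets you the second shape.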
And work on your logs. Tweak the info/error/debug levels so that the log files read nice. Too often we find that a common error/exception will spew hundreds of lines of shite into the log files over and over again, obfuscating the real information. Take pride in having meaningful, concise but informative log files.
YES! At one company I was at their web service framework would log thousands of lines of nonsense for every request, most of it errors about not being able to connect to services that no longer existed or thousands of lines of debugging that were never relevant to anything. Make sure you are only logging sensible things (usually this means when you see something being logged for no reason or which is no longer relevant, fix things), and if you want to log a ton of garbage, set things up so you can tell the difference and throw out the junk.
It sounds easy ("derp, back-end is server, front-end is client") but it's really quite a mind bender when you think about it. Take a given piece of website code in a given language. Where is it being executed? Some "front-end" techs (Node, Webpack, Uglify, etc.) will be executed invisibly on the server. Some "back-end" web techs (cookies, redirects, OPTIONS requests, TLS) enable some invisible client behavior. Some (PHP) are dual-purpose, executed in two stages, some (conditional comments) are hacks that exploit this duality. Some (OAuth2) are orchestrated dances between back-end and front-end. It's really not so simple when you think about it.
Modern backend code will usually connect to databases and call REST APIs. The databases themselves might be part of a cluster. Moreover, the backend code is rarely exposed directly to the public; there's usually a proper http server sitting in front of it, and there might be another load balancer or CDN between the user and your http server. Every connection has a client and a server, and sometimes the same component plays both roles. Remembering which role(s) each component is playing in any given context is very important for security, not to mention debugging.
What are all the use cases for web servers?
---
The failure modes; what they are and how they manifest: read contention, thundering herd, cascade, ...
Mitigation strategies and where to apply them: throttling, backpressure, load-shifting, graceful degradation - and of course how to signal the upper layers that these are taking place.
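As one small illustration of backpressure and of signalling it upward, here is a sketch built on a bounded `queue.Queue`. The `BoundedInbox` class is hypothetical; the point is only that a full queue produces an explicit "no" the caller can act on, instead of work piling up without limit.

```python
import queue


class BoundedInbox:
    """Accept work only while there is room; otherwise tell the caller."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)

    def submit(self, job):
        try:
            self._queue.put_nowait(job)
            return True
        except queue.Full:
            # The upper layer decides what "no" means, for example an
            # HTTP 429/503 response or a client-side retry with backoff.
            return False

    def take(self):
        return self._queue.get_nowait()


inbox = BoundedInbox(capacity=2)
accepted = [inbox.submit(n) for n in range(4)]
first = inbox.take()               # a worker drains one job
accepted_later = inbox.submit("late")
print(accepted, first, accepted_later)
```

The design choice worth noticing is that rejection is cheap and immediate, so the overloaded component spends its time on work it already accepted.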
Decoupling reads and writes. (Search for material on CQRS and see where the rabbit hole takes you.)
Error handling and tracing. This is particularly devious, because by definition the error path is the unhappy path. It will be more expensive when hit, and has to spend more time serialising data. See also read contention, thundering herd and mitigation strategies.
Learn and understand the differences between: telemetry, instrumentation and monitoring.
However, I don't think they're "must know concepts for back end development". Most backend development is small in scale: a few servers, maybe more, a database, maybe something running cron/equivalent jobs or a queue worker, maybe some caches. Not much besides that. While people learning to develop systems at that scale might incidentally learn some of the above, I don't think most of those concepts will be useful until people are designing systems at much greater scale. Informally, many of them are present in small systems: e.g. when your cache server is down but connections take 10sec to time out, that's a form of backpressure, technically, but not in a super formal or useful way to understand the concept.
In sum, I think those are important (essential, even) areas of knowledge for intermediate-experience back end developers, or developers looking to increase the scale of their infrastructure or projects (or work on very large-scale applications), but I don't think they're in the must-know, 101-level tier of back end knowledge. People can and do build successful, stable, easy-to-work-on back ends without knowledge of any of those things. In fact, this is the norm in our industry.
The lack of knowledge of these areas is not a significant handicap (to productivity, understanding, or code/output quality) until you get beyond small scale. Most software projects do not.
Fair enough, I admit I am looking at things from behind glasses tinted in a certain way.
But I do believe some kind of familiarity with the concepts is essential, even at relatively small scale. Read contention in particular is really easy to hit the first time you have to deal with an increase in traffic (doesn't need to be external, could be a minor change that triples the number of cross-service calls). Because the first observation is that worker systems are running hot, the instinctive reaction is to add more workers.
The two core problem areas in backend development - which in this context can mean anything that requires non-trivial server side processing - are I/O latency and processing capacity. Regardless of scale. Being unaware of (or worse, ignoring) them is not tenable.
I would structure interview questions around APIs rather than about distributed systems/transactions, circuit breakers, etc. - practical aspects of running multiple systems talking to each other through pipes.
- Learn about modeling the data of the application you're gonna work on. You're going to be very sorry if you picked a non-relational database for a relational model.
- Avoid NIH: There's a very good chance that the code you're writing could instead be imported from libraries/frameworks that already offer it. Avoid the overhead of reinventing the wheel and focus exclusively on business logic (and sometimes even that might already exist in some oss projects you can use).
I partially agree. Sometimes it's better to reinvent the wheel: you get better control and fewer risks.
(How can I properly quote someone on HN?)
Well, I'm not saying that people should just add the first library they find as a dependency. We should review different options and pick the most mature one that fits our case.
Ultimately, software used and contributed by many people in a community is far less likely to have risks than one a single developer cobbled together to deliver a feature as fast as possible.
> How can I properly quote someone on HN?
You can't, I just use the markdown syntax and hope for the best =D
General:
- Use a debugger. Stepping through new or existing code is an incredibly quick way to grasp what is going on.
- Exercise healthy paranoia. Expect your code paths to fail. How will you handle failures?
- Code hygiene. Legibility of a codebase makes me respect it a lot more. I feel that I subconsciously handle it with more care when well written.
- Interfaces. Whether you're writing an API, a data access utility, etc. -- think about the design in your current and other contexts.
- Language. Whatever language you work in (Node, Ruby, Go, etc.), know it and its quirks well.
Backend specific:
- Docker. It really changed the way I work. I think the learning curve can be a little steep, but once you're comfortable it leads to much more productive development that is ready for production.
- Load testing. You should be able to identify potential bottlenecks in your application and decide whether they're worth fixing or not.
- Observability. Think about how you will profile and gather metrics of your application when deployed.
- DevOps. A lot of workplaces will expect their software engineers to help with or even be DevOps. It's a big surface area, so I would focus on knowing the parts of the stack that make your application work (containers, databases, virtual networking)
- Pushing work elsewhere. Sometimes you're going to run into a problem that can't be solved by code changes alone. It helps to have a basic understanding of how to push workloads elsewhere, such as message queues or serverless functions, and the patterns associated with them. For example, a user uploads an image to your server and you need to compress it.
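A toy version of the image upload example, assuming an in-process `queue.Queue` and one worker thread stand in for a real message queue and worker fleet, and `zlib` stands in for actual image compression. The names `handle_upload` and `worker` are invented for the sketch.

```python
import queue
import threading
import zlib

jobs = queue.Queue()
results = {}
results_lock = threading.Lock()


def worker():
    """Background consumer: does the slow work outside the request path."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel value: time to shut down
            jobs.task_done()
            break
        upload_id, payload = job
        compressed = zlib.compress(payload, level=9)
        with results_lock:
            results[upload_id] = compressed
        jobs.task_done()


def handle_upload(upload_id, payload):
    """Request handler: enqueue the work and answer immediately."""
    jobs.put((upload_id, payload))
    return {"status": "accepted", "id": upload_id}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_upload("img-1", b"\x00" * 10_000)
jobs.join()        # only for the demo: wait until the worker has caught up
jobs.put(None)
thread.join()
print(response["status"], len(results["img-1"]) < 10_000)
```

The pattern to take away is that the handler returns "accepted" rather than "done", so the client needs some way to check on the result later.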
Would you elaborate on this, please?
For you specifically, I'd recommend picking up a web development framework like Ruby on Rails. It will teach you every aspect of building websites: Interacting with databases, writing server endpoints, creating front end web pages, user authentication, deployment, and probably version control. I would consider all of these things to be the bread and butter of typical "back end" engineers (except for maybe the front end stuff.)
From there, you can broaden your knowledge in any direction that interests you. If you like building interactive applications, you can look into front end frameworks like React or Vue. If you want to focus more on back end, you can learn more about relational databases (Head First SQL is a great beginner resource.) Lots of directions you can go.
As a backend engineer, what you create is basically the core of the product. Data structures and relations you define will be the limitations of your product.
Coming up with intuitive and good data structures for a complex application can actually be a very challenging task. As a backend engineer, most of my time is not spent on writing JS or SQL, but on product design and specs, so I can create something that is stable and scalable, yet not too rigid, considering that iteration is part of the process and everything you create might be subject to change, especially in startup environments.
Data migrations make my palms sweaty (not schema migrations but the kind of migration that requires rewriting large amounts of existing data). Picking the wrong data structure is a fast path to data migrations.
For example, I just googled for "zero downtime schema migration" and this covers the topic relatively well: https://blog.philipphauer.de/databases-challenge-continuous-...
In my experience, doing DB migrations w/o downtime is rarely worth it and involves big risks of actually ending up with long unplanned downtime, because the process is prone to errors. The large majority of schema changes can be performed very quickly, and a couple of seconds of downtime is acceptable in most cases.
Long story short: Don't do DB migrations without downtime unless you absolutely need to.
P.S. If your business guy is requesting deployment w/o downtime, make sure to have a conversation with him to understand why that is. In the majority of cases they make it more dramatic than it really is.
- Understanding in depth what you are supposed to be building, as directly as possible from the stakeholders. Nothing is worse than building the wrong thing perfectly right.
- Coding that understanding into a data model (read: database schema) with great, clear naming, correct entity relationships and as many constraints as possible (including correct types) early on.
- Understanding the front end part that will serve your application and how it interfaces with the users. Only that will give you the details on how to build the API that will serve it.
Iterate these three points until you're done with the application skeleton and only then write the first line of code. If there's anything that will make you insanely productive as a backend developer, it's these crucial first steps of requirements engineering.
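To show what "as many constraints as possible" buys you, here is a small sketch using SQLite through Python's `sqlite3` module. The `customers`/`orders` schema is invented for the example, and other databases spell some of this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
    );
""")


def try_insert(sql, params):
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


new_customer = "INSERT INTO customers (id, email) VALUES (?, ?)"
new_order = "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)"

outcomes = [
    try_insert(new_customer, (1, "a@example.com")),
    try_insert(new_customer, (2, "a@example.com")),  # duplicate email
    try_insert(new_order, (1, 1999)),
    try_insert(new_order, (99, 500)),                # no such customer
    try_insert(new_order, (1, -5)),                  # negative total
]
for outcome in outcomes:
    print(outcome)

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Every rejected row here is a bug that application code would otherwise have to catch on its own, in every code path that writes to these tables.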
1) Get a grasp of what a "back end" _actually is_. When learning, be mindful of what you are going to tackle, and what you are going to leave alone so that you don't get overwhelmed. (In heavier technology environments, you don't really talk about "back end development" since the back end is subdivided into so many separate disciplines. Therefore "back end development" tends to be something that smaller web shops use to encompass everything that is going on on application servers- DON'T FEEL THAT YOU HAVE TO BE AN EXPERT IN ALL OF THIS STUFF)
2) Have a reasonable grasp of all of the available hosting platforms and their relative pros and cons: Windows/Linux, cloud/local, AWS/Azure, etc.
3) Get familiar with the lumps of code and/or services that you need to make an application server available to actually do "a thing".
4) Get familiar with standard software development workflow- version control, bug tracking, testing and deployment.
5) Develop a healthy attitude towards application security- understand that it takes a lot of knowledge and effort to make an application 99% secure. Learn how to do what you can with the resources you have at your disposal. On one end of the spectrum you need to know stuff like "never store a password in cleartext", on the other you have to be aware of essentially unsolvable problems like 0-day exploits and social engineering.
6) Get good at communication and teamwork. A backend, by its very nature, needs to talk and get along with other systems so therefore a back end developer needs to talk and get along with other humans.
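On the "never store a password in cleartext" end of point 5, a minimal sketch using only the standard library might look like this. The iteration count is an assumption to tune for your hardware, the function names are made up, and many teams would reach for a dedicated library (bcrypt, argon2) instead.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; raise it as far as your latency budget allows


def hash_password(password):
    """Return (salt, digest). Store both; never store the password itself."""
    salt = os.urandom(16)  # unique per user, so equal passwords hash differently
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected_digest)


salt, digest = hash_password("correct horse battery staple")
good = verify_password("correct horse battery staple", salt, digest)
bad = verify_password("Tr0ub4dor&3", salt, digest)
print(good, bad)
```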
Once you have accepted a connection, you then have to decide how to handle multiple concurrent connections. Do you use threads? Sub-processes? select/poll?
You can then move on to reading/writing to shared state. Do you use memory? Files on disk? A SQL DB? If using threads/processes, is concurrent access to those resources done safely?
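Here is one possible answer to both questions, sketched with the standard library: a thread per connection via `socketserver.ThreadingTCPServer`, and in-memory shared state guarded by a lock. The `HitCounter` and `CountingHandler` names are invented, and threads are just one of the options listed above (the `selectors` module covers the select/poll route).

```python
import socket
import socketserver
import threading
from concurrent.futures import ThreadPoolExecutor


class HitCounter:
    """Shared in-memory state guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:  # read-modify-write must not interleave across threads
            self.value += 1
            return self.value


counter = HitCounter()


class CountingHandler(socketserver.StreamRequestHandler):
    def handle(self):
        self.rfile.readline()  # consume the client's one-line request
        self.wfile.write(f"hit {counter.increment()}\n".encode("ascii"))


def ask(port):
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(b"hello\n")
        with sock.makefile("r") as reader:
            return reader.readline().strip()


# ThreadingTCPServer starts one thread per accepted connection.
with socketserver.ThreadingTCPServer(("127.0.0.1", 0), CountingHandler) as server:
    port = server.server_address[1]  # port 0 lets the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    with ThreadPoolExecutor(max_workers=8) as pool:
        replies = list(pool.map(ask, [port] * 20))
    server.shutdown()

print(counter.value, len(set(replies)))
```

If the counter lived in files or a SQL database instead, the lock would be replaced by file locking or transactions, but the question of who serializes the read-modify-write stays the same.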
It's a long journey to do the above well. I highly recommend anything/everything by W. Richard Stevens https://en.wikipedia.org/wiki/W._Richard_Stevens
There are things that are absolute basics:
- bash/linux scripting, understanding
- usage of basic linux commands, grep, awk, sed...
- the basics of POSIX (stdin/stdout/stderr, pipes)
- Monitor your system: observe CPU/network
- Configure firewall, ssh services, cronjobs, set up a systemd job
And from here how to deploy anything with the stack/framework which is popular this year, kubernetes, docker or ansible or whatever infrastructure it is in the backend.
Very rarely does someone work "just coding".
You can’t come up with a real scalable fault tolerant solution if you don’t understand the underlying infrastructure and cloud hosting is the fastest way to do it.
The concepts that must be known are those that apply to a role. For instance, a "web application developer" will likely be a generalist who is able to build an entire web app back end but may not have ever opened the hood and rebuilt "the engine" or "the transmission" (car metaphors, not literal parts). A popular web framework will have a reasonably designed public API that abstracts away the complexity under the hood.
2. How to read request parameters.
3. How to get the stuff you're asked for from the database.
4. How to create and send a response.
Once you can do that you can create a backend.
5. How to not get swamped in mess and complexity when you do all of the above faster than you are required to.
I appreciate this sounds like trivial advice, but I've seen countless developers (including myself initially) that e.g. start with Ruby or PHP and don't understand what happens before their script is called, what receives their request, how paths are mapped, how this magical $_POST object is populated and so on.
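To demystify that a little, here is a sketch in Python rather than PHP, using the WSGI interface from the standard library. The `call` helper plays the part of the web server by building the request dictionary by hand, and `read_params` is a rough, simplified stand-in for what a framework does to produce something like `$_POST`. All function names here are made up.

```python
import io
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def read_params(environ):
    """Roughly the step a framework performs before your handler runs."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    if environ["REQUEST_METHOD"] == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        form_body = environ["wsgi.input"].read(length).decode("utf-8")
        params.update(parse_qs(form_body))  # form fields override query fields
    return params


def app(environ, start_response):
    """A WSGI application: one callable per request, nothing hidden."""
    name = read_params(environ).get("name", ["world"])[0]
    body = f"Hello, {name}!\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call(method, query="", form=b""):
    """Play the web server's role: build the environ dict and invoke the app."""
    environ = {
        "REQUEST_METHOD": method,
        "QUERY_STRING": query,
        "CONTENT_LENGTH": str(len(form)),
        "wsgi.input": io.BytesIO(form),
    }
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


get_result = call("GET", query="name=Ada")
post_result = call("POST", form=b"name=Grace")
default_result = call("GET")
print(get_result, post_result, default_result)
```

A real server does the same thing after parsing the bytes off the socket; `wsgiref.simple_server.make_server` will serve this same `app` over HTTP if you want to poke at it with curl.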
There's nothing special about backend development, it's mostly about the "plumbing" between systems. Lots of (de)serializing messages (e.g. JSON, protobuf) into objects and validating those messages.
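A small sketch of that plumbing, assuming a made-up `CreateUser` message with two fields. Hand-written checks keep the example dependency-free; in practice a schema or validation library usually does this part.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateUser:
    email: str
    age: int


def parse_create_user(raw):
    """Deserialize a JSON message and validate it before it becomes an object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    email = data.get("email")
    age = data.get("age")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email must be a string containing '@'")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or age < 0:
        raise ValueError("age must be a non-negative integer")
    return CreateUser(email=email, age=age)


def to_json(user):
    """Serialize back out with stable key order."""
    return json.dumps({"email": user.email, "age": user.age}, sort_keys=True)


user = parse_create_user('{"email": "a@example.com", "age": 30}')
round_trip = to_json(user)

errors = []
for bad in ('not json', '[1, 2]', '{"email": "nope", "age": 30}',
            '{"email": "a@example.com", "age": true}'):
    try:
        parse_create_user(bad)
    except ValueError as exc:
        errors.append(str(exc))
print(round_trip, len(errors))
```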
[0] http://highscalability.com/blog/2016/1/11/a-beginners-guide-...
2. Many Web frameworks -- Django and Ruby on Rails come to mind -- treat the database as an integral part of an application. In practice, however, the database in most cases runs as a separate service, more often than not on a completely separate machine. It is useful to view a database as just another part of your distributed system, with a very clear responsibility (persistence of state), that is accessed via a special API (SQL).
[0] https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...
For example, I use Java Spring a lot in combination with Hibernate. I always have a layer with my ORM classes mapped to the database, and then in my API layer I will have lots of mirror classes that initially look very similar to the database classes with a conversion step from one to the other. You can use libraries like ModelMapper for the tedious conversions.
With this setup it is easy to evolve the database schema without impacting the API and vice versa. This is something that (in my opinion) some web frameworks like Ruby on Rails do wrong by default.
https://github.com/kamranahmedse/developer-roadmap/blob/mast...
- The front end can do whatever it wants. The back end is where all important data integrity and security logic should go.
- Sessions. No one has mentioned sessions? Sessions are somewhat important.
Make sure you really learn TCP and the unix toolchain for digging into network problems e.g. tcpdump. It's always important to isolate things: is it the latest push causing things or is it network problems?
- Persistence
- Messaging
- Publish-Subscribe / Event Handling
- Latency
- Separation of Concerns / Layering
If you're going to interact with a relational database, you should definitely know SQL.
From there, learning the basics of the stack you choose with a simple project would be the next step.
- foundational networking concepts like ip, udp, tcp, dns, tls
- the difference between synchronous and asynchronous computation/communication
- core details of your programming language's runtime, like the concurrency and memory model
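For the synchronous vs. asynchronous item, a minimal sketch with `asyncio`, where sleeps stand in for network calls. The job names and delays are arbitrary, and the exact timings will vary by machine.

```python
import asyncio
import time


def fetch_sync(name, delay):
    time.sleep(delay)           # blocks the whole thread while "waiting"
    return name


async def fetch_async(name, delay):
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    return name


def run_sequential(jobs):
    start = time.perf_counter()
    results = [fetch_sync(name, delay) for name, delay in jobs]
    return results, time.perf_counter() - start


async def gather_all(jobs):
    return await asyncio.gather(*(fetch_async(name, delay) for name, delay in jobs))


def run_concurrent(jobs):
    start = time.perf_counter()
    results = asyncio.run(gather_all(jobs))
    return results, time.perf_counter() - start


jobs = [("users", 0.1), ("orders", 0.1), ("stock", 0.1)]
seq_results, seq_time = run_sequential(jobs)
con_results, con_time = run_concurrent(jobs)
print(seq_results == con_results, f"{seq_time:.2f}s vs {con_time:.2f}s")
```

The sequential run waits for the sum of the delays, while the concurrent run waits for roughly the longest one. Nothing got faster; the waiting just overlaps, which is why this helps I/O-bound work and not CPU-bound work.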