Rather than care about legacy IP "subnets", the cloud cared about "security groups", which were missing only a couple features (such as "hierarchy") to entirely replace the role of a subnet in traditional networking.
Having spent a lot of time working with EC2-Classic, I found it made network engineering fun and easy. The new "VPC" mechanism is demoralizing in its complexity, and doesn't seem to allow anything you couldn't express using security groups.
I've written about this before--in more detail or with more rhetoric, depending on my mood--but the big feeling I get from the transition from EC2-Classic was the frustration that comes when other people make something worse.
https://news.ycombinator.com/item?id=36829190
https://news.ycombinator.com/item?id=33569889
Which is basically "EC2 Classic networking" but now you pay for it
https://aws.amazon.com/vpc/lattice/
Though it goes a bit further than just security groups on a flat network. Each HTTP endpoint automatically becomes an IAM resource, and you can treat all your services as if they're native to AWS and use the same IAM policies. It's pretty dope!
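For illustration, access control in VPC Lattice is expressed as a normal IAM-style auth policy attached to the service. A minimal sketch (the account ID, role name, region, and service ID below are all placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/orders-caller" },
      "Action": "vpc-lattice-svcs:Invoke",
      "Resource": "arn:aws:vpc-lattice:us-east-1:123456789012:service/svc-0123456789abcdef0/*"
    }
  ]
}
```

Requests to the service's HTTP endpoints are then allowed or denied by the same policy machinery as any other AWS resource.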
> When we launched EC2 in 2006, it was one giant network of 10.2.0.0/8.
In my understanding, /8 means the first octet would be fixed (10.0.0.0-10.255.255.255) and I'm having trouble understanding the significance of the 2 here. If the 2 is significant, wouldn't you write it as /16? Given the source and my confidence on this notation, I feel like there must be some meaning an expert can glean from this. If so, I'm curious to learn. If it's just a typo, then that's fine and I apologize if this is nit-picky.
Thanks for your critical reading!
I don't think there's any higher honor/role model as an HN community member. Inspiring.
When you take the logical AND of the IP and the expanded mask, you'll just end up with 10.0.0.0 as the network address and 10.255.255.255 as the broadcast address.
I guess it's not a typo, but a weird way to write 10.0.0.0/8 excluding 10.1.0.0/16.
That's also the notation `ip` on Linux supports
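As a quick check of that arithmetic, Python's standard `ipaddress` module accepts the same notation when you pass `strict=False`, which masks off the host bits just like `ip` on Linux does:

```python
import ipaddress

# 10.2.0.0/8 has host bits set, so strict parsing would reject it;
# strict=False ANDs the address with the mask, like `ip` on Linux.
net = ipaddress.ip_network("10.2.0.0/8", strict=False)

print(net)                    # 10.0.0.0/8
print(net.network_address)    # 10.0.0.0
print(net.broadcast_address)  # 10.255.255.255
```

With `strict=True` (the default), the same string raises a `ValueError` complaining that the address has host bits set.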
...and replaced by the complexities of AWS. I mean, even in the pre-AWS days, it's not like you had to buy the hardware or replace the failed disks yourself, web hosters did that for you.
That’s an oversimplification. Yes, you didn’t go into the data center and replace the disk yourself, but it was a very different process than what you get with AWS.
There are still plenty of web hosts that operate the old fashioned way if people want to use them. There is a reason people prefer services like AWS though.
But AWS needs to be complex to handle the needs of huge organizations.
I will also point out that the AWS console EC2 launching interface has come such a long way. So much of the complexity is handled for you.
Given that no single web hoster has grown to the size of AWS / Azure, it's a safe bet which complexities the industry is willing to tolerate. I mean, Oracle still rakes in billions, despite everything.
I read this while I was taking a break from working on an epic to migrate our stuff off of OpsWorks before it gets shut down in May.
That said, I heard from folks at AWS that it was not well maintained and a bit of a mess behind the scenes. I can't say I'm surprised it's being shut down given where the technology landscape has shifted since the service was originally offered.
RIP OpsWorks.
Once you got up to the level of AWS CAFO-style "cattle" instances, it stopped making as much sense. With autoscaling, you need your configuration baked into the AMI before it boots; otherwise you're in a world of hurt, trying to autoscale to keep up with the load while spending the first thirty minutes of each instance's lifetime doing all the configuration after the autoscale event.
A wise Chef once told me that "auto scaling before configuration equals a sad panda", or something to that effect.
Chef did try to come up with a software solution that would work better in an AWS Lambda/Kubernetes style environment, and I was involved with that community for a while, but I don't know what ever became of that. I probably haven't logged into those Slack channels since 2017.
IMO, there are much better tools for managing your systems on AWS. CDK FTW!
AWS rarely retires services, and when they do, they give months or even years of notice before forcing you to migrate, which is very nice.
I feel like Data Pipeline and SWF have been replaced by Step Functions + EventBridge + Lambda/Fargate. We've furthered our abstractions over time, and that's a good thing.
Edit: that said, no idea how they scale in comparison.
This is why businesses trust AWS.
Another was AWS. I think EC2 had just launched, and we happened to be re-evaluating where we hosted our web properties, because we thought the social media website would get a bit more traffic than our older "web components" offering. It was pretty exciting that we could just click a button and spin up an instance in the US, or in Ireland, or Amsterdam. And if the instance died, just click a button again to spin up another one.
As it is today, so it was then: the simple UI hid quite some complexity. There were the different kinds of storage to learn about, and which ones were persistent, and which weren't. If I remember rightly those early EC2 instances weren't as reliable as they seem to be today either, we actually lost one or two completely and had to rebuild them, so we snapshotted a lot.
There was no infrastructure as code or DevOps, but we did implement our own highly available cluster. One of the engineers I worked with actually wrote it from scratch, in C++ (we were primarily a C++ company). It would monitor the databases in the different EC2 instances, and ensure the database cluster stayed alive.
We didn't really know what we were doing, what was smart, there weren't any cloud architects around back then. But the technology worked really well. Once we got past the initial hiccups, we built a pretty active internet community. The website itself was built in PHP with MySQL, and used XSLT to transform XML data (that in turn was built from relational data from SQL) to generate the HTML. There wasn't a great deal of JavaScript, just some jQuery (another technology I just randomly stumbled on while working on this project that also changed quite a few things). Progressive enhancement, and server-side rendering.
I'm trying to remember how we deployed it. I think we used FileZilla, and then the C++ cluster software would clone the uploaded files to the other EC2 instances in the other AZ's.
I can't remember how much peak traffic we had, but in retrospect, we probably didn't need a server in 3 different AZ's. But damn, it was fun to work on, and it gave me my first introduction to AWS.
Good times.
Where I worked at the time, I wrote some Chef tooling that would hdperf the volumes it provisioned, and if some weren't any good it would provision more until it got 8 good ones. Only then would it RAID them together, then deprovision the bad ones.
Now they have provisioned iops, and I haven't seen a slow EBS volume in a decade.
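The provision-benchmark-keep-or-discard loop described above is roughly this (a toy Python sketch; `provision_volume` and its fake benchmark numbers are stand-ins for the real EBS CreateVolume call and the hdperf run, not the actual Chef tooling):

```python
import random

def provision_volume():
    """Stand-in for EBS CreateVolume + attach + benchmark.
    Returns a fake volume ID and a made-up throughput figure."""
    return {"id": f"vol-{random.randrange(16**8):08x}",
            "mbps": random.uniform(20, 120)}

def gather_good_volumes(needed=8, threshold=60.0):
    """Keep provisioning until we have `needed` volumes that benchmark
    above `threshold`; collect the slow ones so they can be deprovisioned."""
    good, bad = [], []
    while len(good) < needed:
        vol = provision_volume()
        (good if vol["mbps"] >= threshold else bad).append(vol)
    return good, bad

good, bad = gather_good_volumes()
# Only now would you RAID the good volumes together (mdadm in real life)
# and issue DeleteVolume for everything in `bad`.
```

The point of the ordering is that you never RAID a slow volume in: you filter first, assemble second, and release the rejects last.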
Microsoft do it, and IIRC Google, so why not Amazon?
I've been waiting for this for about 20 years too :D
Ironically I think post-classic was a regression — it looks a lot more like the infrastructure we had to deal with pre-cloud.
My new stuff is mostly on Cloudflare, so now I’m really not thinking about subnets, VPCs, etc.
Did they suffer a reboot during that migration, or was it done via some live-migration process? (It's hard to live-migrate off a virtualization platform that was never designed with that in mind!)
What about the original network setup? Is that still emulated, or might some customer applications have broken?
It seems like AWS spent time, people and money to migrate customers off EC2 classic. They made a fairly good effort to automate the process and make it less painful for customers. For example: https://repost.aws/knowledge-center/ssm-migrate-ec2classic-v...
The original network moved from an everyone-on-the-same-subnet model to a you-get-your-own-subnet model, so yes, customer applications could break in the process. People do all sorts of non-smart things for good reasons, like hardcoding an IP address in /etc/hosts when a nameserver is down. And then they forget to change it back. Doing these sorts of migrations well requires a carrot-and-stick approach. The stick: we want to shut down this service and will eventually refuse you service. The carrot: automation, reminders that people need maintenance windows for their applications, clear directions, and above all, willingness to deal with people and actually talk to them.
In the ideal world, they would have written software to live-migrate VM's to the new platform and emulate the old networking.
Emulating old stuff should be pretty easy, because hardware moves on, and an instance back in 2006 probably had far lower performance expectations - and therefore even a fairly poor performance emulation will be sufficient to meet user needs.
But they gave people YEARS to do that, and tracked down every user to help them if necessary.
I don't see why every user couldn't be auto-created a virtual network with the same 10.x.x.x IP addresses as their original machine had - and therefore there is no need to do any reconfiguration on the users side.
Would be very interesting to learn how that was possible, it seems surprising to me that there wasn't even one instance that the owner forgot about or just was unwilling to do any work on.
It's possible that credit card expiration was the key, as that may have automatically disabled almost all forgotten accounts.
Guess this was something similar. VMs, if not specifically configured to be movable, cannot really be moved automatically. Think about, e.g., randomness of device ordering on the PCIe bus (after a move, devices may not come up in the same order as before), or various machine IDs, like the MAC address -- if you don't make sure the VM is unaffected by these changes, it likely will be affected when moved.
QEMU/KVM/libvirt/... keep the hardware the VM sees stable across migrations - the exceptions are the CPU model, which can't be changed on the fly without at least rebooting the VM in question, and hardware in passthrough mode like GPUs.
All the VM sees from a live migration is a few seconds of "lost" time, as if someone had stopped the CPU clock.
I didn't, and I didn't take the job; Google Cloud was a major reason for that. I did not want my job to be at the mercy of Google's decisions. I just don't trust them.
AWS may not be perfect but I don't worry as much that a decision on their part is going to really screw me over.
> It’s quite incredible where cloud computing has gone since then, with a P3dn.24xlarge providing 100 Gbps of network throughput, 96 vCPUs, 8 NVIDIA v100 Tensor Core GPUs, 32 GiB of memory, and 1.8 TB of local SSD storage, not to mention an EFA to accelerate ML workloads.
They decided to omit the price, but I think it's relatively good - around 31 USD per hour. Just remember to turn it off after you're done, otherwise it will cost you over 7 grand.
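Back-of-the-envelope on that rate (the exact $31.212/hour figure is an assumption based on the published us-east-1 on-demand price; check current pricing):

```python
HOURLY = 31.212  # assumed on-demand p3dn.24xlarge rate, USD/hour (us-east-1)

daily = HOURLY * 24
monthly = HOURLY * 24 * 30

print(f"${daily:,.2f}/day")      # $749.09/day
print(f"${monthly:,.2f}/month")  # $22,472.64/month
```

At that rate, a forgotten instance passes the 7-grand mark in well under two weeks.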
https://web.archive.org/web/20150302235811/https://docs.aws....
Huh? Do they mean 10.0.0.0/8, or 10.2.0.0/16 (/15 would also work), or is this a new CIDR notation? Something specific to AWS, maybe?
But IIRC, following the "Launch Instance" wizard and choosing defaults for everything, as well as the (default) public subnet, gives the exact thing you're asking for.
Stated justification was "If we don't bake this in now, it'll be harder to do later", but it was mostly that a couple folks in charge wanted to learn new stuff (someone said that out loud later, confirming my concerns about resume-driven development).
EDIT: FWIW, my couple small experiences with Lightsail itself were fine. Seems like a decent onramp to ease into the AWS ecosystem, if that's on your roadmap.
A default VPC is created automatically. For simple projects, that's good enough.
> internet gateway and at least one subnet/route table pointing to the igw
Only if you have a private subnet.
And they kept this running for a decade after people agreed there were much better options, with an announced two-year sunset period.
I used to love AWS but after I worked a while at Google in 2013, I switched all my personal projects to GCP for nostalgia (but for work, just used whatever platform customers wanted).
I used Google App Engine, which was orphaned (at least the version I tried), so that was a clear contrast with AWS.
Speaking of SimpleDB, we still use it. It's amusing how it's basically swept under a rug at AWS. It's never mentioned, barely documented, but continues to work. It's a pretty good product for what it is - a very simple key/value store where you don't need/want to manage provisioned throughput, costs, keys, etc.
The way they handle SimpleDB makes me respect AWS and feel more comfortable on some other services we also rely on that seem close to abandoned (like ElasticBeanstalk).
However, as a counter-point, they are killing OpsWorks with what feels like a fairly short notice, so I'm also a bit cautious about how long they'll maintain services.
VPCs gave a little more room to prevent that but the big thing was really better tooling - the average developer still doesn’t think about security enough to be trusted with the EC2 or GCP launch wizard.
Amazon can damned well run password strength / compromise tests and validations.
Then in the future, when that same person is responsible for choosing a vendor for a new project, they'll remember.
Almost everything Google does outside of GCP, Maps, Search, and YouTube could evaporate for all I care. Google's problem is not that they cancel stuff, it's the everpresent need to grow revenue and embed annoying ads into more and more of everyone's daily lives. I'd love for them to cancel Gmail with a very short notice period.
The endgame for Google is every Google user loaded full of energy drinks watching ads continuously for 20 hours a day. Every lifestyle that's less profitable than that is something Google will eventually try to engineer away.
OTOH, I successfully skipped the whole AMP saga because I could tell from the start that it was bullshit.
[1]: https://www.theverge.com/2023/8/30/23851107/google-graveyard...
https://medium.com/se-101-software-engineering/what-is-the-h...