Ask HN: Do You Buy Servers?

35 points · balex · 4y ago · 24 comments

We're an Open Source startup focused on managing the lifecycle of server fleets, such as maintenance, waste control, etc. To get a grasp of what we're about: https://github.com/rebootoio/vaxiin-sandbox

We're thinking of tackling the challenges of new hardware procurement. From the initial "what do I need?", through generating a BoM with all the boilerplate, validating the quote, the delivery, etc.

We'd like to know whether you experience pain in this process and, if so, what it is. For example:

- Who in your org generates the initial spec?

- What's your workflow from spec to quote to delivery?

- What works and what's broken?


19 comments · 8 top-level

easytiger · 4y ago · 5 in thread

Dunno if this is what you are looking for.... but I've specified hundreds of high end servers (all HP) in very large banks onprem and in colo in 4 geos.

- often you are limited to "off the shelf" approved subsets of hardware+rhel configurations that have been tested by a centralised team. These teams rarely have a clue about what they are doing, are highly inflexible and their specs very out of date.

- with colo there has been more flexibility as these are bought for the bank by a 3rd party

- very large number of NICs needed for bonding + separate physical networks for client connectivity/internal connectivity/exchange connectivity.

- driven by business requirements from whatever trading area

- their own technology team

- then massive multilayer procurement sign off process that can take 6+ months

- the available hardware might change or increment during this process requiring a reset if the appetite exists

- usually not bought direct from manufacturer but a "pre approved" vendor who passes bank procurement onboarding procedures

- servers all have iLO for mgmt/failure etc - usually requires interaction with an official systems team via internal bank ticketing/helpdesk system

- in colo you open tickets with e.g. Equinix to receive the hardware and have it put in your cage or whatever, pending a weekend install by the bank or a 3rd party team, who hook them up to switches as per the design agreed with the network+systems and business tech teams

- often when live there are 2/3 different teams monitoring the hosts 24/7 using various tools

prob left out a load of stuff.

chomp · 4y ago

This is pretty accurate from my time in a large company. My team chose the pre-approved SKUs because we had other teams going nuts buying extremely dense servers and then being shocked with "what do you mean our power commitments don't allow us to go this dense in a rack?". So, we wound up with 2U (2U Twin^2)/3U (microcloud)/4U variants, and if you needed less than that, then you get to snuggle up with other teams' VMs on the VM stack.

By far, the most painful part of the process is the ends of the pipeline (procurement for the project, and the datacenter) not knowing anything about the opposite end. The team procuring the hardware would just assume it'd go fine into the datacenter however they dreamed it up, and the datacenter team would be flying blind with little details as to how the system needed to be set up. It was rare that a procuring team would know what they're doing with which nics need to go in which ports, and what VLANs would need to be spun up, and stuff like that.
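
The space-versus-power mismatch described above can be sketched as a quick check. This is a minimal illustration, not anyone's actual tooling: the `ServerSku` type, the function name, and every number below are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ServerSku:
    name: str
    rack_units: int    # chassis height in U
    peak_watts: float  # assumed worst-case draw per chassis


def max_chassis_per_rack(sku: ServerSku, rack_units: int, power_budget_watts: float) -> int:
    """Return how many chassis fit, limited by whichever runs out first: space or power."""
    by_space = rack_units // sku.rack_units
    by_power = int(power_budget_watts // sku.peak_watts)
    return min(by_space, by_power)


# Hypothetical figures: a dense 2U chassis in a 42U rack with a 10 kW power commitment.
dense = ServerSku(name="dense-2u", rack_units=2, peak_watts=2000.0)
fits = max_chassis_per_rack(dense, rack_units=42, power_budget_watts=10_000.0)
print(fits)  # space would allow 21 chassis, but power caps it at 5
```

Running this kind of check at spec time, rather than at delivery, is the point: the procuring team sees the datacenter's constraint before the order is placed.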

easytiger · 4y ago

> It was rare that a procuring team would know what they're doing with which nics need to go in which ports, and what VLANs would need to be spun up, and stuff like that.

yea. many of the teams would be devs who wouldn't understand that kind of thing (which was a large part of my job, being in such teams and knowing, roughly, how things worked helped get things done from day 1).


bradwood · 4y ago

You forgot the 15 CAB calls and Change Requests, PTWs and the general wall of ITIL process drag that you need to get through to get the machine racked, stacked, powered and brought online

easytiger · 4y ago

Yea days of paperwork. I was doing the Hollywood version


balex (OP) · 4y ago

Yes, this is what we're looking for.

We're wondering if automating this headache to some SaaS that offers BoM templating, compatibility validation, interchangeable parts and such, could be of interest to such an org. Essentially they'd be left with approvals and the rest would be stepping through an automated workflow. WDYT?
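
To make "BoM templating" and "validation" concrete, here is a rough sketch of checking a quote against a template of minimum quantities. Everything in it is hypothetical: the `BomLine` type, the category names, and the part names are invented for illustration and do not describe any existing product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BomLine:
    part: str
    category: str  # e.g. "chassis", "nic", "psu" (hypothetical categories)
    quantity: int


def validate_bom(required: Dict[str, int], lines: List[BomLine]) -> List[str]:
    """Compare quoted quantities per category against the template's minimums."""
    totals: Dict[str, int] = {}
    for line in lines:
        totals[line.category] = totals.get(line.category, 0) + line.quantity

    problems = []
    for category, minimum in required.items():
        have = totals.get(category, 0)
        if have < minimum:
            problems.append(f"{category}: need at least {minimum}, quote has {have}")
    return problems


# Hypothetical template: one chassis, four NICs (bonding plus separate networks), two PSUs.
template = {"chassis": 1, "nic": 4, "psu": 2}
quote = [
    BomLine(part="example-chassis", category="chassis", quantity=1),
    BomLine(part="example-dual-port-nic", category="nic", quantity=2),
    BomLine(part="example-psu", category="psu", quantity=2),
]
problems = validate_bom(template, quote)
print(problems)
```

A real compatibility check would also need part-to-part rules (which NIC fits which riser, and so on); this only covers the simplest quantity check.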

exabrial · 4y ago · 4 in thread

We've been looking at the logistics of co-location. Cloud Prices, even at Digital Ocean and Linode, are pricey!

The problem is finding anyone with experience doing this. It's a skillset that's sorta been lost.

balex (OP) · 4y ago

OP here.

And to top it off, it seems like it's all undocumented know-how. You have to be a practitioner to pick it up.

We find that the tools are focused on all sorts of inventory (DCIM) and provisioning. Processes are homegrown based on whatever ticketing system you have in place. That's what we're looking to change.

So far it feels like the colo side aren't hyped around introducing tools to automate things. It's like their jobs are at risk. And the customer side is bent on building their own tooling from scratch.

We have found some early adopters, but they're hard to come by, even when we can quantify savings in money and time to market. Not to mention engineering frustration.

Would love to have a conversation around this!

exabrial · 4y ago

Yep. I'm "smart enough" to know that it's _got to be_ harder than it looks: Buy servers, rack them, run stuff. Obviously there's a _whole world_ of hidden logistics here that could go _terribly_ wrong. I'm "dumb enough" to not know what those real world things are.

jjeaff · 4y ago

Are you also comparing simply renting dedicated servers? Unless you need very specialized hardware or just huge amounts of hardware, that seems to make a lot more sense for most people.

jacooper · 4y ago

Check out more affordable cloud providers like Hetzner or OVH.

c_o_n_v_e_x · 4y ago · 1 in thread

I was just looking at your repo... Please correct me if I'm wrong but can't you do the same thing with Ansible or equivalent?

balex (OP) · 4y ago

Not really. Vaxiin identifies a system going down via a dead man's switch, then goes out and grabs a screenshot out-of-band, runs OCR on it and matches reaction rules against the resulting text. Reactions might be IPMI commands, keystrokes, HTTP requests or a combination of the above.

It might make sense to add an Ansible type of reaction as well, though I'm unsure of the scenario.
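
The "match reaction rules against the OCR text" step can be pictured roughly as below. This is a sketch under assumptions, not Vaxiin's actual code or rule format: the `Rule` type, the regex patterns, and the action labels are all hypothetical, and the screenshot and OCR steps are assumed to have already produced a text string.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    name: str
    pattern: str        # regex applied to the OCR'd console text
    actions: List[str]  # opaque action labels, hypothetical


def match_rule(screen_text: str, rules: List[Rule]) -> Optional[Rule]:
    """Return the first rule whose pattern appears in the screen text, else None."""
    for rule in rules:
        if re.search(rule.pattern, screen_text, flags=re.IGNORECASE):
            return rule
    return None


# Hypothetical rules: first match wins, so order expresses priority.
rules = [
    Rule(name="no-boot-device", pattern=r"no boot device", actions=["power_cycle"]),
    Rule(name="press-f1", pattern=r"press f1 to continue", actions=["send_keys:F1"]),
]

matched = match_rule("ERROR: No boot device found. Press any key.", rules)
print(matched.name if matched else "no rule matched")
```

Case-insensitive regex matching is used here because OCR output tends to be noisy; a real system would likely need fuzzier matching than this.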

Ir0nMan · 4y ago · 1 in thread

Automation built on top of ipmitool? I could see some people liking this.

balex (OP) · 4y ago

Feel free to let them know :-)

I think abstracting ipmitool is nice but it's just a building block. The real killer is when you put it all together - recognizing a system is down, and bringing it back up automatically despite having no working OS. It might require using ipmitool. But it might also require clearing state in a provisioning system via an HTTP call. Or sending some keystrokes.

It's about moving beyond manual recovery when no OS is present.
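
For readers unfamiliar with the term, the dead man's switch used for "recognizing a system is down" boils down to noticing that a heartbeat has stopped arriving. A minimal sketch follows, assuming heartbeats are stored as timestamps per host; the host names, the timeout, and the function name are made up for the example.

```python
from typing import Dict, List


def overdue_hosts(last_seen: Dict[str, float], now: float, timeout_seconds: float) -> List[str]:
    """Hosts whose last heartbeat is older than the timeout are presumed down."""
    return sorted(
        host for host, seen in last_seen.items() if now - seen > timeout_seconds
    )


# Hypothetical heartbeat table: host -> time of last check-in (seconds).
last_seen = {"web-1": 1000.0, "db-1": 700.0}
down = overdue_hosts(last_seen, now=1060.0, timeout_seconds=300.0)
print(down)  # db-1 last checked in 360 s ago, web-1 only 60 s ago
```

The host itself does nothing when it fails; it is the absence of a check-in that triggers the recovery path, which is why this works even with no running OS.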

f0e4c2f7 · 4y ago

For me, I would not consider buying physical servers for quite a long time. That's coming from someone who has quite a bit of experience working with hardware. If you shipped me a box of parts that would be just as well as a physical server. It's just the time / maintenance / cost that goes with that.

On the other hand, there is also that growth tipping point where cloud is not cost effective anymore and you need to switch to physical servers, or at least it would be a cost savings doing so.

That's quite a while later though save for some niche cases.
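
The tipping point can be estimated with simple arithmetic. The sketch below assumes flat monthly costs and ignores staff time, refresh cycles and financing, all of which matter in practice; the function name and the figures are purely illustrative.

```python
import math
from typing import Optional


def breakeven_months(hardware_cost: float, colo_monthly: float, cloud_monthly: float) -> Optional[int]:
    """Months until cumulative cloud spend catches up with buying plus hosting; None if never."""
    monthly_saving = cloud_monthly - colo_monthly
    if monthly_saving <= 0:
        return None  # cloud is cheaper per month, so owning never pays off
    return math.ceil(hardware_cost / monthly_saving)


# Hypothetical figures: 60,000 up front, 1,500/month colo, versus 4,000/month cloud.
months = breakeven_months(hardware_cost=60_000.0, colo_monthly=1_500.0, cloud_monthly=4_000.0)
print(months)
```

With these made-up numbers the purchase pays for itself after 24 months; a smaller cloud bill pushes that point out quickly, which matches the "quite a while later" observation.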

One interesting lane might be to combine these two and offer a one stop service for getting cloud servers and hardware. In particular you could offer guidance (automated or otherwise) about when it makes the most sense to use cloud or onprem at what scale.

Scope could also include things like colo vs self hosting, what datacenters to use, where to buy land to build new datacenters, region and CDN considerations, etc.

nonameiguess · 4y ago

My wife used to be involved with data center provisioning for what the NRO called "agency unique equipment." This was a facility and a contract that did centralized infrastructure procurement, installation, and tier 1-2 support for any NRO workloads that needed to use hardware not available in Amazon's classified GovCloud enclaves.

The full workflow: the customer generated and provided the initial build spec; her program (called "Palinode") validated this spec, made the purchase, received equipment from the vendor, assembled it in test racks at a contractor facility, performed a physical inspection and validation of the build, then shipped it to the production data center, installed it, and performed final validation on-site, usually with a customer witness. For that, they used a suite of custom tools that matched the functionality your linked toolkit seems to be providing, and also had management and alerting tools all developed in-house.

I'd say the biggest pain points were not really something any third-party or software provider could alleviate, and that was the locked down firmware and NDA agreements with the network appliance vendors. They experienced so many bizarre appliance failure modes that could not even be debugged except with chip-level log access that required vendor reps to come on-site just to be able to read them. And there was a major fire-drill level issue with defective memory they couldn't even disclose to their customers because of the vendor NDA, and it took a really long time to address it. I was honestly pretty surprised by that, as the vendor is a publicly traded enormous company that you think would need to legally disclose product defects like this in components that could potentially power safety-critical devices, but for whatever reason, that wasn't the case, and remediation was seriously slowed down by all the need for secrecy.

The only thing I can see changing this is competition, but it largely doesn't exist at the chip level or for specialized hardware. Many components and appliances only have two or three vendors, and the levels of compliance you have to go through to host classified data combined with patent law make it nearly undisruptable as an industry.

SMAAART · 4y ago

Good customer development! Steve and Eric would be proud.

epirogov · 4y ago

Two times already. A lot of infrastructure is hidden from developers by default. My motivation was that I have sometimes been confused when speaking to DevOps.
