So yeah, you could spend one or two FTE salaries' (or one deep learning PhD's) worth of cash on finding such models for your startup if you insist on helping Jeff Bezos to wipe his tears with crisp hundred dollar bills. That's if you know what you're doing of course. Literally unlimited amounts could be spent if you don't. Or you could do the same for a fraction of the cost by stuffing a rack in your office with consumer grade 2080ti's. Just don't call it a "datacenter" or NVIDIA will have a stroke. Is that too much money? Not in most typical cases, I'd think. If the competitive advantage of what you're doing with DL does not offset the cost of 2 meatspace FTEs, you're doing it wrong.
That, once again, assumes that you know what you're doing, and aren't doing deep learning for the sake of deep learning.
Also, if your startup is venture funded, AWS will give you $100K in credit, hoping that you waste it by misconfiguring your instances and not paying attention to their extremely opaque billing (which is what most of their startup customers proceed to do pretty much straight away). If you do not make these mistakes, that $100K will last for some time, after which you could build out the aforementioned rack full of 2080ti's on prem.
We don't train ML models, but we are in a similar boat regarding cloud compute costs. Building our solutions for our clients is a compute-heavy task which is getting expensive in the cloud. We are considering options such as building commodity threadripper rigs, throwing them in various developers' (home) offices, installing a VPN client on each and then attaching as build agents to our AWS-hosted jenkins instance. In this configuration we could drop down to a t3a.micro for Jenkins and still see much faster builds. The reduction in iteration time over a month would easily pay for the new hardware. An obvious next step up from this is to do proper colocation, but I am of a mindset that if I have to start racking servers I am bringing 100% of our infrastructure out of the cloud.
It's noisy, it takes up space, and presumably I'm on call to fix it if it breaks.
You should pay them an extra 24x(PSU wattage)x(peak $/Wh in area) per day for the electricity too.
I'm alarmed that someone in your company felt this idea was appropriate enough to propose.
Your plan makes sense but be mindful of the acoustics or your devs may grow to hate you.
Honestly why ever go to the cloud? It seems like a Larry Ellison boondoggle with the absurdly high costs and lock-in. (Ever look at moving your data?)
Running your own metal is cheaper if you actually fund it.
I've saved a ton of money just giving them dedicated workstations to develop on and then having everyone use a shared EC2 instance to push jobs to a fleet of spot instances for large scale training.
Now let's say your customer wants to analyze 2 hours = 120 minutes of video and doesn't want to wait more than those 3 hours, then suddenly you need 120 servers with one $10k GPU each to service this one customer within 3 hours of waiting.
Good luck reaching that $1,200,000 customer lifetime value to get a positive ROI on your hardware investment.
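The arithmetic behind the 120-server figure can be sketched as follows. The rate of 3 GPU-hours per minute of video is an assumption inferred from "those 3 hours" in the comment; the function name and parameters are hypothetical.

```python
import math

def servers_needed(video_minutes, gpu_hours_per_video_minute, deadline_hours):
    """Number of single-GPU servers needed to finish within the deadline."""
    total_gpu_hours = video_minutes * gpu_hours_per_video_minute
    return math.ceil(total_gpu_hours / deadline_hours)

# Assumed: one minute of video costs 3 GPU-hours, customer waits at most 3 hours.
servers = servers_needed(video_minutes=120,
                         gpu_hours_per_video_minute=3.0,
                         deadline_hours=3.0)
hardware_cost = servers * 10_000  # one $10k GPU per server, as in the comment

print(servers, hardware_cost)
```

Relaxing the deadline is the cheapest lever here: under the same assumptions, a 30-hour deadline needs 12 servers instead of 120.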
When I talk about AI, I usually call it "beating the problem to death with cheap computing power". And looking at the average cleverness of AI algorithm training formulas, that seems to be exactly what everyone else is doing, too.
And since I'm being snarky anyway, there's two subdivisions to AI:
supervised learning => remember this
unsupervised learning => approximate this
Both approaches don't put much emphasis on intelligence ;) And both approaches can usually be implemented more efficiently without AI, if you know what you are doing.
For the vast majority of people the main expense is creating the combination of a dataset and model that works for their practical problem, with the dataset being the harder (and sometimes more expensive) problem of the two.
The dataset is also their "moat", even though most of them don't realize it, and don't put enough care into that part of the pipeline.
> supervised learning => remember this
> unsupervised learning => approximate this
This doesn't make any sense at all.
Both are "remembering" something under some constraint, which forces generalisation.
Supervised learning just "knows" what it is "remembering". Unsupervised learning is just trying to group data into patterns.
> Both approaches don't put much emphasis on intelligence
Seems like most "intelligence" relies a lot on pattern recognition.
> And both approaches can usually be implemented more efficiently without AI, if you know what you are doing.
The evidence is that you are wrong on this for a number of pretty important problems. I don't know much about optical flow, but in the image and text spaces you can't approach the accuracy of neural network approaches with hand crafted features.
I.e. I think that in a one-minute video, 95% of your images do not have new information in them.
> unsupervised learning => approximate this
Lol this can't be more wrong lmao. Both areas "remember" and "approximate" things through training. The difference is that unsupervised learning does not have labeled data, thus it has to search for some pattern. Honestly not even computer science graduates would say something like this.
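To make the labeled-versus-unlabeled distinction concrete, here is a minimal toy sketch in plain Python. The algorithms (a nearest-centroid classifier and two-cluster k-means on 1-D data), the data, and all names are illustrative assumptions, not anything proposed in the thread.

```python
from statistics import mean

def fit_supervised(points, labels):
    """Nearest-centroid classifier: the labels say which group each point is in."""
    groups = {}
    for x, y in zip(points, labels):
        groups.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in groups.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))

def fit_unsupervised(points, iterations=10):
    """Two-cluster k-means in 1-D: no labels, so the groups must be discovered."""
    centers = [min(points), max(points)]  # deterministic initialisation
    for _ in range(iterations):
        clusters = [[], []]
        for x in points:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["small", "small", "small", "large", "large", "large"]

model = fit_supervised(points, labels)   # uses the labels
centers = fit_unsupervised(points)       # never sees the labels

print(predict(model, 4.0), centers)
```

Both fits end up summarising the same two groups; the only difference is whether the grouping was given or had to be found.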
- Or Tensorflow/Pytorch could've crapped on OpenCL a little less by releasing a fully functional OpenCL version every time they released a fully functional CUDA version, instead of worshipping CUDA year in and year out.
- Or Google could start selling their TPUv2, if not TPUv3, while they're on the verge of releasing TPUv4.
- Or one of the other big tech companies (Facebook/Microsoft/Intel) could make and start selling a TPU-equivalent device.
- Or I could finish school and get funded to do all/most of the above ;)
edit: On a more serious note, a cloud/on-prem hybrid is absolutely the right way to go. You should have a 4x 2080 ti rig available 24x7 for every ML engineer. It costs about $6k-8k a piece [0]. Prototype the hell out of your models on on-prem hardware. Then when your setup is in working condition and starts producing good results on small problems, you're ready to do a big computation for final model training. Then you send it to the cloud, for final production run. (Guess what, on a majority of your projects, you might realize, the final production run could be carried out on on-prem itself; you just have to keep it running 24 hours-a-day for a few days or up to a couple weeks.)
[0]: https://l7.curtisnorthcutt.com/the-best-4-gpu-deep-learning-...
This is hard and time consuming, and this field is hard enough as it is. What makes it even harder is that only NVIDIA has decent, mature tooling. There is some work on ROCM though, so AMD is not _totally_ dead in the water. I'd say they're about 90% dead in the water.
Several issues: 1. the electricity bill is still an issue; I've been paying anywhere between $500 and $1000 per month for this workstation (I always have something to train); 2. anything with a decent memory size (Titan RTX and RTX 8000) costs way too much; 3. once you reach the point where 4 2080 Tis are not fast enough, power management and connectivity setup become a nightmare.
Would love to know other people's opinions on the on-prem setup, especially whether consumer-grade 10GbE is enough connectivity-wise.
Although once you reach 4 2080ti, you ought to consider switching to a titanium grade psu and rewiring if you're in a 100-120v country. If you're feeling cheap, just steal the phases from two different circuits. Last I looked, most psu operate around 5% lower efficiency on 115 vs 230.
How much is your electricity? I currently run 12 GPUs in my garage pretty much non-stop. 4 GPUs per machine, 3 machines. Each machine is about 1.2 kW on average (I can tell because each machine is connected through its own rack UPS), or 13.2 cents per hour, or $95/mo. Which, IMO, is not bad at all. That's less than $300 per month for 12 GPUs.
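The figures above are easy to check. This sketch assumes a 30-day month and a rate of about $0.11/kWh, which is not stated in the comment but is implied by 13.2 cents per hour at 1.2 kW; the function name is hypothetical.

```python
def electricity_cost(avg_kw, usd_per_kwh, hours=24 * 30):
    """Cost of running a constant load for the given number of hours."""
    return avg_kw * usd_per_kwh * hours

# Implied rate: 13.2 cents per hour divided by 1.2 kW, roughly $0.11 per kWh.
RATE_USD_PER_KWH = 0.132 / 1.2

per_machine = electricity_cost(1.2, RATE_USD_PER_KWH)  # one 4-GPU machine
fleet = 3 * per_machine                                # three machines, 12 GPUs

print(f"per machine: ${per_machine:.2f}/mo, fleet: ${fleet:.2f}/mo")
```

Plugging in a different local rate shows how quickly the picture changes: at three times the rate, the same fleet would cost roughly three times as much per month.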
I have come across fly.io, Vultr, Scaleway, Stackpath, Hetzner, and OVH but either they are expensive (in that they charge for bandwidth and uptime) or do not have a wide enough foot-print.
I guess colos are the way to go, but how does one work with colos, allocate servers, deploy to them, ensure security and uptime and so on from a single place, 'cause dealing with them individually might slow down the process? Is there tooling that deals with multiple colos, like the tools for multi-cloud (min.io, k8s, Triton, etc.)?
It depends what you need in your datacenters! If you just want servers, and don't care about doing something like anycast, you can find a bunch of local dedicated server providers in a bunch of cities and go to town. But you can't get them all from one provider, really, not with any kind of reasonable budget.
You _could_ buy colo from a place like Equinix in a bunch of cities, and then either use their transit or buy from other transit providers.
But also, unmetered bandwidth isn't a very sustainable service, so I'm curious what you're after? You're usually either going to have to pay for usage, or pay large monthly fixed prices to get reasonable transit connections in each datacenter.
In our case, we're constrained by Anycast. To expand past the 17 usual cities you end up needing to do your own network engineering which we'd rather not do yet.
Does adding an extra 100ms to the response time cost you that much business wise?
As for colos, it depends on scale. If you have 30k servers world wide, it pays to have someone manage the contracts for you. If not it pays to go for the painful arseholes like Vodafone, or whoever bought Cable & Wireless's stuff.
As for security, it gets very difficult. You need to make sure that each machine is actually running _what_ you told it, and know if someone has inserted a hypervisor shim between you and your bare metal.
None of that is off the shelf.
Which is why people pay the big boys, so that they can prove chain of custody and have very big locks on the cages.
K8s gives you scheduling and a datastore. For a large globally distributed system it's going to scale like treacle.
(I'm in full agreement with everything you've written + it's well-phrased and funny. gj!)
[0] that's not a typo - there is such a thing as "Oracle cloud"
Context please :) ?
No Datacenter Deployment. The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted.
eg from 2011 6400 Hadoop nodes like http://bradhedlund.com/2011/11/05/hadoop-network-design-chal...
God only knows what fun you could get up to with modern tech - I miss bleeding edge R&D
AFAIK that is limited to <$20k and it expires.
But the real kicker is that I get x5 the cores, x20 RAM, x10 storage, and a couple of GPUs. I'm running last-generation Infiniband (56 Gb/s) and modern U.2 SSDs (say 500MB/sec per device).
I figure it is going to take me about $10K in labor to move and then $1K/mo to maintain and pay for services that are bundled in the cloud. And because I have all this dedicated hardware, I don't have to mess around with docker/k8s/etc.
It's not really a big data problem but it shows the ROI on owning your own hardware. If you need 100 servers for one day per month, the cloud is amazing. But I do a bunch of resampling, simple models, and interactive BI type stuff, so co-loc wins easily.
Recent projects have been on AWS. For a project that is roughly on the scale of our colo in terms of instances, though with aggregate lower performance, we are buying one of our colos every year. It’s insane. Network costs are particularly egregious in AWS.
But there is absolutely no way we’d be permitted to build colo facilities, for many reasons, and even if we could get permission, there are many reasons why we would choose not to, due to the resulting death by a thousand cuts orchestrated by the team who happens to have inserted themselves as the owner of DC/colo-like things.
I used to run on-prem back in the 2000's, and we were constantly dealing with demand fluctuation crises. Spinning up new physical servers to deal with new demand, or being massively over-specced when demand dropped, was a real pain.
I'm starting a new thing this week, and using the Cloud for it because I have no idea what our demand will be. I can start small, scale up with our customer growth, and never have to worry about ordering new servers a month in advance so I have enough capacity when (or if) I need it.
At some point in the future, when our needs are clear and relatively stable, it might make sense to migrate to on-prem and save those costs.
If your peak demand is 100x your baseline and only happens for ~1h each day, cloud is almost certainly a good choice. If it happens for ~12h a day or it's only 5x your baseline, the cost of the cloud is such that you're likely to save with dedicated hardware, even though much of your hardware sits around doing nothing part of the time.
> never have to worry about ordering new servers a month in advance so I have enough capacity when (or if) I need it.
There is a middle-ground that's very much worth considering: renting dedicated servers. It's not quite as cost-effective as colocation and owning your hardware when you have at least a cabinet worth of stuff but it does offload the management of the hardware and provisioning to somebody else. They can also usually be provisioned in a matter of minutes.
In some cases (e.g. Packet.net) these machines can even be treated essentially like cloud instances, with hourly pricing.
There's also yet another middle ground: using dedicated to handle the known and predictable baseline traffic and using the cloud to handle the unexpected bursts.
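The peak-versus-baseline reasoning in this sub-thread, including the hybrid option, can be sketched with a toy cost model. Costs are expressed in units of one dedicated server-hour at baseline size; the cloud premium of 5x per server-hour is a hypothetical placeholder, not a quoted price, and the model ignores provisioning delays, staffing, and commitment discounts.

```python
def daily_costs(peak_multiple, peak_hours, cloud_premium=5.0):
    """Daily cost of three strategies, in dedicated baseline server-hours.

    peak_multiple: peak demand as a multiple of baseline demand
    peak_hours:    hours per day spent at peak
    cloud_premium: assumed cloud price per server-hour relative to dedicated
    """
    baseline_hours = 24 - peak_hours
    # Dedicated only: provision for the peak, pay for it around the clock.
    dedicated = peak_multiple * 24
    # Cloud only: pay the premium, but only for capacity actually used.
    cloud = cloud_premium * (peak_multiple * peak_hours + baseline_hours)
    # Hybrid: dedicated covers the baseline, cloud covers only the burst.
    hybrid = 24 + cloud_premium * (peak_multiple - 1) * peak_hours
    return {"dedicated": dedicated, "cloud": cloud, "hybrid": hybrid}

for peak_multiple, peak_hours in [(100, 1), (100, 12), (5, 1), (5, 12)]:
    costs = daily_costs(peak_multiple, peak_hours)
    cheapest = min(costs, key=costs.get)
    print(peak_multiple, peak_hours, costs, "->", cheapest)
```

Under these assumptions the cloud beats pure dedicated hardware only for the short, very tall spike, which matches the rule of thumb above; the crossover points move with whatever premium you actually pay.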
It just doesn’t make financial sense to use the big cloud service providers for those with consistent workloads. I always hear stories where folks have saved hundreds of thousands in infrastructure costs with owning + co-lo.
As an aside, thank you for your one-line installer script for tf/keras. Earlier, my team used to spend days figuring out the CUDA/tf/keras/CUDNN etc dependency charts, and you've brought that down to ~0.
Some of the same caveats apply with respect to software updates, configuration control, security, availability, business continuity, disaster recovery, and what happens if the local admin is hit by a bus.
But spare capacity is a good idea, especially if you have real-time traffic.
That’s how the DC I used to work in operated.
Now do the calculation for ongoing operations for 5 years, taking into consideration normal hardware failure and maintenance cost. You need to swap out old hardware to get a new CPU, etc. I have tried to use co-loc vs cloud for ~100 nodes and cloud won, by 30%.
I wound up at a facility run by a fiber vendor because they'd sell me a fixed 250 Mbps pipe for the same price that a data center would sell me a 20 Mbps pipe that bursts to 1 Gbps. It only works for me because of the nature of my business -- most people would be better off somewhere else.
Choosing a co-loc facility is complicated. My recommendation is to tour and get quotes from 3-5 vendors in your area before choosing anyone. Ideally, take someone who has done it before.
My plan is to temporarily shift to dedicated hardware through a service like Hetzner to evaluate what kind of hardware I need. I can simply redirect a fraction of the traffic and extrapolate. Since this is elastic there will be no upfront costs, but I can play around with different sizes. Once I'm happy with my estimate, buy real hardware and move the rest over.
At least that's the plan. I don't think you can do much more than an educated guess and I think this will be as close as I can get.
Not AI related btw.
I kinda worked backwards from the cost. I ran the business for a year on Azure but each 'sample' of the resample took about 2 mins so it precluded any near real-time analysis. I ported the kernel to a GPU locally using python/numba and it ran in about 10 seconds and that was enough to seal-the-deal.
From there, I spec-ed out a GPU server and then machines that matched each role in my environment. I decided I was willing to spend $50K and just started loading up the machines.
Certainly something like autonomous driving needs machine learning to function, but again, these are going to be owned by large corporations, and even when a startup is successful, it's really about the layered technology on-top of machine learning that makes it interesting.
It's kind of like what Kelsey Hightower said about Kubernetes. It's interesting and great, but what will really matter is what service you put on top of it, so much so that whether you use Kubernetes becomes irrelevant.
So I think companies that are focusing on a specific problem, providing that value added service, building it through machine learning, can be successful. While just broadly deploying machine learning as a platform in and of itself can be very challenging.
And I think the autonomous driving space is a great example of that. They are building a value added service in a particular vertical, with tremendous investment, progress, and potentially life changing tech down the road. But as a consumer it's really the autonomous driving that is interesting, not whether they are using AI/machine learning to get there.
Thankfully transfer learning and super convergence invalidate this claim.
Using pre-trained models + specific training techniques significantly reduces the amount of data you need, your training time and the cost to create near state of the art models.
Both Kaggle and Google Colab offer free GPU time.
IME it is nowhere near as universally successful as this suggests.
I think this sentence invalidates your argument against:
“The number of places where machine learning can be used effectively from both a cost perspective and a return perspective are small.”
In a hobbyist world, free GPU time is an amazing thing, and you can do a lot of fun and rewarding projects using transfer learning and other techniques that avoid heavy engineering and data processing. In a business world, where your product must consistently and accurately perform well, problems that may be solved by ML need to be heavily scrutinized and researched, because for most problems there are cheaper, faster, more robust solutions. Free GPU time doesn't weigh in at this scale.
I've seen a lot of cross pollination of ML and AI techniques into various disciplines. A large percentage just didn't work at all, most of the rest were more "kind of interesting, but". Nothing earthshaking happened although pop sci press likes to talk about it a lot.
If you have more digital data than you used to, using modern free frameworks and toolkits to do basic (i.e. older, boring, but understood) ML stuff to understand it seems to have a reasonable return. Mostly I think this is because it becomes accessible to someone without much background in the area, and you can do reasonable things without having to put 6 months of reading and implementing together before starting.
Also it's doubtful to even categorize machine-learning as science. The goal of science is to generate insight and knowledge, ML solves particular engineering problems or searches problem spaces, it doesn't build fundamental scientific models.
(The dis-economy of scale hurts less if you're already starting from a point with the manual labor.)
I briefly looked at using neural nets to analyse data from an experiment - analysing the efficacy of toilet bowl designs.
The entry level hardware was £250k in 1981 - it was much cheaper to take photos and have a research assistant count squares.
Now you could use fairly cheap commodity hardware to do it.
It would have been an amazing cutting edge project if we could have got some government funding; we did have an in-house knowledge engineer.
So, it's not like I dislike ML. But, saying an investment is an "AI" startup, ought to be like saying it's a python startup, or saying it's a postgres startup. That ought not to be something you tell people as a defining characteristic of what you do, not because it's a secret but rather because it's not that important to your odds of success. If you used a different language and database, you would probably have about the same odds of success, because it depends more on how well you understand the problem space, and how well you architect your software.
Linear models or other more traditional statistical models can often perform just as well as DL or any other neural network, for the same reason that when you look at a kaggle leaderboard, the difference between the leaders is usually not that big after a while. The limiting factor is in the data, and how well you have transformed/categorized that data, and all the different methods of ML that get thrown at it all end up with similar looking levels of accuracy.
There used to be a saying: "If you don't know how to do it, you don't know how to do it with a computer." AI boosters sometimes sound as if they are suggesting that this is no longer true. They're incorrect. ML is, absolutely, a technique that a good programmer should know about, and may sometimes wish to use, kind of like knowing how a state machine works. It makes no great deal of difference to how likely a business is to succeed.
Back then a lot of startups didn't have websites, because they were making other products (hardware, boxed software, etc). If they had a website it was just a marketing page.
So saying that you were going to make a "web application" did in fact differentiate you, in that it showed your approach was very different from the boxed software folks, but it didn't tell you much beyond that.
This is so true. We spent decades educating non-technical people that understanding a problem well is a prerequisite to programming it. Take something easy to understand like driving a car, doing it in a computer is now harder.
AI is undoing all that. People reach a vague problem they can't describe and assume computers will magically fix it.
So...yeah.
L. Pachter and B. Sturmfels. Algebraic Statistics for Computational Biology. Cambridge University Press 2005.
G. Pistone, E. Riccomagno, H. P. Wynn. Algebraic Statistics. CRC Press, 2001.
M. Drton, B. Sturmfels, S. Sullivant. Lectures on Algebraic Statistics. Springer, 2009.
Or more like:
Watanabe, Sumio. Algebraic Geometry and Statistical Learning Theory, Cambridge University Press 2009.
My understanding (I do not do AI or machine learning) is that AI is distinct from these more mathematical analytic perspectives.
Finally, might we argue that generally AI/ML is more easily suited to data that's already high quality eg. CERN data, trade data, drug trial data as opposed to unconstrained data eg. Find the buses in these 1MM jpegs?
For structured data like tables, time series, etc., the techniques are still from statistics. Regression, for example, is the workhorse for numerical prediction problems.
I think a lot of people are missing the point about leaps AI has made because they aren't aware of NLP or CV or reinforcement learning.
So the "AI" mentioned above is stunningly good at finding buses in 1MM images and reasonably good on drug trial and CERN data.
The business models required for making an AI business successful haven't been invented yet.
A good AI business model will be a deep stack: an example would be something like precision agriculture, where you'd use AI for designing rice, then use IoT and earth observation to locate the right acreages, monitor growth and adjust nutrients at crop level, and get dramatically better output with the least wastage and the highest nutritional content.
Most AI companies are still started by ex-CS folks who in general aren't aware of deep technical opportunities in other disciplines. I think this will change very fast due to the ubiquity of deep learning training material, libraries and research papers.
This is a tautology in the narrow sense, but in the broader sense I think there surely exist things that humans don't "know" how to do without a computer, but know how to do with a computer. And the space of solvable problems is expanding, though AI is only a narrow slice of that.
goes on to say
>I agree, but the hockey stick required for VC backing, and the army of Ph.D.s required to make it work doesn’t really mix well with those limited domains, which have a limited market.
Choose one?
Also assumes running your own data center to be easy. Some people don't want to be up 24x7 monitoring their data center or to buy hardware to accommodate the rare 10 minute peaks in usage.
But is that really the use case here? I haven't worked in ML. But I'm not seeing where you are going to need to handle a 10 minute spike that requires a whole datacenter.
The monthly cost of a quad-GPU instance on AWS would pay for a server with similar capacity within a few months of usage.
And hardware is pretty resilient these days. Especially if you co-locate it in a datacenter that handles all the internet and power uptime for you. And when something does go wrong, they offer a "magic hands" service to go swap out hardware for you. Colocation is surprisingly cheap. As is leasing 'managed' equipment.
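A rough break-even sketch for the rent-versus-buy claim above. The $8k rig price is the upper end of the $6k-8k range and the $95/mo electricity figure is the per-machine number, both quoted elsewhere in the thread; the $12/hour cloud rate and 240 hours of use per month are hypothetical placeholders, not quoted prices.

```python
def breakeven_months(hardware_cost, cloud_hourly_rate, hours_per_month,
                     monthly_running_cost=0.0):
    """Months until owned hardware is cheaper than renting the same hours.

    Returns None if owning never pays off under the given assumptions.
    """
    cloud_monthly = cloud_hourly_rate * hours_per_month
    monthly_saving = cloud_monthly - monthly_running_cost
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving

months = breakeven_months(
    hardware_cost=8_000,      # 4x 2080 Ti rig, upper end of the quoted range
    cloud_hourly_rate=12.0,   # hypothetical quad-GPU on-demand price
    hours_per_month=240,      # hypothetical: about 8 hours of use per day
    monthly_running_cost=95,  # electricity figure quoted in the thread
)

print(f"break-even after about {months:.1f} months")
```

The result is very sensitive to utilisation: a rig that trains around the clock pays for itself far sooner than one that sits idle most of the day.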
That’s why the author found it glaringly obvious that it should be brought in-house. It’s often both the most costly and most “in-housable” compute work involved in these companies.
Do you need that for training workloads, and what percentage of a startup's workload is training?
Strong agreement from me: I've never worked on deploying ML models, but have worked on deploying operations-research type automated decision systems that have somewhat similar data requirements. Most of the work is client org specific in terms of setting up the human & machine processes to define a data pipeline to provide input and consume output of the clever little black box. A lot of this is super idiosyncratic & non repeatable between different client deployments.
And the input matters, a lot. So the differentiating factor isn't the models, it's the data and companies like Google figured it out a long time ago.
In short, find interesting problems, then the solutions -- not the other way around.
ML is a mining problem. Digitizers are the miners. Annotators are the refiners.
The models are likely also a differentiating factor in a sense that there are models that perform much better than others, to a point of completely new functionality. But also all of these models are basically open source currently... So they can't by definition be differentiating between different companies, because all of the companies generally have access to all of the algorithms. At least to all of the types of algorithms.
If you need to solve gnarly industrial scale mixed integer combinatorial optimisation problems in the guts of your ML / optimisation engine, the commercial MIP solvers (Gurobi, CPLEX) or non-MIP based alternative combinatorial optimisation systems (LocalSolver) can often give more optimal results in exponentially less running time than free open source alternatives.
1% more optimal solutions might translate into 1% more net profit for the entire org if you've gone whole hog and are trying to systematically profit optimise the entire business, so depending on the scale of the org it might be an easy business case to invest a few million dollars to set this system in place.
Annual server licenses for this commercial MIP solver software were O(100k) / yr per server, and the companies that build these products bake a lot of clever tricks from academia into them that you can exploit by paying the license fee. (My knowledge of pricing is out of date by about 7 years.)
They deliver now often with backend cloud storage, update near continuously, integrate frequently with outside services, sometimes open source major components iteratively, typically have an evolving API and developer ecosystem to educate, and are sold as subscriptions. It’s not as “human in the loop” as some of the AI described in this article but it’s clearly moving toward services in terms of margins.
Nothing is like the old shrink wrapped software business, basically.
To me, the services you describe are software-as-a-service - they scale well without adding more humans to the mix. Services businesses, in contrast, generally need more humans to do more work.
I do think you are right that we are entering an age where the margin pressures will continue to increase. As the Amazon quote goes "your margin is my opportunity." In that world, strength accrues to the largest players - which is why AWS is so strong.
I like to joke that AWS should refund money to the startups that buy booths at re:Invent only to find out AWS is rolling out a competing service (with the acknowledgement that AWS entering a space doesn't necessarily mean the end of the competing company).
Love to hear your thoughts on our library
> I’ll go out on a limb and assert that most of the up front data pipelining and organizational changes which allow for it are probably more valuable than the actual machine learning piece.
Especially at non-tech companies with outdated internal technology. I've consulted at one of these and the biggest wins from the project (I left before the whole thing finished unfortunately) were overall improvements to the internal data pipeline, such as standardization and consolidation of similar or identical data from different business units.
AI is like the new gold rush. And just like back then, it's not the gold diggers that will get rich.
"Most people in AI forget that the hardest part of building a new AI solution or product is not the AI or algorithms — it’s the data collection and labeling."
https://medium.com/startup-grind/fueling-the-ai-gold-rush-7a...
(from 2017)
It hasn’t been for a lack of trying. We’ve had everyone from IBM and Microsoft to small local AI startup try to sell us their magic, but no one has come up with anything meaningful to do with our data that our analysis department isn’t already doing without ML/AI. I guess we could replace some of our analysis department with ML/AI, but working with data is only part of what they do, explaining the data and helping our leadership make sound decisions is their primary function, and it’s kind of hard for ML/AI to do that (trust me).
What we have learned though, is that even though we have a truck load of data, we can’t actually use it unless we have someone on deck who actually understands it. IBM had a run at it, and they couldn’t get their algorithms to understand anything, not even when we tried to help them. I mean, they did come up with some basic models that their machine spotted/learned by itself by trawling through our data, but nothing we didn’t already have. Because even though we have a lot of data, the quality of it is absolute shite. Which is anecdotal, but it’s terrible because it was generated by thousands of human employees over 40 years, and even though I’m guessing, I doubt we’re unique in that aspect.
We’ll continue to do various proof of concepts and listen to what suppliers have to say, but I fully expect most of it to go the way Blockchain did which is where we never actually find a use for it.
With a gold rush, you kind of need the nuggets of gold to sell, and I’m just not seeing that with ML/AI. At least not yet.
Ultimately the value of selling tools is dependent on the riches being mined actually existing. The value of AI/big data to the average business has yet to be determined
A lot of those companies are styled as "AI" companies themselves, aiming to automate the process of labeling.
The main winner here really is Amazon. They get a chunk by serving up infrastructure and another through labeling on Mechanical Turk.
Why don't they buy their own hardware for this part? The training process doesn't need to be auto-scalable or failure-resistant or distributed across the world. The value proposition of cloud hosting doesn't seem to make sense here. Surely at this price the answer isn't just "it's more convenient"?
Say you have $8M in funding, and you need to train a model to do x
You can either:
a) gain access to a system that scales on demand and allows instant, actionable results.
b) hire an infrastructure person and someone to write a K8s deployment system. Then another person to come in and throw that all away. Another person to negotiate and buy the hardware, and another to install it.
Option b can be the cheapest in the long term, but it carries the most risk of failing before you've even trained a single model. It also costs time, and if speed to market is your thing, then you're shit out of luck.
ML distributed training is all about increasing training velocity and searching for good hyperparameters
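The trade-off between options (a) and (b) above mostly comes down to how many months it takes for owned hardware to pay for itself. A rough sketch of that arithmetic follows. Every figure in it is a made-up placeholder, not a quote from any vendor, and the function name is just for illustration. It also ignores the hiring risk and time-to-market cost mentioned above, which is often the bigger factor.

```python
import math

def breakeven_months(hardware_cost, monthly_onprem_opex, monthly_cloud_cost):
    """Months until buying hardware is cheaper than renting, or None if never.

    hardware_cost:        one-off purchase and setup cost (hypothetical)
    monthly_onprem_opex:  power, space, and staff time per month (hypothetical)
    monthly_cloud_cost:   equivalent cloud bill per month (hypothetical)
    """
    monthly_saving = monthly_cloud_cost - monthly_onprem_opex
    if monthly_saving <= 0:
        # Running your own gear costs as much as the cloud: it never pays off.
        return None
    return math.ceil(hardware_cost / monthly_saving)

# Placeholder numbers purely for illustration.
months = breakeven_months(
    hardware_cost=60_000,
    monthly_onprem_opex=3_000,
    monthly_cloud_cost=13_000,
)
print(months)  # 6
```

If your runway is shorter than the break-even point, option (a) wins even though it is more expensive per month.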
I wrote it to be tongue-in-cheek in a ranting style, but essentially "AI" businesses and the technology underpinning them are not the silver bullet the media and marketing hype has made them out to be. The linked article about a16z shows how AI is the same story everywhere - enormous capital to get the data and engineers to automate, but even the "good" AI still gets it wrong much of the time, necessitating endless edge-cases, human intervention, and eventually it's a giant ball of poorly understood and impossible to maintain pipelines that don't even provide a better result than a few humans with a spreadsheet.
There was this meme in the 70s about "self driving cars" following magnetic strips in the road in restricted highways. I remember at the time, being, like 8 and thinking "sure seems like an overly complicated train."
Your post was much better than mine, but I appreciate the comment.
Exactly wrong and contradicts most of the thesis of the article - that AI often fails to achieve acceptable models because of the individuality, finickiness, edge cases, and human involvement needed to process customer data sets.
The key to profitability is for AI to be a component in a proprietary software package, where the VENDOR studies, determines, and limits the data sets and PRESCRIBES this to the customer, choosing applications many customers agree upon. Edge cases and cat-guacamole situations are detected and ejected, and the AI forms a smaller, but critical efficiency enhancing component of a larger system.
Single-focus disruptors bad. Generic consultancy good - with ML secret sauce, possibly helped by hired specialist human insight.
Companies that can make this work will kill it. Companies that can't will be killed.
It's going to be IBM, Oracle, SAP, etc all over again. Within 10 years there will be a dominant monopolistic player in the ML space. It will be selling corporate ML-as-a-service, doing all of that hard data wrangling and model building etc and setting it up for clients as a packaged service using its own economies of scale and "top sales talent" (it says here).
That's where the big big big big money will be. Not in individual specialist "We ML'd your pizza order/pet food/music choices/bicycle route to work" startups.
Amazon, Google, MS, and maybe the twitching remnants of IBM will be fighting it out in this space. But it's possible they'll get their lunch money stolen by a hungry startup, perhaps in collaboration with someone like McKinsey, or an investment bank, or a quant house with ambitions.
5-10 years after that customisable industrial-grade ML will start trickling down to the personal level. But it will probably have been superseded by primitive AGI by then, which makes prediction difficult - especially about that future.
We're also a long way off from AGI. Nobody really even has a roadmap to what an AGI would look like. Heck, DNN/ML techniques have been widely-known since the early 90s; they just became practical with access to cloud-scale hardware, so the current situation has been 25+ years in the making.
I think they may just be crapping on them from a reasonable vantage point.
The next generation of Machine Learning is just emerging, and looks nothing like this. Funds are being raised, patents are being filed, and everything is in early stage development, so you probably haven't heard much yet - but these ML startups are going after real problems in industry: cross disciplinary applications leveraging the power of heuristic learning to make cross disciplinary designs and decisions currently still limited to the human domain.
I'm talking about the kind of heuristics which currently exist only as human intuition expressed most compactly as concept graphs and, especially, mathematical relationships - e.g. component design with stress and materials constraints, geologic model building, treatment recommendation from a corpus of patient data, etc. ML solutions for problems like these cannot be developed without an intimate understanding of the problem domain. This is a generalist's game. I predict that the most successful ML engineers of the next decade will be those with hard STEM backgrounds, MS and PhD level, who have transitioned to ML. [Un]Fortunately for us, the current buzzwordy types of ML services give the rest of us a bad name, but looking at these upcoming applications the answers to the article tl;dr look different:
>Deep learning costs a lot in compute, for marginal payoffs
The payoffs here are far greater. Designs are in the pipeline which augment industry roles - accelerate design by replacing finite methods with vastly quicker ML for unprecedented iteration. Produce meaningful suggestions during the development of 3D designs. Fetch related technical documents in real time by scanning the progressive design as the engineer works, parsing and probabilistically suggesting alternative paths to research progression. Think Bonzi Buddy on steroids...this is a place for recurring software licenses, not SaaS.
>Machine learning startups generally have no moat or meaningful special sauce
For solving specific, technical problems, neural network design requires a certain degree of intuition with respect to the flow of information through the network, which both optimizes and limits the kind of patterns that a given net can learn. Thus designing NNs for hard-industry applications is predicated upon an intimate understanding of domain knowledge, and these highly specialized neural nets become patentable secret sauces. That's half of the moat - the other half comes from competition for the software developers with first-hand experience in these fields, or a general enough math heavy background to capture the relationships that are being distilled into nets.
>Machine learning startups are mostly services businesses, not software businesses
Again only true because most current applications are NLP adtechy bullshit. Imagine coding in an IDE powered by an AI (multiple interacting neural nets) which guides the structure of your code at a high level and flags bugs as you write. This, at a more practical level, is the type of software that will eventually change every technical discipline, and you can sell licenses!
>Machine learning will be most productive inside large organizations that have data and process inefficiencies
This next generation goes far past simply optimizing production lines or counting missed pennies or extracting a couple extra percent of value from analytics data. This style of applied ML operates at a deeper level of design which will change everything.
Citations needed. Large claims: presumably you can name one example of this, and hopefully it's not a company you work at.
I've seen projects on literally all the things you mention: materials science, medical stuff, geology/prospecting -none of them worked well enough to build a stand alone business around them. I do know the oil companies are using DL ideas with some small successes, but this only makes sense for them, as they've been working on inverse problems for decades. None of them buy canned software/services: it's all done in house. Probably always will be, same as their other imaging efforts.
Unfortunately this is all emerging just now and yes, I do work at such a company, but I'm old enough to not be naively excited by some hot fad. There's something profound just starting to happen but everyone is keeping the tech rather secret because it isn't developed/differentiated enough yet to keep a competitor from running off with an idea. Disclosure is probably 1-3 years out, by my estimate.
>I do know the oil companies are using DL...as their other imaging efforts.
You're correct, and I happen to have experience in this domain - except there are a handful of up-and-comers courting funds from global majors like Shell and BP, and seismic inversion is near the end of the list of novel applications. Petroleum is ground zero for a potential revolution right now, if we can come up with something before the U.S. administration clamps down on fossil fuels.
But we're talking complex algorithms which consist of multiple interacting neural networks. We are rapidly moving toward rudimentary reasoning systems which represent conceptual information encoded in vectors. I'm jaded enough that I wouldn't say we're developing AGI, but if the progressing ideas I'm familiar with and working on personally pan out, they will be massive baby steps towards something like AGI.
The space is evolving at least as rapidly as the academic side, which I think is an unprecedented pace of development for a novel field of study. I can't help but feel like these are the first steps towards some kind of singularity. There's no question that we are on to something civilization changing with neural networks, what remains to be seen is whether compute scaling will keep up with the needs of this next generation ML. Even if research stopped today, the modern ML zoo has exploded with architectures with fruitful applications across domains. The future is here!
If you're solving a real problem and use ML in service of solving that problem, then you've got a great moat....happy trusting customers.
It's not complicated
My way of saying, you're very, very right.
This is so right. Using the term "artificial intelligence" for machine learning is like using "artificial horses" to describe cars. It is even worse, since we cannot even define what "natural intelligence" actually is. Stop talking about "artificial intelligence".
https://www.louwmanmuseum.nl/ontdekken/ontdek-de-collectie/b...
>The bodywork represents a swan gliding through water. The rear is decorated with a lotus flower design finished in gold leaf, an ancient symbol for divine wisdom. Apart from the normal lights, there are electric bulbs in the swan’s eyes that glow eerily in the dark. The car has an exhaust-driven, eight-tone Gabriel horn that can be operated by means of a keyboard at the back of the car. A ship’s telegraph was used to issue commands to the driver. Brushes were fitted to sweep off the elephant dung collected by the tyres. The swan’s beak is linked to the engine’s cooling system and opens wide to allow the driver to spray steam to clear a passage in the streets. Whitewash could be dumped onto the road through a valve at the back of the car to make the swan appear even more lifelike.
>The car caused panic and chaos in the streets on its first outing and the police had to intervene.
AI is so shiny that it makes people want to jump into that boat as fast as they can, but a reasonably objective analysis shows that a huge number of software problems can still be solved without relying on the "AI black box".
This is why I’m much more excited by AR and VR than AI. Human brains are fucking amazing at certain kinds of data processing and inference and pretty mediocre at others. We should be focusing more on creating interfaces and data visualizations that unlock that superpower for wider applications.
> Machine learning will be most productive inside large organizations that have data and process inefficiencies.
I strongly believe ML is at worst dangerous and at best pointless here. Data and process inefficiencies => garbage in, garbage out. ML is NOT a silver bullet in large organisations that have these issues*. I've seen managers try to adopt ML to solve issues, but the results are almost always suspect and/or only marginally better than simple if-else rules, while requiring multiple people or teams to get all the data and models right.
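One cheap sanity check for the "marginally better than simple if-else rules" problem is to score the rule and the model on the same held-out labels and only adopt the model if it clears the rule by some margin. Here is a minimal sketch of that comparison. The `amount > 1000` rule, the record layout, the `min_gain` threshold, and the function names are all hypothetical stand-ins, not anything from a real system.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def rule_baseline(record):
    # Hypothetical hand-written rule: flag any record over a fixed amount.
    return record["amount"] > 1000

def worth_adopting(model_preds, records, labels, min_gain=0.05):
    """Return (decision, baseline_accuracy, model_accuracy).

    The model is only 'worth it' if it beats the if-else rule by at
    least min_gain on the same held-out data (min_gain is an assumption).
    """
    baseline_acc = accuracy([rule_baseline(r) for r in records], labels)
    model_acc = accuracy(model_preds, labels)
    return model_acc - baseline_acc >= min_gain, baseline_acc, model_acc

# Toy held-out set, invented for illustration.
records = [{"amount": 200}, {"amount": 1500}, {"amount": 900}, {"amount": 3000}]
labels = [False, True, True, True]
model_preds = [False, True, True, True]  # pretend output of some trained model

print(worth_adopting(model_preds, records, labels))  # (True, 0.75, 1.0)
```

The margin should reflect what the extra people and pipelines actually cost you. If the model only ties the rule, keep the rule.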
I see a lot of 'bolt-on' tech emerging -- it looks mostly snake oil -- there is no obvious way to be competitive against teams that baked it in to the bare metal design
Also most commercial use-cases I've seen need effective ML more than anything else
This irrational sheep mentality amuses me. Yes, there are some very specific cases where AWS & co. is clearly a better choice, but in most cases I've seen, the TCO of hosting on premises or renting servers is much lower, sometimes by an order of magnitude (in some cases even more). But people insist on doing it because others do it. We'll soon have an entire generation of engineers completely hooked on AWS & co. and not even realizing other solutions are possible, not to mention lower TCO.
Being able to sift/classify/analyse data with ML really can be a 'moat', an extreme competitive advantage. But using "AI" doesn't automatically get you there.
Separately, AWS is an expensive luxury, which is worth it if for some reason you can't manage your own computers.
It really annoys me when analysts like this guy mangle together things which are obvious and then come up with an unsupported conclusion, like "second AI winter is coming man".
Cost-wise, though, the author is clearly either not knowledgeable about how it works or at least thinks all AI startups have huge training sets. For many companies, owning your hardware for training is a very easy step to rationalise cost.
It feels like an article written about all AI companies but actually (very) true only for some AI companies.
If we focused on writing more efficient software instead of demanding bigger and faster machines with more and more GPUs, would the cost of ML become more practical? More importantly, as the author pointed out, would smaller companies have a better chance at making advancements in the field?
Green AI (Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni - 2019)
Finally a correct use of "AI".
For a pure ML company to IPO they'd have to both solve intelligence and manufacture their own hardware. FOMO screwed a lot of investors who would've been better off buying Google stock.