Show HN: Faast.js – Serverless Batch Computing Made Simple

(faastjs.org)

202 points · achou · 7y ago · 31 comments

27 comments · 9 top-level

gregmac · 7y ago · 5 in thread

From what I can tell, it's the invocation model and deployment that is unique here?

You invoke faast from your local machine (or a build server, or a cron job, whatever), and it in turn deploys some functions to a serverless platform, runs them, and tears them all down when complete. E.g., from the site, this code runs locally:

    import { faast } from "faastjs";
    import * as funcs from "./functions";

    (async () => {
        const m = await faast("aws", funcs);
        try {
            // m.functions.hello: string => Promise<string>
            const result = await m.functions.hello("world");
            console.log(result);
        } finally {
            await m.cleanup();
        }
    })();

You wouldn't want to run this code on serverless, as you'd be paying for compute time spent just waiting for all the other tasks to complete.

It would be useful to see a discussion about how and where to host this entry code, maybe even a section on "Running in production".

It's definitely a neat idea, because if you control the event that kicks everything off anyway (e.g., "create monthly invoices" or "build daily reports"), you can deploy the latest version of everything, run it, and clean it up in essentially a single step.

(Please correct me if I've misunderstood any of the details here!)
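The deploy, run, and clean-up lifecycle described in this comment can be sketched without any cloud at all. This is a minimal sketch, not faast.js's implementation: `Deployment`, `runBatch`, and `localDeploy` are hypothetical names, and the in-process `localDeploy` stands in for a real provider. The `finally` block plays the role of the `m.cleanup()` call in the example above, so teardown happens even when an invocation rejects.

```typescript
// Hypothetical handle to a set of deployed functions (not the faast.js API).
interface Deployment<I, O> {
  invoke(input: I): Promise<O>;
  cleanup(): Promise<void>;
}

// Deploy once, fan every input out concurrently, and always tear down.
async function runBatch<I, O>(
  deploy: () => Promise<Deployment<I, O>>,
  inputs: I[],
): Promise<O[]> {
  const d = await deploy();
  try {
    // The coordinator only waits here; the real work happens in the invocations.
    return await Promise.all(inputs.map((x) => d.invoke(x)));
  } finally {
    await d.cleanup();
  }
}

// In-process stand-in for a cloud deployment, for illustration only.
let cleanedUp = false;
async function localDeploy(): Promise<Deployment<string, string>> {
  return {
    invoke: async (name) => `hello ${name}`,
    cleanup: async () => {
      cleanedUp = true;
    },
  };
}

runBatch(localDeploy, ["world", "faast"]).then((out) => console.log(out));
```

Because the coordinator spends most of its time in `Promise.all`, it is exactly the part you would host on cheap, always-on infrastructure rather than on serverless.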

achou (OP) · 7y ago

You're basically correct, and thanks for the suggestion to add documentation about deployment in production.

One special case is if your functions return a lot of data: outbound data charges can get expensive fast, and getting responses back will be limited by your network link. In that case you can run the coordinator code on, say, EC2 in the same region; the link to Lambda is then super fast and you won't have any outbound data costs.
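The trade-off here is simple arithmetic: download time is bytes divided by link bandwidth, and egress cost is gigabytes times a per-GB rate. A minimal sketch, assuming decimal gigabytes and caller-supplied prices; `estimateTransfer` is a hypothetical helper, and the 0.09 per GB below is a placeholder, not a quoted AWS rate.

```typescript
// Rough cost/time of pulling function results back to the coordinator.
// pricePerGB and linkMbps are caller-supplied assumptions, not real AWS rates.
interface TransferEstimate {
  seconds: number; // download time at the given link speed
  cost: number; // egress charge at the given per-GB price
}

function estimateTransfer(
  totalBytes: number,
  linkMbps: number,
  pricePerGB: number,
): TransferEstimate {
  const seconds = (totalBytes * 8) / (linkMbps * 1e6); // bytes -> bits, Mbps -> bit/s
  const cost = (totalBytes / 1e9) * pricePerGB; // decimal GB for simplicity
  return { seconds, cost };
}

// Hypothetical comparison for 100 GB of results: a 100 Mbps office link with a
// placeholder egress price, vs. a same-region coordinator modeled as a faster
// link with zero egress price.
const remote = estimateTransfer(100e9, 100, 0.09);
const sameRegion = estimateTransfer(100e9, 10000, 0);
console.log(remote, sameRegion);
```

Plug in your own link speed and your provider's current pricing; the point is that both terms scale linearly with result size, so large outputs favor a coordinator next to the functions.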

penagwin · 7y ago

This is how I interpreted its usage too. We've all started an instance on DO/AWS/GCP/etc. for some batch job where we wanted 32 cores or whatnot. This lets you use Lambdas for the scaling instead of the cores directly. How efficient this is performance-wise, I have no clue.

teej · 7y ago

To serve as a data point, I effectively built an in-house version of this a few years ago on top of AWS Lambda, all in Python. The "entry point" (orchestration) code was hosted normally on an EC2 instance. More specifically, we were using Airflow, so our Airflow server would kick off a Python program that would then orchestrate a couple thousand Lambdas.
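Orchestrating a couple thousand invocations from one coordinator usually means capping how many are in flight at once, for example to stay under AWS Lambda's per-account concurrency limit. A minimal, generic sketch of such a cap; `mapWithConcurrency` is a hypothetical helper, not part of Airflow or faast.js.

```typescript
// Run `fn` over `items` with at most `limit` calls in flight at once.
// Results keep the input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be >= 1");
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to start

  // Each worker loop pulls the next unstarted item until none remain.
  // `next++` is safe: JS runs these loops on one thread, interleaving only at `await`.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

In a coordinator, `fn` would wrap the remote invocation; with a limit of a few hundred, thousands of jobs drain through a bounded number of concurrent calls. If any call rejects, the whole map rejects, so combine it with a retry wrapper for transient failures.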

netofeverythin3 · 7y ago

Very cool. A similar project worth checking out is Durable Functions. One difference is that those orchestrations can themselves run in serverless and can scale to zero during the “waiting for other tasks” step.

https://docs.microsoft.com/en-us/azure/azure-functions/durab...

Disclaimer - product manager for Azure durable functions

achou (OP) · 7y ago

I'd love to add Azure support but I'm not super familiar with it. Would be great to chat about it sometime.

achou (OP) · 7y ago · 4 in thread

Hi everyone, faast.js is a library that allows you to use serverless to run batch processing jobs. It makes it super easy to run regular functions as serverless functions. This is one of my first open source projects and I'd be happy to answer any questions here.

m00dy · 7y ago

Hi,

Do you think the serverless pricing model conflicts with batch operations? As I understand it, you normally pay for the duration of tasks, RAM usage, etc., and batch jobs tend to run for a long time. I'm probably missing something here. Could you tell me a little bit more?

achou (OP) · 7y ago

It depends on the specific use case. Some of the use cases I envision have sharp spikes in demand, and serverless can provide better service and price/performance. Part of faast.js is a cost analyzer that can tell you in real time how much your workload costs. What I found is that most people are probably using the wrong memory sizes for their lambda functions to optimize for price/performance. More on that when I write my next blog post... If you want a preview, check out this chart from the documentation: https://faastjs.org/docs/cost-estimates
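The memory-size point follows from how Lambda bills: compute is charged in GB-seconds (configured memory times duration) plus a per-request fee, and AWS allocates CPU in proportion to configured memory. A minimal sketch of that cost model with prices left as parameters; `invocationCost` and `Pricing` are hypothetical names, billing-granularity rounding is ignored, and the durations below are illustrative, not measurements.

```typescript
// Approximate serverless compute cost: GB-seconds plus a per-request fee.
// Prices are parameters; plug in your provider's current rates.
interface Pricing {
  perGBSecond: number;
  perRequest: number;
}

function invocationCost(
  memoryMB: number,
  durationSec: number,
  invocations: number,
  p: Pricing,
): number {
  const gbSeconds = (memoryMB / 1024) * durationSec * invocations;
  return gbSeconds * p.perGBSecond + invocations * p.perRequest;
}

// Illustrative only: assume doubling memory halves the duration of a CPU-bound task.
const p: Pricing = { perGBSecond: 1, perRequest: 0 }; // unit prices, for comparison
const small = invocationCost(1024, 2, 10, p);
const large = invocationCost(2048, 1, 10, p);
console.log(small, large);
```

Under this model, a larger memory size that more than halves the duration is both faster and cheaper, while one that barely changes the duration is pure extra cost. Measuring duration per memory size is what turns this into a real choice.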

CoreFailure · 7y ago

This looks great! I have an upcoming processing-intensive project I hope to test this out on soon.

I especially like the cost estimate feature; that isn't something I've seen before in a tool this seemingly simple.

pushtheenvelope · 7y ago

this looks super neat!

I've been wanting to make a GraphQL server framework that can run on Lambda, and will perhaps look into integrating with faast.

mring33621 · 7y ago · 3 in thread

This is neat, but it would be more useful if it could deploy cloud functions written in language {x} and provide local JS proxies for them.

achou (OP) · 7y ago

Good idea. Any specific example you have in mind?

linuxdude314 · 7y ago

Python is a good place to start.

mring33621 · 7y ago

Honestly, I would want Java. You'd probably have to provide a mapping spec file (like an IDL) to help generate the mediation code between the local proxies and the deployed functions.

dongxu · 7y ago · 2 in thread

Very interesting project. The problem with the serverless services provided by different public cloud vendors is that their programming models and APIs are not uniform. I think Faast.js is on the right path toward a unified interface for different serverless services.

bdcravens · 7y ago

Doesn't Serverless (the framework, not the concept) abstract this away?

https://serverless.com/framework/docs/providers/

(not familiar enough with that framework to form an opinion one way or the other)

zaq_xsw · 7y ago

I'm not experienced with this stuff either, but it seems like Serverless (the org/framework) is designed for architecting whole sites/apps, whereas faast.js is focused on handling batch computing jobs.

BrandiATMuhkuh · 7y ago · 2 in thread

Love what you did!

We were recently in exactly this situation: we had to do heavy processing of ~4000 items, each running between 1 and 10 minutes. To speed up the process, we ran it on Lambda. That took our process from 10+ hours on a single-core computer down to about 15 minutes running on 4000 Lambdas.

Your library would have saved us quite a bit of work, as it takes away a lot of the AWS configuration, deployment, etc.

Btw: I'm thinking of building a similar library for multi-core/web workers in Node.js. Currently a lot of boilerplate is required in Node.js to make a loop run in parallel on all cores.
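The speedup described in this comment comes from wall-clock time dropping from roughly the sum of all task durations to roughly the longest single task (plus overhead) once every item has its own executor. A minimal sketch for estimating that; `makespan` is a hypothetical helper that ignores cold starts and scheduling overhead, and its greedy longest-first assignment is a heuristic, not an optimal schedule, when there are fewer workers than tasks.

```typescript
// Estimate wall-clock time for a batch of task durations on `workers`
// parallel executors, ignoring startup and scheduling overhead.
// Greedy longest-first assignment; with workers >= tasks it equals the max.
function makespan(durations: number[], workers: number): number {
  if (workers < 1) throw new RangeError("workers must be >= 1");
  const loads = new Array<number>(Math.min(workers, durations.length)).fill(0);
  const sorted = [...durations].sort((a, b) => b - a);
  for (const d of sorted) {
    // Give the next-longest task to the currently least-loaded worker.
    let min = 0;
    for (let i = 1; i < loads.length; i++) if (loads[i] < loads[min]) min = i;
    loads[min] += d;
  }
  return loads.length ? Math.max(...loads) : 0;
}

const durations = [1, 10, 5, 3]; // hypothetical task durations in minutes
console.log(makespan(durations, 1), makespan(durations, 2), makespan(durations, 4));
```

With one worker the result is the plain sum; with one worker per task it is the longest task, which is why a fully fanned-out batch finishes in about the time of its slowest item.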

achou (OP) · 7y ago

Very cool. What kind of data was it, if you don't mind sharing?

Faast.js can be used with multiple cores: just use the "local" mode and run it on a large box. I'm billing this as a way to test locally before running in the cloud, but it's actually a completely viable way to run parallel processes on one machine, with the option to run on serverless with a one-line change.

BrandiATMuhkuh · 7y ago

Wow, that's awesome. I'll have a look at it ASAP. We actually just converted our Lambda code to run on a multi-core machine, along with much smarter algorithms, to massively speed up the process.

I haven't looked deeply into your library yet, but how do you deal with de/serialising? We use https://www.npmjs.com/package/class-transformer to correctly de/serialise TS objects.
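Arguments and results that cross a process or network boundary have to be serialised, and plain JSON is the common case. Assuming a JSON round trip (check the faast.js documentation for its actual serialization rules), this sketch shows what survives: plain fields do, but class prototypes and `Date` objects do not, so they must be rebuilt on the other side, which is the kind of step class-transformer automates. `Point` and `roundTrip` are hypothetical names.

```typescript
// Hypothetical class whose instances we want to pass to a remote function.
class Point {
  x: number;
  y: number;
  constructor(x: number, y: number) {
    this.x = x;
    this.y = y;
  }
  norm(): number {
    return Math.hypot(this.x, this.y);
  }
}

// Simulate sending a value across the wire, assuming JSON encoding.
function roundTrip(value: unknown): unknown {
  return JSON.parse(JSON.stringify(value));
}

const sent = { p: new Point(3, 4), at: new Date(0) };
const received = roundTrip(sent) as { p: { x: number; y: number }; at: string };

// Fields survive, but received.p is a plain object (no norm()) and
// received.at is an ISO string, not a Date. Rebuild them explicitly.
const p = new Point(received.p.x, received.p.y);
const at = new Date(received.at);
console.log(p.norm(), at.toISOString());
```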

Also, do you create a new web worker per function call, or do you create only as many workers as there are threads/cores on the machine and run the functions inside those? Starting a web worker can be very expensive if the serialised data is large.

Ps: each Lambda function did special parsing of complex mathematics exercises. We are an ed-tech company ;)
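On the pool-versus-per-call question: a common pattern in Node.js is a fixed pool of `worker_threads` workers, sized to the core count and reused across calls, so worker startup is paid once per worker rather than once per call. Payloads are still copied (structured clone) on every `postMessage` unless you transfer `ArrayBuffer`s. This is a generic sketch using only Node's built-in modules, not a description of how faast.js's local mode works; `WorkerPool` is a hypothetical name, the worker's job (summing integers) is a placeholder for real CPU-bound work, and crashed workers are not replaced.

```typescript
import { Worker } from "node:worker_threads";
import { cpus } from "node:os";

// Worker body as plain JavaScript (eval mode). Placeholder job: sum of 0..n-1.
const WORKER_SRC = `
const { parentPort } = require("node:worker_threads");
parentPort.on("message", ({ id, n }) => {
  let acc = 0;
  for (let i = 0; i < n; i++) acc += i;
  parentPort.postMessage({ id, result: acc });
});
`;

type Task = { id: number; n: number };
type Pending = { resolve: (v: number) => void; reject: (e: Error) => void };

class WorkerPool {
  private idle: Worker[] = [];
  private all: Worker[] = [];
  private queue: Task[] = [];
  private pending = new Map<number, Pending>();
  private busy = new Map<Worker, number>(); // worker -> id of its current task
  private nextId = 0;

  constructor(size: number = cpus().length) {
    for (let i = 0; i < size; i++) {
      const w = new Worker(WORKER_SRC, { eval: true });
      w.on("message", (msg: { id: number; result: number }) => {
        this.settle(w, msg.id)?.resolve(msg.result);
        this.idle.push(w); // reuse the same worker for the next queued task
        this.dispatch();
      });
      w.on("error", (err: Error) => {
        const id = this.busy.get(w);
        if (id !== undefined) this.settle(w, id)?.reject(err);
      });
      this.all.push(w);
      this.idle.push(w);
    }
  }

  run(n: number): Promise<number> {
    const id = this.nextId++;
    const result = new Promise<number>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    this.queue.push({ id, n });
    this.dispatch();
    return result;
  }

  close(): Promise<number[]> {
    return Promise.all(this.all.map((w) => w.terminate()));
  }

  // Remove bookkeeping for a finished task and return its callbacks.
  private settle(w: Worker, id: number): Pending | undefined {
    const p = this.pending.get(id);
    this.pending.delete(id);
    this.busy.delete(w);
    return p;
  }

  // Hand queued tasks to idle workers, one task per worker at a time.
  private dispatch(): void {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const w = this.idle.pop()!;
      const task = this.queue.shift()!;
      this.busy.set(w, task.id);
      w.postMessage(task);
    }
  }
}
```

Usage: create one pool for the process, `await Promise.all(inputs.map((n) => pool.run(n)))`, then `await pool.close()` so the process can exit.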

asadlionpk · 7y ago · 1 in thread

This can be great for scraping jobs!

There are IP-based rate limiters on sites (LinkedIn, Facebook, etc.), but each Lambda has a new public IP, so by using faast.js I can stay under the radar.

Plus you can essentially spawn a headless Chrome (Puppeteer) to do advanced stuff.

achou (OP) · 7y ago

Indeed, I've put together a simple example of using puppeteer with faast.js in this repo: https://github.com/faastjs/examples/tree/master/aws-puppetee...

sourc3 · 7y ago · 1 in thread

This is very neat! Last year I had to essentially do this on GCP and relied on a very similar implementation. Everyone was surprised to see JS being used for data processing, but it worked wonderfully.

One thing I want to ask about is retries: how do you handle them currently? I ran into multiple cases where functions would fail for transient reasons.

achou (OP) · 7y ago

Functions need to be idempotent, so you have to assume they will be retried. Faast.js will proactively do retries in some cases where it thinks a function is slow, to reduce tail latency.
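Proactive retries for slow calls are often called hedged or speculative requests: if an attempt hasn't finished within some threshold, launch a duplicate and take whichever result arrives first. That is only safe because the functions are idempotent. A minimal sketch assuming a fixed threshold and a single duplicate; faast.js's actual heuristic for deciding that a call is slow isn't described here, and `hedged` and `hedgeAfterMs` are hypothetical names.

```typescript
// Launch `call`; if it hasn't settled after hedgeAfterMs, launch one duplicate.
// Resolves with the first success; rejects only if every launched attempt fails.
function hedged<T>(call: () => Promise<T>, hedgeAfterMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let settled = false;
    let launched = 0;
    let failed = 0;

    const attempt = (): void => {
      launched++;
      call().then(
        (value) => {
          if (settled) return; // another attempt already won; ignore this result
          settled = true;
          clearTimeout(timer);
          resolve(value);
        },
        (err) => {
          failed++;
          if (!settled && failed === launched) {
            settled = true;
            clearTimeout(timer); // a fast failure cancels the pending hedge
            reject(err);
          }
        },
      );
    };

    const timer = setTimeout(() => {
      if (!settled) attempt();
    }, hedgeAfterMs);
    attempt();
  });
}
```

Hedging trades extra invocations (and cost) for lower tail latency; failures are left to an ordinary bounded retry.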

If a function fails to execute for transient reasons and exceeds the maximum number of retries (a config setting you can change), then the promise for its return value is rejected. You can catch that and handle it with another attempt, report an error, or just ignore it and report less accurate or less complete results.
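The behavior described here (retry up to a configurable maximum, then reject and let the caller decide) can be sketched as a small wrapper with exponential backoff. A minimal sketch; `withRetries`, `runAll`, `maxRetries`, and `baseDelayMs` are hypothetical names, not faast.js options. The caller side uses `Promise.allSettled` to keep the results that succeeded and count the ones that did not, which is the "less complete results" option.

```typescript
// Call `fn`, retrying up to `maxRetries` more times with exponential backoff
// (baseDelayMs, 2*baseDelayMs, ...). Rejects with the last error when exhausted.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries: number,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Caller side: keep the results that succeeded and report the ones that did not.
async function runAll<T>(jobs: Array<() => Promise<T>>, maxRetries: number) {
  const outcomes = await Promise.allSettled(jobs.map((job) => withRetries(job, maxRetries, 10)));
  const ok: T[] = [];
  let failed = 0;
  for (const o of outcomes) {
    if (o.status === "fulfilled") ok.push(o.value);
    else failed++;
  }
  return { ok, failed };
}
```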

heathermiller · 7y ago

Reminds me a bit of, like, a 2019 version of RMI...

dead_mall · 7y ago

Looks interesting. The concept reminds me of RPyC.
