I wanted to learn about serverless computing, so I built Unofunction, a Python Lambda that wraps LibreOffice’s headless mode. It converts any document format that LibreOffice can import into any format that LibreOffice can export (e.g. DOCX to PDF).
*Implementation*
Calling LibreOffice from a Python Lambda required compiling LibreOffice for the Amazon Linux 2 base image (which is similar to CentOS 7). I originally tried compiling LibreOffice locally (using Docker) on my 2021 M1 MacBook Pro (10 CPU cores) but gave up after a few hours. On an AWS EC2 c6i.8xlarge instance (32 vCPUs, $1.36 per hour in eu-west-2), compiling took ~30 minutes. In case anyone else needs a prebuilt headless LibreOffice compiled for Amazon Linux 2, I’ve uploaded the resulting image to https://hub.docker.com/repository/docker/unofunction/libreof...
A crucial implementation detail: because AWS Lambda does not permit files to be written anywhere except the /tmp directory, LibreOffice needs to be called with the argument '-env:UserInstallation=file:///tmp/'. Otherwise, it will attempt its user installation in .config/libreoffice and fail.
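For anyone wanting to try this, here is a minimal sketch of what such a call might look like from Python. It is not the actual Unofunction code. It assumes the LibreOffice launcher is available on PATH as "soffice" (the real path depends on how the image was built), and the helper names build_command, output_path and convert are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: the launcher is on PATH as "soffice"; adjust for your own image.
SOFFICE = "soffice"
# /tmp is the only writable location inside a Lambda.
OUT_DIR = "/tmp"


def build_command(input_path, target_format):
    """Build the argument list for a headless LibreOffice conversion."""
    return [
        SOFFICE,
        # Point the user profile at /tmp instead of .config/libreoffice.
        "-env:UserInstallation=file:///tmp/",
        "--headless",
        "--convert-to", target_format,
        "--outdir", OUT_DIR,
        str(input_path),
    ]


def output_path(input_path, target_format):
    """Path LibreOffice is expected to write: same stem, new extension."""
    # A format may carry a filter name, e.g. "pdf:writer_pdf_Export".
    extension = target_format.split(":")[0]
    return f"{OUT_DIR}/{Path(input_path).stem}.{extension}"


def convert(input_path, target_format, timeout=60):
    """Run the conversion and return the path of the converted file."""
    subprocess.run(build_command(input_path, target_format),
                   check=True, timeout=timeout)
    return output_path(input_path, target_format)
```

In a handler you would first download the source document into /tmp, then call something like convert("/tmp/report.docx", "pdf") and upload the returned path. Keeping the command construction separate from the subprocess call makes it possible to unit-test the arguments without LibreOffice installed.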
*Thoughts*
I enjoyed using the AWS CDK a lot. There is something magical about being able to deploy infrastructure by manipulating it as if it were a Python or a TypeScript object. I preferred it to Terraform’s declarative approach.[1] My only criticisms are that deployment can be extremely slow and that destroying S3 buckets can be clunky (they need to be emptied first).
Testing code deployed to AWS is hard. Neither AWS SAM nor LocalStack perfectly simulates a real deployment: code that works locally does not always work on Lambda. (One example is files deployed to the /tmp directory: SAM allows such files to be edited or overwritten, whereas Lambda makes them read-only.)
[1] This article is a good comparison between Terraform and the AWS CDK: https://medium.com/swlh/cdk-or-terraform-88a464bedf9e