I started this project because I was tired of BMP libraries that hide allocations, pull in half of libc, or solve only one small part of the problem. There are many libbmp-style repos, but the functionality is scattered: one does decoding, another does encoding, another works on embedded targets, another is single-header. In real projects this often means depending on 5–10 small libraries just to load and display images. I wanted one predictable library instead: no allocations, one header, full control over buffers, and usable on both desktop and microcontrollers. At some point I got so into it that I added a streaming decoder for embedded targets and kept everything stb-style. The result is TurboLibBMP. I would really appreciate feedback on the API design, edge cases, and whether this approach makes sense in real projects.
Repository: https://github.com/Ferki-git-creator/turbo-lib-bmp
BPU (Batch Processing Unit) is a lightweight embedded scheduling core focused on keeping output pipelines stable under pressure (UART backpressure, limited bandwidth, bursty producers).
Instead of blocking or letting queues grow without bound, it enforces per-tick byte budgets, coalesces redundant events, degrades gracefully under sustained load, and exposes detailed runtime statistics.
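To make the idea concrete, here is a minimal sketch of a per-tick byte budget combined with event coalescing. This is not BPU's actual API. Every name in it (`sketch_sched_t`, `sketch_post`, `sketch_tick`, `sketch_sum_sink`, `SKETCH_MAX_EVENTS`) is hypothetical and invented for illustration. It assumes one fixed slot per event id, that a newer value for the same id replaces the pending one, and that lower ids are scanned first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_EVENTS 8u /* hypothetical: number of distinct event ids */

typedef struct {
    bool     pending; /* a value is waiting to be sent */
    uint16_t size;    /* bytes this event costs on the output link */
    uint32_t value;   /* latest payload; an older pending value is overwritten */
} sketch_slot_t;

typedef struct {
    sketch_slot_t slots[SKETCH_MAX_EVENTS]; /* fixed storage, no heap */
    uint32_t emitted;   /* events handed to the sink */
    uint32_t coalesced; /* posts that replaced a still-pending event */
    uint32_t deferred;  /* events skipped in a tick because the budget ran out */
} sketch_sched_t;

typedef void (*sketch_emit_fn)(uint8_t id, uint32_t value, void *ctx);

static void sketch_init(sketch_sched_t *s)
{
    *s = (sketch_sched_t){0};
}

/* Post an event. A second post with the same id before the next tick
 * overwrites the first one instead of queueing behind it. */
static bool sketch_post(sketch_sched_t *s, uint8_t id, uint16_t size, uint32_t value)
{
    if (id >= SKETCH_MAX_EVENTS) {
        return false; /* unknown id: rejected, nothing is stored */
    }
    sketch_slot_t *slot = &s->slots[id];
    if (slot->pending) {
        s->coalesced++;
    }
    slot->pending = true;
    slot->size = size;
    slot->value = value;
    return true;
}

/* Emit pending events in id order until the byte budget for this tick is
 * used up. Events that do not fit stay pending for a later tick.
 * Returns the number of bytes consumed. */
static size_t sketch_tick(sketch_sched_t *s, size_t budget, sketch_emit_fn emit, void *ctx)
{
    size_t used = 0;
    for (size_t id = 0; id < SKETCH_MAX_EVENTS; id++) {
        sketch_slot_t *slot = &s->slots[id];
        if (!slot->pending) {
            continue;
        }
        if (used + slot->size > budget) {
            s->deferred++; /* keep scanning: a smaller event may still fit */
            continue;
        }
        emit((uint8_t)id, slot->value, ctx);
        used += slot->size;
        slot->pending = false;
        s->emitted++;
    }
    return used;
}

/* Example sink: adds each emitted value to a uint32_t passed via ctx.
 * A real sink would write to a UART or similar output. */
static void sketch_sum_sink(uint8_t id, uint32_t value, void *ctx)
{
    (void)id;
    *(uint32_t *)ctx += value;
}
```

A caller would invoke `sketch_post` from producers and `sketch_tick` once per scheduling period with whatever byte budget the output link can absorb. Because storage is one slot per id, memory use stays constant no matter how bursty the producers are, and the three counters give a rough picture of how much load was absorbed by coalescing versus pushed to later ticks.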
The repository includes design notes, flow diagrams, and real execution logs, which makes the runtime behavior very transparent.
Repo: https://github.com/choihimchan/bpu_v2_9b_r1
I’ve been working on an ESP-IDF backend for it, and reading through the docs gave me a lot of ideas about observability and backpressure handling in small systems.
Curious what others think about this approach.