Has anyone tried training directly on raw file bytes? That is, make the LLM's entire vocabulary the digits of base 2, base 8, or hexadecimal, and then run next-token prediction on the resulting sequences.
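A minimal sketch of what such a vocabulary could look like. It assumes each byte is written as a fixed-width group of digits (8 binary, 3 octal, or 2 hex digits, most significant first) so the mapping is reversible. The function names are hypothetical, and the sketch covers only tokenization and building next-token training pairs, not a model.

```python
# Digits needed to write any byte value 0..255 in each base (an assumed fixed width).
# Base 8 needs 3 digits, but the leading digit only ever takes values 0..3.
DIGITS_PER_BYTE = {2: 8, 8: 3, 16: 2}


def bytes_to_tokens(data: bytes, base: int) -> list[int]:
    """Encode raw bytes as a sequence of digit tokens in {0, ..., base - 1}."""
    width = DIGITS_PER_BYTE[base]
    tokens: list[int] = []
    for value in data:
        digits = []
        for _ in range(width):
            digits.append(value % base)   # collect least significant digit first
            value //= base
        tokens.extend(reversed(digits))   # emit most significant digit first
    return tokens


def tokens_to_bytes(tokens: list[int], base: int) -> bytes:
    """Invert bytes_to_tokens by regrouping fixed-width digit runs into bytes."""
    width = DIGITS_PER_BYTE[base]
    out = bytearray()
    for start in range(0, len(tokens), width):
        value = 0
        for digit in tokens[start:start + width]:
            value = value * base + digit
        out.append(value)
    return bytes(out)


def next_token_pairs(tokens: list[int], context: int) -> list[tuple[list[int], int]]:
    """Build (context window, target) pairs for next-token prediction."""
    return [(tokens[i - context:i], tokens[i]) for i in range(context, len(tokens))]


if __name__ == "__main__":
    png_signature = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte header of every PNG file
    for base in (2, 8, 16):
        tokens = bytes_to_tokens(png_signature, base)
        assert tokens_to_bytes(tokens, base) == png_signature
        print(base, len(tokens), tokens[:8])
```

For the same file, base 2 yields sequences 8 times as long as a plain 256-symbol byte vocabulary, base 8 yields 3 times, and base 16 yields 2 times, so shrinking the vocabulary trades a tiny embedding table for much longer contexts.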
I know some related attempts exist, such as MEGABYTE and Charformer, but some of them may not learn directly from raw file bytes, header information included.
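For comparison, MEGABYTE works on a 256-symbol byte vocabulary and splits the byte stream into fixed-size patches, with a global model over patches and a small local model within each patch. The sketch below shows only the patching step, roughly following that description; the pad id `PAD = 256` and the function names are hypothetical, and this is not MEGABYTE's actual code.

```python
PAD = 256  # hypothetical padding id, chosen outside the byte range 0..255


def patchify(data: bytes, patch_size: int) -> list[list[int]]:
    """Split a byte string into fixed-size patches, padding the last one."""
    ids = list(data)
    remainder = len(ids) % patch_size
    if remainder:
        ids.extend([PAD] * (patch_size - remainder))
    return [ids[i:i + patch_size] for i in range(0, len(ids), patch_size)]


def global_model_inputs(patches: list[list[int]], patch_size: int) -> list[list[int]]:
    """Shift patches right by one so position k only sees patches 0..k-1."""
    if not patches:
        return []
    return [[PAD] * patch_size] + patches[:-1]


if __name__ == "__main__":
    header = b"\x89PNG\r\n"
    patches = patchify(header, 4)
    print(patches)                          # [[137, 80, 78, 71], [13, 10, 256, 256]]
    print(global_model_inputs(patches, 4))  # [[256, 256, 256, 256], [137, 80, 78, 71]]
```

Header bytes like the PNG signature pass through unchanged here; whether a model actually learns useful structure from them is the open question above.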
Some related compilers I found:
0. GCC
1. Zig: started as an LLVM frontend; it is now self-hosting and has its own backends, so code generation for some targets is possible without LLVM.
How does Python's class system actually compare to Common Lisp's CLOS?
I've seen arguments that Python's class system is what blocks important code optimizations, whether AOT or JIT. Is there a detailed explanation of why compiled SBCL code reaches near-machine-code speed, while PyPy cannot even save an image of a running program? That seems like a worse problem than the GIL.
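A small sketch of the kind of dynamism usually cited in these arguments. The `Shape`/`Square` classes are made up for illustration, and this is not an analysis of CPython or PyPy internals: it only shows that the method a call like `s.area()` resolves to can change while the program runs, so a compiler cannot bind it once ahead of time.

```python
class Shape:
    def __init__(self, w: int, h: int) -> None:
        self.w, self.h = w, h

    def area(self) -> int:
        return self.w * self.h


class Square(Shape):
    def area(self) -> int:
        return self.w * self.w


def total_area(shapes: list) -> int:
    # An AOT compiler would like to bind s.area() to Shape.area once and inline it,
    # but Python looks the method up on every call, and the result can change.
    return sum(s.area() for s in shapes)


def demo() -> list[int]:
    shapes = [Shape(2, 3), Shape(4, 5)]
    results = [total_area(shapes)]          # 6 + 20 = 26

    original = Shape.area
    Shape.area = lambda self: 0             # 1. replace a method on the class at run time
    results.append(total_area(shapes))      # 0 + 0 = 0
    Shape.area = original                   # restore it

    shapes[0].__class__ = Square            # 2. change the class of one live instance
    results.append(total_area(shapes))      # 2*2 + 20 = 24

    shapes[1].area = lambda: 100            # 3. shadow the method with an instance attribute
    results.append(total_area(shapes))      # 4 + 100 = 104
    return results


if __name__ == "__main__":
    print(demo())  # [26, 0, 24, 104]
```

A JIT can still specialize on the class it observes and guard against such changes. CLOS also allows run-time redefinition (`defmethod`, `change-class`), so this kind of dynamism alone does not explain the gap the question asks about.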