undefined | Better HN

0 pointsykonstant1y ago0 comments

>I wish we would have more discussions on how to improve Lisp even further instead of trying to 'fix' C or C++ for the umpteenth time.

I agree one million percent; projects like SBCL are great, but my impression is that there are tons of improvements to be had in producing optimized code for modern processors (cache friendliness, SIMD, etc), GPU programming etc. I asked about efforts in those directions here and there, but did not get very clear answers.

0 comments

6 comments · 1 top-level

throw23532651y ago· 5 in thread

I don’t know much about Common Lisp, but one of the times I evaluated it I wondered why it fairs so poorly in benchmarks[1], and as a complete noob I went and checked what sort of code it will produce for something completely trivial, like adding 2 fixnums. And oh my god:

  * (defun fx-add (x y)
      (declare (optimize (speed 3) (safety 0) (debug 0))
               (type fixnum x y))
      (+ x y))
  FX-ADD
  * (disassemble #'fx-add)
  ; disassembly for FX-ADD
  ; Size: 104 bytes. Origin: #x7005970068                       ; FX-ADD
  ; 68:       40FD4193         ASR NL0, R0, #1
  ; 6C:       00048B8B         ADD NL0, NL0, R1, ASR #1
  ; 70:       0A0000AB         ADDS R0, NL0, NL0
  ; 74:       E7010054         BVC L1
  ; 78:       BD2A00B9         STR WNULL, [THREAD, #40]         ; pseudo-atomic-bits
  ; 7C:       A97A47A9         LDP TMP, LR, [THREAD, #112]      ; mixed-tlab.{free-pointer, end-addr}
  ; 80:       2A410091         ADD R0, TMP, #16
  ; 84:       5F011EEB         CMP R0, LR
  ; 88:       C8010054         BHI L2
  ; 8C:       AA3A00F9         STR R0, [THREAD, #112]           ; mixed-tlab
  ; 90: L0:   2A3D0091         ADD R0, TMP, #15
  ; 94:       3E2280D2         MOVZ LR, #273
  ; 98:       3E0100A9         STP LR, NL0, [TMP]
  ; 9C:       BF3A03D5         DMB ISHST
  ; A0:       BF2A00B9         STR WZR, [THREAD, #40]           ; pseudo-atomic-bits
  ; A4:       BE2E40B9         LDR WLR, [THREAD, #44]           ; pseudo-atomic-bits
  ; A8:       5E0000B4         CBZ LR, L1
  ; AC:       200120D4         BRK #9                           ; Pending interrupt trap
  ; B0: L1:   FB031AAA         MOV CSP, CFP
  ; B4:       5A7B40A9         LDP CFP, LR, [CFP]
  ; B8:       BF0300F1         CMP NULL, #0
  ; BC:       C0035FD6         RET
  ; C0: L2:   090280D2         MOVZ TMP, #16
  ; C4:       2AFCFF58         LDR R0, #x7005970048             ; SB-VM::ALLOC-TRAMP
  ; C8:       40013FD6         BLR R0
  ; CC:       F1FFFF17         B L0
  NIL

Are you serious? This should be 1, max 2 instructions, with no branches and no memory use.

Furthermore, I’ve also decided to evaluate the debuggers available for Common Lisp. However, despite it being touted as a debugger-oriented language, I think the actual debuggers are pretty subpar, compared to debuggers available for C, C++, Java or .NET. No Common Lisp debugger supports watchpoints of any kind. If a given debugger supports breakpoints at all, they’re often done through wrapping code in code that triggers a breakpoint, or making this code run under interpreter instead of being native. Setting breakpoints in arbitrary code won’t work, it needs to be available as source code first. SBCL with SLIME doesn’t have a nice GUI where I could use the standard F[N] keys to step, continue, stop, etc. I don’t see any pane with live disassembly view. No live watch. LispWorks GUI on the other hand looks like a space station, where I struggle to orient myself. The only feature that is somewhat well-done is live code reload, but IMO it’s something far less important than well-implemented breakpoints and watchpoints in other languages, since the main thing I need the debugger for is to figure out what the hell a given piece of code is doing. Editing it is a completely secondary concern. And live code reload is also not unique to Common Lisp.

Debugger-wise, Java and .NET seem to be leading in quality, followed by C and C++.

[1]: Yes, I have read many comments about the alleged good performance of Common Lisp, but either authors of these comments live in a parallel reality with completely different benchmark results, or they’re comparing to Python. As such I treat those comments as urban legends.

ska801y ago

Skill issue ;)

  * (defun fx-add (x y)
        (declare (optimize (speed 3) (safety 0) (debug 0))
                 (type fixnum x y))
        (the fixnum (+ x y)))
  FX-ADD
  * (disassemble 'fx-add)
  ; disassembly for FX-ADD
  ; Size: 6 bytes. Origin: #x552C81A6                           ; FX-ADD
  ; 6:       4801FA           ADD RDX, RDI
  ; 9:       C9               LEAVE
  ; A:       F8               CLC
  ; B:       C3               RET
  NIL

ArtixFox1y ago

what does (the ....) do?

2 more replies

ArtixFox1y ago

Im sorry i similar things on my sbcl, and i cannot see the output you are seeing.

CL-USER> (defun fx-add (x y)

(declare (optimize (speed 3) (safety 0) (debug 0))

         (type fixnum x y))

(+ x y))

[OUT]: FX-ADD

CL-USER> (disassemble #'fx-add)

; disassembly for FX-ADD

; Size: 27 bytes. Origin: #x55498736 ; FX- ADD

; 36: 48D1FA SAR RDX, 1

; 39: 48D1FF SAR RDI, 1

; 3C: 4801FA ADD RDX, RDI

; 3F: 48D1E2 SHL RDX, 1

; 42: 710A JNO L0

; 44: 48D1DA RCR RDX, 1

; 47: FF1425E8070050 CALL [#x500007E8] ; #x54602570: ALLOC-SIGNED-BIGNUM-IN-RDX

; 4E: L0: C9 LEAVE

; 4F: F8 CLC

; 50: C3 RET

[OUT]: NIL

``` it looks inefficient but i think it can still be optimized better. SBCL is mediocre at best at optimizations but most people do say that. I have not heard anyone calling it as fast as or faster than C/C++. I think sbcl being mediocre has more to do with how the compiler and its optimizations are structured rather than some inherent inefficiency of common lisp.

I agree with you that lisps have a terrible UX problem. But that is slowly changing. Alive for VSCode is a nice tool if you want to try it out. I personally use neovim for my CL development. fits me well enough.

The biggest reason why someone might choose CL for performance would be the abiliy to simply hack in the compiler as it is running to make custom optimization passes, memory allocations and structuring, very minute GC and allocator control, custom instructions and ability to tell sbcl how to use them WITHOUT RELOADING SBCL.

Those are just the basics in my opinion. Performance has nothing to do with the language...except ig the start times? Beyond that its a game of memory management[which, im pretty sure you can hack too in CL]

all this low level power while still being able to write soo much abstraction that you can slap a whole ass language in CL like APL!

lispm1y ago

> SBCL is mediocre at best at optimizations but most people do say that.

I would think that it takes a bit more knowledge to judge that.

You'll need to understand a bit more how to declare types to achieve better results.

1 more reply

lispm1y ago

> Are you serious? This should be 1, max 2 instructions, with no branches and no memory use.

Sure. ---> One could add two fixnums and get a bignum.

    CL-USER 31 > (fixnump (+ MOST-POSITIVE-FIXNUM MOST-POSITIVE-FIXNUM))
    NIL

As you see here, adding two fixnums can have a result which is not a fixnum.

Yes, Common Lisp does by default determine whether to return a bignum or fixnum.

The machine code you've shown takes care of that.

Let's see what the SBCL file compiler says:

    CL-USER> (compile-file "/tmp/test.lisp")
    ; compiling file "/tmp/test.lisp" (written 11 JUL 2024 10:02:04 PM):

    ; file: /tmp/test.lisp
    ; in: DEFUN FX-ADD
    ;     (DEFUN FX-ADD (X Y)
    ;       (DECLARE (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
    ;                (TYPE FIXNUM X Y))
    ;       (+ X Y))
    ; --> SB-IMPL::%DEFUN SB-IMPL::%DEFUN SB-INT:NAMED-LAMBDA 
    ; ==>
    ;   #'(SB-INT:NAMED-LAMBDA FX-ADD
    ;         (X Y)
    ;       (DECLARE (SB-C::TOP-LEVEL-FORM))
    ;       (DECLARE (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
    ;                (TYPE FIXNUM X Y))
    ;       (BLOCK FX-ADD (+ X Y)))
    ; 
    ; note: doing signed word to integer coercion (cost 20) to "<return value>"
    ; 
    ; compilation unit finished
    ;   printed 1 note


    ; wrote /tmp/test.fasl
    ; compilation finished in 0:00:00.031
    #P"/private/tmp/test.fasl"
    NIL
    NIL

Well, the SBCL compiler does tell us that it can't optimize that. Isn't that nice?!

If you want a fixnum to fixnum addition, then you need to tell the compiler that the result should be a fixnum.

   (the fixnum (+ x y))

Adding above and now the compiler no longer gives that efficiency hint:

    CL-USER> (compile-file "/tmp/test.lisp")
    ; compiling file "/tmp/test.lisp" (written 11 JUL 2024 10:03:11 PM):

    ; wrote /tmp/test.fasl
    ; compilation finished in 0:00:00.037
    #P"/private/tmp/test.fasl"
    NIL
    NIL

j / k navigate · click thread line to collapse

0 comments

6 comments · 1 top-level

throw23532651y ago· 5 in thread

  * (defun fx-add (x y)
      (declare (optimize (speed 3) (safety 0) (debug 0))
               (type fixnum x y))
      (+ x y))
  FX-ADD
  * (disassemble #'fx-add)
  ; disassembly for FX-ADD
  ; Size: 104 bytes. Origin: #x7005970068                       ; FX-ADD
  ; 68:       40FD4193         ASR NL0, R0, #1
  ; 6C:       00048B8B         ADD NL0, NL0, R1, ASR #1
  ; 70:       0A0000AB         ADDS R0, NL0, NL0
  ; 74:       E7010054         BVC L1
  ; 78:       BD2A00B9         STR WNULL, [THREAD, #40]         ; pseudo-atomic-bits
  ; 7C:       A97A47A9         LDP TMP, LR, [THREAD, #112]      ; mixed-tlab.{free-pointer, end-addr}
  ; 80:       2A410091         ADD R0, TMP, #16
  ; 84:       5F011EEB         CMP R0, LR
  ; 88:       C8010054         BHI L2
  ; 8C:       AA3A00F9         STR R0, [THREAD, #112]           ; mixed-tlab
  ; 90: L0:   2A3D0091         ADD R0, TMP, #15
  ; 94:       3E2280D2         MOVZ LR, #273
  ; 98:       3E0100A9         STP LR, NL0, [TMP]
  ; 9C:       BF3A03D5         DMB ISHST
  ; A0:       BF2A00B9         STR WZR, [THREAD, #40]           ; pseudo-atomic-bits
  ; A4:       BE2E40B9         LDR WLR, [THREAD, #44]           ; pseudo-atomic-bits
  ; A8:       5E0000B4         CBZ LR, L1
  ; AC:       200120D4         BRK #9                           ; Pending interrupt trap
  ; B0: L1:   FB031AAA         MOV CSP, CFP
  ; B4:       5A7B40A9         LDP CFP, LR, [CFP]
  ; B8:       BF0300F1         CMP NULL, #0
  ; BC:       C0035FD6         RET
  ; C0: L2:   090280D2         MOVZ TMP, #16
  ; C4:       2AFCFF58         LDR R0, #x7005970048             ; SB-VM::ALLOC-TRAMP
  ; C8:       40013FD6         BLR R0
  ; CC:       F1FFFF17         B L0
  NIL

Are you serious? This should be 1, max 2 instructions, with no branches and no memory use.

Debugger-wise, Java and .NET seem to be leading in quality, followed by C and C++.

ska801y ago

Skill issue ;)

  * (defun fx-add (x y)
        (declare (optimize (speed 3) (safety 0) (debug 0))
                 (type fixnum x y))
        (the fixnum (+ x y)))
  FX-ADD
  * (disassemble 'fx-add)
  ; disassembly for FX-ADD
  ; Size: 6 bytes. Origin: #x552C81A6                           ; FX-ADD
  ; 6:       4801FA           ADD RDX, RDI
  ; 9:       C9               LEAVE
  ; A:       F8               CLC
  ; B:       C3               RET
  NIL

ArtixFox1y ago

what does (the ....) do?

2 more replies

ArtixFox1y ago

Im sorry i similar things on my sbcl, and i cannot see the output you are seeing.

CL-USER> (defun fx-add (x y)

(declare (optimize (speed 3) (safety 0) (debug 0))

         (type fixnum x y))

(+ x y))

[OUT]: FX-ADD

CL-USER> (disassemble #'fx-add)

; disassembly for FX-ADD

; Size: 27 bytes. Origin: #x55498736 ; FX- ADD

; 36: 48D1FA SAR RDX, 1

; 39: 48D1FF SAR RDI, 1

; 3C: 4801FA ADD RDX, RDI

; 3F: 48D1E2 SHL RDX, 1

; 42: 710A JNO L0

; 44: 48D1DA RCR RDX, 1

; 47: FF1425E8070050 CALL [#x500007E8] ; #x54602570: ALLOC-SIGNED-BIGNUM-IN-RDX

; 4E: L0: C9 LEAVE

; 4F: F8 CLC

; 50: C3 RET

[OUT]: NIL

all this low level power while still being able to write soo much abstraction that you can slap a whole ass language in CL like APL!

lispm1y ago

> SBCL is mediocre at best at optimizations but most people do say that.

I would think that it takes a bit more knowledge to judge that.

You'll need to understand a bit more how to declare types to achieve better results.

1 more reply

lispm1y ago

> Are you serious? This should be 1, max 2 instructions, with no branches and no memory use.

Sure. ---> One could add two fixnums and get a bignum.

    CL-USER 31 > (fixnump (+ MOST-POSITIVE-FIXNUM MOST-POSITIVE-FIXNUM))
    NIL

As you see here, adding two fixnums can have a result which is not a fixnum.

Yes, Common Lisp does by default determine whether to return a bignum or fixnum.

The machine code you've shown takes care of that.

Let's see what the SBCL file compiler says:

    CL-USER> (compile-file "/tmp/test.lisp")
    ; compiling file "/tmp/test.lisp" (written 11 JUL 2024 10:02:04 PM):

    ; file: /tmp/test.lisp
    ; in: DEFUN FX-ADD
    ;     (DEFUN FX-ADD (X Y)
    ;       (DECLARE (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
    ;                (TYPE FIXNUM X Y))
    ;       (+ X Y))
    ; --> SB-IMPL::%DEFUN SB-IMPL::%DEFUN SB-INT:NAMED-LAMBDA 
    ; ==>
    ;   #'(SB-INT:NAMED-LAMBDA FX-ADD
    ;         (X Y)
    ;       (DECLARE (SB-C::TOP-LEVEL-FORM))
    ;       (DECLARE (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
    ;                (TYPE FIXNUM X Y))
    ;       (BLOCK FX-ADD (+ X Y)))
    ; 
    ; note: doing signed word to integer coercion (cost 20) to "<return value>"
    ; 
    ; compilation unit finished
    ;   printed 1 note


    ; wrote /tmp/test.fasl
    ; compilation finished in 0:00:00.031
    #P"/private/tmp/test.fasl"
    NIL
    NIL

Well, the SBCL compiler does tell us that it can't optimize that. Isn't that nice?!

If you want a fixnum to fixnum addition, then you need to tell the compiler that the result should be a fixnum.

   (the fixnum (+ x y))

Adding above and now the compiler no longer gives that efficiency hint:

    CL-USER> (compile-file "/tmp/test.lisp")
    ; compiling file "/tmp/test.lisp" (written 11 JUL 2024 10:03:11 PM):

    ; wrote /tmp/test.fasl
    ; compilation finished in 0:00:00.037
    #P"/private/tmp/test.fasl"
    NIL
    NIL

j / k navigate · click thread line to collapse