With "This", it's obvious the title is "(This code) smells of desperation". The submitted title is ambiguous; it could mean "(Code smells) of desperation".
"How to" doesn't get edited out. Certain other leading hows do.
As a noun, it can describe specific ways that hypothetical code might not follow best practices. For instance, code fragments that have been copy-pasted many times rather than refactored into a function are a code smell. The use of many global variables is a code smell. Together, these are "code smells".
As a verb, it describes specific code which exhibits these sorts of attributes. A particular source file can smell. The code... smells. The phrase can also be used adjectively, to say that code is smelly.
The title "Code smells of desperation" could imply the noun form, in which an article discusses various code smells which could be a general indication that a hypothetical code base might be in desperate shape. Or that the team maintaining it is. It is an article about smells, in code.
Whereas, "this code smells of desperation" uses the verb form to indicate that the article is about a particular code base which appears to be in desperate shape, because of the smells it specifically gives off. It is an article about code, which smells.
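To make the noun sense concrete, here is a minimal sketch of the copy-paste smell and its refactored form. Everything in it (the function names, the 8% tax rate, prices in cents) is invented for illustration and has nothing to do with the article's code.

```python
TAX_PERCENT = 8  # hypothetical rate, chosen only for the example


# Smelly: the same tax calculation is pasted into two functions.
def invoice_total_smelly(prices_cents):
    total = 0
    for p in prices_cents:
        total += p + p * 8 // 100  # pasted tax logic
    return total


def refund_total_smelly(prices_cents):
    total = 0
    for p in prices_cents:
        total += p + p * 8 // 100  # same logic, pasted again
    return -total


# Refactored: the shared step lives in one function.
def with_tax(price_cents):
    return price_cents + price_cents * TAX_PERCENT // 100


def invoice_total(prices_cents):
    return sum(with_tax(p) for p in prices_cents)


def refund_total(prices_cents):
    return -invoice_total(prices_cents)
```

Both versions return the same numbers; the difference is that a change to the tax rule has to be made in one place instead of two.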
The post author (Michal Necasek) states, about the WIN87EM.DLL code:
> It bears all the hallmarks of code that was written, rewritten, rewritten again, hacked, tweaked, modified, and eventually beaten into submission even if the author(s) had no real idea why it finally worked.
From what I gather, here are those hallmarks:
- Looping a no-op action, presumably to slow things down.
- Unnecessarily performing actions multiple times. This happens for three things: (a) writing a zero to an I/O port to clear something; (b) executing an instruction to clear exceptions; and (c) repeating the aforementioned no-op loop at different points.
- Saving a status in a separate location, only to reinstate it to its original location after clearing things out.
- Communicating procedure state (an EOI, “end of interrupt”) to one entity (the master interrupt controller) but not another (the controller’s slave). Furthermore, this “end” signal was sent near the beginning of the procedure. (This final point is my own observation and not explicitly called out by the author. Perhaps it’s common and not “smelly” for interrupt handlers to do this up front.)
I’ve tried to reframe the technical terms as actions and signals in a way that could be recognizable to devs of higher-level systems. My familiarity with OS-level systems is minimal so my interpretations could be a little wrong.
But despite my lack of knowledge, and with the author’s help, it does seem clear that there were serious timing and state related bugs here. And as a dev at other levels of the stack, I can relate: it’s very hard to reason about async global state! And this code’s responsibility was handling math errors, not timing errors. It is - or, perhaps, should be - the responsibility of the OS to orchestrate these things appropriately so that math libraries can focus on math stuff.
So my takeaways, for “code smells of desperation”, would be:
- There are violations of module responsibility.
- There are modifications of process timing with no discernible reason.
- There are modifications of status/environment/state with no discernible reason.
- And finally, other experts (in this case, the post author) can’t make sense of the code.
AutoCAD did all manner of nasty things with floating point numbers in order to stash extra data into them. Denormals, NaNs and the like were painfully common. You had to make sure your trap handlers were fast or AutoCAD performance would suck and everybody would slag your computer.
AutoCAD was one of the banes of existence for the FX!32 guys.
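For anyone wondering how extra data can ride inside a floating point number at all, one generic technique is a NaN payload. This is only a sketch of that general idea, not a claim about what AutoCAD actually did; the helper names `stash` and `unstash` are made up, and it assumes IEEE 754 doubles where the payload bits survive a round trip, which is typical but not guaranteed on every platform.

```python
import math
import struct

QUIET_NAN_BITS = 0x7FF8_0000_0000_0000  # exponent all ones, top mantissa bit set
PAYLOAD_MASK = 0x0007_FFFF_FFFF_FFFF    # the remaining 51 mantissa bits


def stash(payload: int) -> float:
    """Hide an integer in the mantissa of a quiet NaN (hypothetical helper)."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 51 bits")
    bits = QUIET_NAN_BITS | payload
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def unstash(value: float) -> int:
    """Recover the integer hidden by stash()."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & PAYLOAD_MASK
```

The result of `stash` is still a NaN as far as the FPU is concerned, so any arithmetic that touches such values takes the slow special-case paths, which fits the complaint about trap handlers needing to be fast.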
I can imagine somebody spent months on those few lines of assembly.
If you just needed a delay, this is bad code that's just been randomly iterated until it 'works'.
On the other hand, if the hardware does require such an incantation then it's impressive that someone managed to wade through the brokenness.
I'm inclined to believe it's the former though.
The first time-wasting loop is long because it has to be slower than the slowest 287 instruction takes to complete after signaling an error. The other time wasters are shorter because they come after known instructions that are faster (FNSTSW just stores 2 bytes to memory, FNCLEX clears some bits inside the 287). Note also that they are FNSTSW and FNCLEX, the no-wait forms, which means there is no implicit (F)WAIT instruction before the real 287 instruction.
Why two FNCLEX? I don't know.
Why 4 writes to port F0? Probably in case the FNSTSW and FNCLEX instructions lead to errors.
There is a behavior on some CPUs where "out 0xf0" can leave IGNNE# active, but you can clear it after the "out" by running "fnclex".
Why are there two of them? Either the "out 0xf0" is affected by IGNNE# being active, or maybe the original draft had one "spin, out, fnclex" and that whole block of code was just copy+pasted when they added the second one.
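The ordering argument can be played out with a toy model. It assumes, purely for illustration, that any write to port F0h sets the IGNNE# latch and that "fnclex" releases it; real chipsets and CPUs differ, and the class and function names here are made up.

```python
class ToyIgnneLatch:
    """Toy model only; not a description of any real chipset or CPU."""

    def __init__(self):
        self.ignne_active = False

    def out_f0(self):
        # Assumption: a write to port F0h sets the IGNNE# latch.
        self.ignne_active = True

    def fnclex(self):
        # Assumption: clearing FPU exceptions releases the latch.
        self.ignne_active = False


def ignne_after(sequence):
    """Run a list of step names and report whether IGNNE# ends up active."""
    latch = ToyIgnneLatch()
    for step in sequence:
        getattr(latch, step)()
    return latch.ignne_active
```

Under those assumptions, `ignne_after(["out_f0", "fnclex", "out_f0"])` leaves the latch set, while `ignne_after(["out_f0", "out_f0", "fnclex"])` leaves it clear, which is at least consistent with "fnclex" having to come after the last "out".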
(As an aside, why are we assuming 80287 and not 8087? I know nothing about either, so it’s quite likely that I missed obvious hints. EDIT: Ah, I guess because it’s the IRQ 13 handler specifically.)
The mention of not using the wait instruction reminded me of this other post on the same site: https://www.os2museum.com/wp/learn-something-old-every-day-p...
So I'm not surprised the exception handler is a mess. It's a domain built entirely out of corner-cases.
[1] https://old.reddit.com/r/retrobattlestations/comments/hj12ck...
[2] https://micro.magnet.fsu.edu/optics/olympusmicd/galleries/ch...
I remember a tank game called Scorched Earth where you would have to set angle & power to try to hit the other person's tank. Some ordnance took 10-15 seconds to fire & complete because it was running FP ops on the CPU. Once the 387 was installed, this calculation was done almost instantly. That's about all I remember my FPU being good for. LOL good times!
Me and my siblings had a house rule to not use either when playing on our 286 because it took a minute or so to complete...
This is how the characters in Coding Machines realized something was up: assembly instructions involving carry bits that made no sense, which they later realized was how an AI writes code. https://www.teamten.com/lawrence/writings/coding-machines/
> It took us the rest of the afternoon to pick through the convoluted jump targets and decode four consecutive instructions. That snippet, it turns out, was finding the sign of an integer. Anyone else would have done a simple comparison and a jump to set the output register to -1, 0, or 1, but the four instructions were a mess of instructions that all either set the carry bit as a side-effect, or used it in an unorthodox way.
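For contrast, here is roughly what "a simple comparison" for the sign looks like, next to a branch-free variant. Neither is the instruction sequence from the story, which the excerpt doesn't give; both function names are made up.

```python
def sign_compare(n: int) -> int:
    """The obvious version: compare and pick -1, 0, or 1."""
    if n > 0:
        return 1
    if n < 0:
        return -1
    return 0


def sign_branchless(n: int) -> int:
    """Same result without an if: each comparison yields 0 or 1."""
    return (n > 0) - (n < 0)
```

The second form is the kind of thing that looks odd at first glance but is still explainable in one sentence, unlike the carry-bit puzzle in the story.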
One of the results was a bizarre circuit that wasn't really digital anymore, because the pieces were arranged to exploit ways in which the digital circuit was imperfect, forming a system that was actually analog and idiosyncratic to the test environment.
[0] https://www.damninteresting.com/on-the-origin-of-circuits/
The relevant code comments seem to be
"Fix timing problem??"
and
"486 bug - must wait till after last "out f0" to clear fp exceptions or IGNNE# will be permanently active."
    public __fpIRQ13
__fpIRQ13:
    cli
    WASTE_TIME 70
    push ax
    xor al, al
    NULL_JMP
    out 0f0h, al ; reset busy line.
    NULL_JMP
    mov al, 65h
    NULL_JMP
    out 0a0h, al ; EOI slave irq 5
    NULL_JMP
    mov al, 62h
    NULL_JMP
    out 20h, al ; EOI master irq 2
    NULL_JMP
    pop ax
    sub sp, 2
    push bp
    mov bp, sp
    fnstsw [bp+2]
    WASTE_TIME
    push ax
    xor al, al
    NULL_JMP
    out 0f0h, al ; reset busy line.
    NULL_JMP
    pop ax
    pop bp
    ; fnclex ; 486 bug - must wait till after last
    ;          "out f0" to clear fp exceptions
    ;          or IGNNE# will be permanently active.
    WASTE_TIME
    push ax
    xor al, al
    NULL_JMP
    out 0f0h, al ; reset busy line.
    NULL_JMP
    pop ax
    ; fnclex ; 486 bug - must wait till after last
    ;          "out f0" to clear fp exceptions
    ;          or IGNNE# will be permanently active.
    WASTE_TIME
    push ax
    xor al, al
    NULL_JMP
    out 0f0h, al ; reset busy line.
    NULL_JMP
    pop ax
    fnclex ;Now this is safe.
    WASTE_TIME 70 ;Fix timing problem??
    jmp __FPEXCEPTION87P
"If an unmasked exception occurs when the numeric exception bit in CR0 is clear and the IGNNE# pin is active, the performance of the FPU will be retarded as long as the exception remains pending."
https://www.cs.earlham.edu/~dusko/cs63/prepentium.html
I wonder if that has anything to do with it all.
People didn’t have code to copy and paste, so they randomly wrote it like monkeys until it worked, based on their understanding of one page of a manual, which was literally the only documentation or description anywhere of how the system they were working with worked.
Source: I was there :)
I am not proud of my desperation, but I can acknowledge it now.
Anybody recall if there was a notable performance difference between Borland's FP emulation lib and M$, then? My habit at the time was to religiously avoid all floats, to the point of shipping a home-made arbitrary precision BCD math library. It was no faster than anything else, but it gave the same results for the same inputs, every time on every machine.
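A rough modern illustration of why decimal arithmetic in software feels more predictable than binary floats, using Python's `decimal` module as a stand-in. It is not BCD internally and is certainly not the library described above; the two function names are made up.

```python
from decimal import Decimal


def sum_binary_float():
    """0.1 and 0.2 have no exact binary representation, so the sum is slightly off."""
    return 0.1 + 0.2


def sum_decimal():
    """Decimal digits are stored exactly, so the sum is exactly 0.3."""
    return Decimal("0.1") + Decimal("0.2")
```

Here `sum_binary_float() == 0.3` is false while `sum_decimal() == Decimal("0.3")` is true. The decimal result comes from integer arithmetic in software, so it doesn't depend on which FPU or emulation library happens to be present.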
The code is awful, but, really, if anyone's to blame, it's the Linux people who never cared to systematize and unify system's understanding and representation of storage.