A Minimal TTL Processor for Architecture Exploration (1994)

(bradrodriguez.com)

59 points · petrohi · 6y ago · 19 comments

fizixer · 6y ago · 5 in thread

A similar approach to demonstrate a GPU would be great. Any recommendations?

Well, let me rephrase. A GPU these days has two distinct features: graphics processing and GPGPU. I'm less interested in the graphics part, since that pipeline could be studied in software, and in hardware it's very specialized/ASICy.

So I'm really interested in the massively-parallel GPGPU aspect of a GPU.

msla · 6y ago

As long as we're wishing...

These kinds of projects always take you up to where CPUs were in the early 1950s on Large Systems or the 1970s in the home computing world: Single-issue processors with no memory protection or privilege levels. They work, in that you can write useful software for those systems, but taken as a way to explain how CPUs work in a holistic fashion they fall well short. They simply aren't complex enough to explain why Meltdown happens, for example: Since there is no concept of privilege to begin with, you can't use them to explain privilege level violations. More prosaically, you can't explain a "cold cache" when the processor doesn't have a cache which can be cold.

This is demoralizing for the poor sods who think they're going to learn how CPUs work and end up with a CPU design which is decades out of date and no way to extend it to even a thirty-year-old design. "You can't get there from here" is the bane of tutorials which explain the basics and then stop.

jacobush · 6y ago

Still, understanding any general-purpose CPU gets you most of the way. Virtual memory and memory protection aren't terribly far off; they could be implemented using paging. A very simple, feature-complete system could be made. A very basic model, yet working. A union between theory and practice.
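To make the paging idea concrete, here is a minimal sketch of a single-level page table with protection bits. It is only an illustration: the page size, the `PageEntry` fields, and the `translate` helper are all assumptions made up for this example, not taken from the PISC or any particular homebrew design.

```python
from dataclasses import dataclass

PAGE_SIZE = 256  # assumed page size in words; small enough for a homebrew CPU


class PageFault(Exception):
    """Raised when a translation is missing or not permitted."""


@dataclass
class PageEntry:
    frame: int       # physical frame number
    writable: bool   # may the page be written at all?
    user: bool       # may unprivileged code touch it?


def translate(page_table, vaddr, *, write=False, user_mode=True):
    """Map a virtual address to a physical one, enforcing protection bits."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(page)
    if entry is None:
        raise PageFault(f"page {page} not mapped")
    if user_mode and not entry.user:
        raise PageFault(f"page {page} is supervisor-only")
    if write and not entry.writable:
        raise PageFault(f"page {page} is read-only")
    return entry.frame * PAGE_SIZE + offset


# Hypothetical mapping: virtual page number -> entry.
table = {
    0: PageEntry(frame=7, writable=False, user=True),   # user code, read-only
    1: PageEntry(frame=2, writable=True, user=True),    # user data
    2: PageEntry(frame=0, writable=True, user=False),   # kernel data
}

print(translate(table, 0x0105))                   # page 1, offset 5
print(translate(table, 0x0200, user_mode=False))  # kernel page, privileged access
```

In hardware the dictionary lookup would be a small RAM indexed by the upper address bits, and the two permission checks would be a handful of gates feeding a fault line. The `user_mode` flag is the privilege level that the simple designs lack.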

Meltdown happens because of memory and cache shenanigans. It's more about how things can go wrong if you optimize too hard. While important, it feels like another line of thinking.

tyingq · 6y ago

That seems a bit harsh. You have to start somewhere, and this teaches basic stuff like an ALU, Clock, Accumulator, etc. Some homebrew CPUs have memory bank switching.

Moving to an FPGA based CPU might be the next step. There are soft CPUs with cache, MMU, etc. (https://github.com/SpinalHDL/VexRiscv for example)

sgtnoodle · 6y ago

I am speaking out of ignorance, but my understanding is that modern graphics pipelines consist of software "shader" programs running on the GPGPU hardware. The GPU makers aren't including vast numbers of compute cores just as a bonus feature, they're how the graphics part works. Each little core runs the same shader program to render its set of pixels, reading and writing out to different offsets in shared memory buffers. The "general purpose" use basically boils down to writing a shader program that does useful math rather than draw pretty pictures.
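A rough sketch of that model, in plain Python rather than a real shading language: one function is invoked once per thread index and reads or writes its own offsets in shared buffers. The names `run_kernel`, `shade_pixel`, and `saxpy` are made up for this example, and the loop is sequential where real hardware would run the invocations in parallel.

```python
def run_kernel(kernel, n_threads, *args):
    """Invoke the same kernel once per thread index, like one GPU dispatch.

    Real hardware runs these invocations in parallel; this loop is sequential.
    """
    for tid in range(n_threads):
        kernel(tid, *args)


def shade_pixel(tid, framebuffer, width):
    """'Graphics' use: colour each pixel from its own coordinates."""
    x, y = tid % width, tid // width
    framebuffer[tid] = (x * 16, y * 16, 0)  # simple red/green gradient


def saxpy(tid, a, xs, ys, out):
    """'General purpose' use: same dispatch mechanism, but the result is math."""
    out[tid] = a * xs[tid] + ys[tid]


width, height = 4, 2
framebuffer = [None] * (width * height)
run_kernel(shade_pixel, width * height, framebuffer, width)

xs, ys = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
out = [0.0] * 3
run_kernel(saxpy, 3, 2.0, xs, ys, out)
print(out)
```

The only difference between the two uses is what the kernel writes into the output buffer; the dispatch is identical.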

namibj · 6y ago

There is fixed hardware for 5 main purposes.

1. Output framebuffer.

2. Polygon rasterization (often limited to points and triangles).

3. Texture sampling. This is accessible in CUDA (I have ~zero experience with other GPGPU systems).

4. Afaik also for blending. This might have stopped now.

5. Video codecs (MPEG-2, H.263, H.264, H.265, VP8, VP9, soon AV1) decoding, and also encoding for some of them.

Nvidia RTX cards also include ray tracing hardware that handles that task more efficiently (I presume by using fixed logic for dispatching memory/cache-aware computations, e.g. content-addressable memory and such).

Most things are handled by the shader cores. They are 1024-bit SIMD with lane masking until Volta, and a more flexible/arbitrary fork/join since Turing (not all Turing has the ray tracing hardware), which also brought a scalar execution port with it (like amd64 getting traditional RAX/RDX/etc. with their opcodes after only having AVX instructions). AMD GCN afaik has a quite explicit SIMD architecture, with a scalar execution port since inception. Also 1024-bit iirc.
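For anyone unfamiliar with lane masking, here is a small sketch of how a branch runs on a masked SIMD unit. It assumes 32 lanes of 32 bits each (which would give the 1024-bit width mentioned above); `masked_branch` is a made-up name, and this is a model of the idea, not of any specific GPU.

```python
LANES = 32  # assumed: 32 lanes x 32 bits = 1024 bits


def masked_branch(values):
    """Run `if v < 0: v = -v  else: v = v * 2` across all lanes with masking."""
    assert len(values) == LANES
    regs = list(values)

    # Every lane evaluates the condition; the result becomes the lane mask.
    mask = [v < 0 for v in regs]

    # 'Then' side: issued once for the whole vector, only masked-in lanes commit.
    regs = [-v if m else v for v, m in zip(regs, mask)]

    # 'Else' side: mask inverted, so the lanes that took 'then' now sit idle.
    regs = [v * 2 if not m else v for v, m in zip(regs, mask)]

    return regs, mask


result, mask = masked_branch(list(range(-16, 16)))
print(result[:4], result[-4:], sum(mask))
```

Both sides of the branch are issued for the whole vector, so a divergent branch costs the time of both paths even though each lane only keeps one result.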

artemonster · 6y ago · 2 in thread

I remember first visiting this page when I was 11 years old, and I couldn't understand a thing. I've revisited it multiple times since, and it contributed greatly to my understanding of CPUs. My first self-designed Logisim CPU was heavily based on it :)

bear8642 · 6y ago

Could you explain the register file section please? - Don't understand that.

Thanks

Gracana · 6y ago

> Eight 74172s provide eight 16-bit registers in a three-port register file. This file may simultaneously write one register ("A"), read a second ("B"), and read or write a third ("C").

It really does do all that with just the 74172s. The 74172 is a register file containing 8x 2-bit words, with multiple ports for reading and writing, which are split up into a couple of sections.

Section 1 has independent read and write ports. The write port consists of data input DA[1..0], address AA[2..0] and write enable ~WEA. If ~WEA is low, data is written from DA to the register selected by AA on the positive edge of the clock. The read port consists of data output QB[1..0], address AB[2..0], and read enable ~REB. When ~REB is low, the contents of the register selected by AB are output on QB.

Section 2 has another set of read and write ports, but this time with a common address. The write port's data input is DC[1..0], the read port's data output is QC[1..0], the shared address is AC[2..0], and the read and write enables are ~REC and ~WEC.

In the PISC, there are eight of these chips with all their control lines tied together, so you get a single 8x16 register file with all of the features described above.

> In a single clock cycle, the following occurs:

> a) one register is output to the Address bus and the ALU's A input;

...using section 1's read port.

> b1) another register may be output to the Data bus and the ALU's B input; or

...using section 2's read port.

> b2) data from memory may be input to another register;

...using section 2's write port.

> c) an ALU function is applied to A (and perhaps B) and the result is stored in the first (address) register.

...using section 1's write port.
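If it helps, here is a small behavioural sketch of the above: eight 2-bit slices ganged into one 8x16 register file, plus one clock cycle following steps a) to c). It is a simplified model, not a pin-accurate 74172. It assumes section 1's read and write ports are given the same address (the "first" register), that the ALU's B input sees whatever is on the data bus, and the class and method names are made up for illustration.

```python
class RegFileSlice:
    """One 74172-style slice: 8 words x 2 bits."""

    def __init__(self):
        self.words = [0] * 8

    def read(self, addr):
        return self.words[addr]

    def write(self, addr, value):
        self.words[addr] = value & 0b11


class RegisterFile16:
    """Eight slices with shared address/control lines: 8 registers x 16 bits."""

    def __init__(self):
        self.slices = [RegFileSlice() for _ in range(8)]

    def read(self, addr):
        # Slice i supplies bits 2i and 2i+1 of the 16-bit word.
        return sum(s.read(addr) << (2 * i) for i, s in enumerate(self.slices))

    def write(self, addr, value):
        for i, s in enumerate(self.slices):
            s.write(addr, (value >> (2 * i)) & 0b11)

    def cycle(self, a_addr, c_addr, alu, mem_in=None):
        """One clock cycle: all reads happen first, writes commit at the end."""
        a = self.read(a_addr)                # (a)  section 1 read port
        if mem_in is None:
            data_bus = self.read(c_addr)     # (b1) section 2 read port
        else:
            data_bus = mem_in                # (b2) memory drives the data bus
        result = alu(a, data_bus) & 0xFFFF
        if mem_in is not None:
            self.write(c_addr, mem_in)       # (b2) section 2 write port
        self.write(a_addr, result)           # (c)  section 1 write port
        return result


rf = RegisterFile16()
rf.write(1, 0x1234)
rf.write(2, 0x0001)
rf.cycle(1, 2, lambda a, b: a + b)                 # R1 <- R1 + R2
rf.cycle(1, 3, lambda a, b: a + 1, mem_in=0xBEEF)  # R3 <- memory, R1 <- R1 + 1
print(hex(rf.read(1)), hex(rf.read(3)))
```

The second call is the typical "fetch and post-increment the address register" pattern: one register addresses memory and is bumped by the ALU while another register receives the data, all in one cycle.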

fjfaase · 6y ago · 2 in thread

This reminds me of the Gigatron, an even more minimalistic processor. https://gigatron.io/

tyingq · 6y ago

On the other end of the complexity scale, a 6502 made with TTL chips: https://c74project.com/

Or the Magic1. Similar total count of 74x chips (~200), but he ported Minix2 to it. http://www.homebrewcpu.com/

fjfaase · 6y ago

I guess that the 6502 has about four times as many NAND gates as the Gigatron. I understand that some people are working on an emulator for 6502 code for the Gigatron. The Gigatron is a bit of an (extreme) RISC processor. It has no microcode, and the instruction decoding consists of a simple diode matrix. This results in many instructions that perform the same operation or no operation at all. The Gigatron runs a program that generates the VGA signal (with reduced resolution) and an emulator for a 16-bit CISC processor. The actual programs executed by the Gigatron are written for the emulator.
