Latest issue 23 Jan 2026

4004 RISC Goals

First, why RISC-4? Well, why not? OK, so I do realize that RISC-V (five) is named for being the fifth major iteration of the RISC ISA. But there is no RISC-4 or RISC-IV. I didn’t want to use Roman numerals because of the possibility of confusion with an older ISA, or confusion that this ISA is intended to be at all functional for modern computing tasks, because it’s not.

Well so why then?

Well, RISC-4 is intended to be an educational endeavor into what the silicon designers of the earliest days of the microprocessor might have done if they had had RISC principles but were limited to 4 bits. I realize that this implementation would use far more transistors than they had the ability to lay out at the time, and they probably would have wanted a larger width than 4 bits before going RISC, but this is my world and you’re just observing the tinkerings of a madman, so there. RISC-4 is just that: a 4 bit RISC architecture with absolutely zero practical use in the modern computing world. But we’re all gonna die anyway, so why be practical?

Here be the plan

We’ll get into the nitty gritty soon, but for now an overview: where did this idea come from, where will it go, and why am I writing like this is a conversation?

This idea came from my dream of holding a piece of silicon I designed in my hand. I thought that it would never be possible. Then I stumbled into open source silicon! After a lot of reading, and then somehow convincing myself I’m actually maybe smart enough to start this, I landed on SkyWater. SkyWater is a small fab that has open sourced their entire 130 nm node! How crazy is that? You can implement a 15 mm^2 integrated circuit of your very own! It lives in a breakout shuttle that allows for standard packaging, power delivery, and I/O. But how cool is that!

So I decided, because of my admiration for early microprocessors and the designers behind them, that I would implement the 4004 in modern silicon. That would be cool, and maybe I could give one to Federico Faggin! So this all started as an idea to do that. Then I realized how absolutely minuscule a chip from a 10 µm process is when shrunk down to 130 nm, especially one with only ~2,300 transistors! It’s like putting a single wide on 10,000 acres of land (I just came up with that one). So then I started looking into the whole MCS-4 family, and even maxing it out with 16 RAM and 16 ROM chips I still have 99.9% of the field available! So I decided to ask “what would Faggin have done if he had more space?” I thought about that and am now designing a pipelined version!

But then I had an idea on my flight: could I run C? The answer with the 4004 is a resounding no. The architecture is simply not there: Harvard architecture instead of Von Neumann, minuscule stack depth (understandable, he designed this for a calculator). So I started thinking, RISC-V seems neat. And here we are!

Alright, so we made it here (hopefully), and the next question that is surely in your mind: where do we go with this? Well, first I need to design the ISA. Unsurprisingly RISC-V, being a competent and modern ISA, doesn’t even think about 4 bit nibbles. Why would they? 32 bit words can store 4 bits just fine. So we have to shrink this puppy down… a lot. That is what the rest of this document and series will be: how to stuff as much RISC-V into a 4 bit core as we can. Why? See above, because I can (maybe).

So here is what you are actually here for. Why does this read like it does? Simple! I don’t know how to write any other way! I’m not a technical writer, I’m not someone who can write seriously. The info in here will be good, but you’ll have to trudge through my words to get there!

Registers!

What is a register? Well, first I’ll ask you, how did you wind up on this side of the internet? It’s not a bad thing at all! If you don’t know something, you learn by bashing your head against the wall until the knowledge sticks (just me?). So a register is basically a little bucket in the CPU that holds a number! 64 bit, 32 bit, 16, 8, 4 (that’s us! Maybe).

So a register in your computer can likely hold one 64 bit number. That is pretty standard nowadays thanks to AMD; all my homies hate Itanium (iykyk). So 64 bits, that’s a big number, right? Unsigned we get up to 18,446,744,073,709,551,615. That’s pretty big, right?! Eighteen quintillion. Little fun fact: if you counted one number every millisecond, it would take over five hundred and eighty million years to reach that number. So it’s pretty big. Now you must be thinking “well, 32 bit must be pretty massive too, right?” Well, wrong! In the grand scheme of things it is tiny! 2^32 is only 4,294,967,296 values. Pretty small in comparison, huh? So then 16 bits is a paltry 65,536, and our teeny tiny 2^4? 16… that’s 16 possible values, so the largest number we can represent in 4 bits is 15. Now Jeremy, what can we do with 4 bits then? Jack shit! Well, not really, because we can use register pairs or quads, and that’s what we’re gonna do!
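If you want to check those numbers yourself, here is a tiny C sketch. The helper names `max_unsigned` and `join_quad` are made up for this post, and the "quad" is just my assumption of what gluing four nibbles together would look like, most significant nibble first.

```c
#include <stdint.h>

/* Largest unsigned value that fits in `bits` bits (0..64). */
uint64_t max_unsigned(unsigned bits)
{
    return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* Glue four 4-bit nibbles into one 16-bit value.
   ASSUMPTION: n3 is the most significant nibble, n0 the least. */
uint16_t join_quad(uint8_t n3, uint8_t n2, uint8_t n1, uint8_t n0)
{
    return (uint16_t)(((n3 & 0xFu) << 12) | ((n2 & 0xFu) << 8) |
                      ((n1 & 0xFu) << 4)  |  (n0 & 0xFu));
}
```

So `max_unsigned(4)` is 15, and four maxed-out nibbles via `join_quad(0xF, 0xF, 0xF, 0xF)` give 0xFFFF, which is the same 65,535 you get from `max_unsigned(16)`.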

So here’s the plan

RISC-V uses 32 bit registers, which is great; with that you can pack in so much data! But that’s not very 1971 of them now, is it? And here’s the shocker, I’m gonna cheat too! I’m going to use 16 bit registers. That is a nice size we can use to hold addresses, instructions, all kinds of things! We limit the ALU and the minimum data size to 4 bit nibbles, just like Faggin! He implemented 4 bit registers, but the 4004 being CISC, he also made it so that you could pair registers, giving the programmer 8 bits to work with, and even some 8 bit instructions! So yeah, 16 bit registers allow us to make an actual “useful” core here.
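To make the "16 bit registers, 4 bit ALU" idea concrete, here is a rough sketch in C of how a 16 bit add could be done as four trips through a 4 bit adder, passing the carry along. This is my guess at the shape of it, not the final design; `alu_add4` and `add16_by_nibbles` are hypothetical names.

```c
#include <stdint.h>

/* One pass through a HYPOTHETICAL 4-bit adder: a + b + carry_in.
   Returns the 4-bit sum; *carry_out gets the bit that fell off the top. */
uint8_t alu_add4(uint8_t a, uint8_t b, uint8_t carry_in, uint8_t *carry_out)
{
    uint8_t sum = (uint8_t)((a & 0xFu) + (b & 0xFu) + (carry_in & 1u));
    *carry_out = (uint8_t)(sum >> 4);
    return (uint8_t)(sum & 0xFu);
}

/* Add two 16-bit registers one nibble at a time, lowest nibble first,
   feeding each carry into the next nibble. */
uint16_t add16_by_nibbles(uint16_t x, uint16_t y, uint8_t *carry_out)
{
    uint16_t result = 0;
    uint8_t carry = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint8_t a = (uint8_t)((x >> (4 * i)) & 0xFu);
        uint8_t b = (uint8_t)((y >> (4 * i)) & 0xFu);
        uint8_t s = alu_add4(a, b, carry, &carry);
        result = (uint16_t)(result | (s << (4 * i)));
    }
    *carry_out = carry;
    return result;
}
```

Adding 0x00FF and 0x0001 this way ripples the carry through two nibbles and lands on 0x0100, and 0xFFFF plus 1 wraps to 0 with the final carry set.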

But Why?

Because I want to run C! I want to have a 4 bit target for C floating around that is only useful on my bodged together core that shouldn’t exist.

Constraints

The major one is 4 bit data. But there are more! Here is the non-exhaustive list of constraints I’ve put on myself and how RISC-V deals with them. From there we can figure out how to get it working (hopefully).

immediates

What’s an immediate? An immediate is a way to embed data directly into the instruction itself. Instead of setting A=100 and B=200 and then doing ADD A, B

We can do

ADD A, 200

So we don’t have to waste a cycle or three going over to the register file and pulling the value out of register B. That’s pretty handy, right? Well, in 32 bit it absolutely is! We get so much room for activities, including storing pretty big numbers! 16 bit allows some of those activities, but the bigger the immediate, the fewer bits are left for everything else in the instruction! So things will be slower; we’ll need to read registers more often because we simply can’t fit bigger immediates.
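Here is a small C sketch of that squeeze. The layout is completely made up for illustration (4 bits of op code, 4 bits of destination register, 8 bits of immediate); the real RISC-4 encoding isn’t designed yet, so treat `IMM_BITS`, `imm_fits`, `encode_imm` and `decode_imm` as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL layout, not the real RISC-4 encoding:
   bits 15..12 op code | bits 11..8 destination register | bits 7..0 immediate */
#define IMM_BITS 8u

/* Does an unsigned constant fit in the immediate field? */
bool imm_fits(uint32_t value)
{
    return value < (1u << IMM_BITS);
}

/* Pack op code, destination register and immediate into one 16-bit word. */
uint16_t encode_imm(uint8_t opcode, uint8_t rd, uint8_t imm)
{
    return (uint16_t)(((opcode & 0xFu) << 12) | ((rd & 0xFu) << 8) | imm);
}

/* Pull the immediate back out of the word. */
uint8_t decode_imm(uint16_t word)
{
    return (uint8_t)(word & 0xFFu);
}
```

With this layout the 200 from the ADD example squeaks in (`imm_fits(200)` is true), but 1000 does not, so a constant that size would have to be built up in a register first.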

OP Codes!

This is what you’ve been waiting for right? You sick freak. Well welcome to the club.

So op codes, what are they and why do we care? The op code is the part of an instruction that tells the processor which operation to do, and the names we give them (ADD, SUB and friends) are for our little human brains to be able to understand what the fuck even is going on with the processor. That’s pretty cool, right? Well, the 32 bit RISC-V encoding has tons of room to describe all the needed operations: 7 bits for the op code field plus extra function bits on top. Tons of room, right? With 16 bit instructions we can really only allocate 4 bits for op codes. Remember that 16 number? Yeah, that’s how many op codes we get. How do we get around that? I have no idea, I haven’t gotten there yet! This post is laying out what we’ll dive into later!
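For a feel of the difference, here is a short C sketch. The RISC-V half is real (the major op code is the low 7 bits of a 32 bit instruction); the RISC-4 half is only my assumption that the op code would sit in the top 4 bits of a 16 bit word. All three function names are hypothetical.

```c
#include <stdint.h>

/* Real RISC-V: the major op code is the low 7 bits of a 32-bit instruction. */
uint8_t rv32_opcode(uint32_t instr)
{
    return (uint8_t)(instr & 0x7Fu);
}

/* HYPOTHETICAL RISC-4 guess: op code in the top 4 bits of a 16-bit word. */
uint8_t risc4_opcode(uint16_t instr)
{
    return (uint8_t)((instr >> 12) & 0xFu);
}

/* How many distinct patterns a field of `bits` bits can hold (bits < 32). */
uint32_t opcode_slots(unsigned bits)
{
    return UINT32_C(1) << bits;
}
```

`opcode_slots(7)` is 128 and `opcode_slots(4)` is 16, which is the whole problem in two numbers. As a sanity check on the real side, `rv32_opcode(0x00500093)` (that word is `addi x1, x0, 5`) comes back as 0x13.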

Addressing

What to say? We need to know where things are right? Addresses are how we do that. We won’t have any of the fancy mmu or paging… yet, maybe that’s another project for later.

So! Addresses. Having a 4 bit address is useless; even back in 1971 it was useless. 16 addresses? What are we gonna do with that?! The original 4004 used 12 bit addresses (multiplexed over its 4 bit bus). Since we want it to be easy, let’s go with 16 bits. That gives us plenty of space!

Bandwidth, it’s gonna be a problem here, and that’s ok. Seeing as we are 4 bit bound we can only fetch 4 bits per clock right? Then if we need to grab a full register that’s gonna cost us 4 cycles! Painfully slow, but faithful to the project. So it stays! I can’t imagine anyone willingly trying to write performant code for this thing anyway.
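A sketch of what that fetch looks like, in C. I’m assuming memory is an array of nibbles and that the low nibble comes first; neither is decided yet, and `fetch16` is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one 16-bit value from nibble-wide memory, one nibble per clock.
   ASSUMPTION: the lowest nibble is stored first.
   Adds the clocks spent to *cycles. */
uint16_t fetch16(const uint8_t *nibble_mem, size_t addr, unsigned *cycles)
{
    uint16_t value = 0;
    for (unsigned i = 0; i < 4; i++) {
        value = (uint16_t)(value | ((nibble_mem[addr + i] & 0xFu) << (4 * i)));
        (*cycles)++;
    }
    return value;
}
```

Fetching from a memory holding the nibbles 4, 3, 2, 1 returns 0x1234 and bumps the cycle counter by 4, which is exactly the tax described above.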

Math?

Math, it’s hard. This may or may not surprise you given how I sound, but I was always shit at math. I did what I needed to in order to pass but that’s it. That’s why god (Von Neumann) graced us with the idea of the arithmetic logic unit. It can do all kinds of fancy figuring. The 4004 could only add and subtract, pretty good, right? But we can do better. I’m talking add, subtract, multiply, divide, hell, let’s throw in floats too! But you might be saying “why floats with only 4 bits?” I’ll kindly ask you to refer back to the first section on why.

The compiler and my life of crime

So the compiler starts to diverge here, right? We have an issue: C expects chars to be at least 8 bits, right? We don’t have 8 bit chars in this house. Here is the dilemma, and one that I haven’t decided on yet. Do we go 2 nibbles per char and break the 4 bit convention for the sake of C compliance? Or do we say fuck the establishment and do 4 bit chars? I can tell you what I’m leaning towards, but we run into encoding issues if we want to, you know, write letters to the screen. So that is to be determined.
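Here is roughly what the "2 nibbles per char" option looks like in C. The C standard requires `CHAR_BIT` to be at least 8, which is the rule we’d be bending with 4 bit chars. `split_char` and `join_char` are hypothetical helpers, and the example values assume an ASCII machine.

```c
#include <limits.h>
#include <stdint.h>

/* Standard C guarantees a char is at least 8 bits wide. */
_Static_assert(CHAR_BIT >= 8, "C requires CHAR_BIT >= 8");

/* Split the low 8 bits of a char into a high and a low nibble. */
void split_char(unsigned char c, uint8_t *hi, uint8_t *lo)
{
    *hi = (uint8_t)((c >> 4) & 0xFu);
    *lo = (uint8_t)(c & 0xFu);
}

/* Put two nibbles back together into one 8-bit char. */
unsigned char join_char(uint8_t hi, uint8_t lo)
{
    return (unsigned char)(((hi & 0xFu) << 4) | (lo & 0xFu));
}
```

On an ASCII machine 'A' is 0x41, so it splits into the nibbles 4 and 1 and joins back again. A single nibble on its own only has 16 patterns, which is where the encoding trouble with 4 bit chars comes from.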

Conclusion

This is a project that shouldn’t exist; there’s a reason no one has done this. But here I am, standing on the shoulders of Dunning-Kruger, staring into the void, and it’s talking back. I won’t tell you what it’s saying. It’s being very rude.
