Weird CPU architectures, the MOV only CPU (2020)

98 points by v9v 4 days ago

zyxzevn 7 hours ago

There was a CMOVE architecture around 1990 (Israel), I think. It was very similar. I could not find it on the internet, sadly.

The MOVE architectures may work best with digital signal processors, because the data-flow is almost constant in such processors.

I invented my own version of the move-only architecture (around 1992), but focused on speed. Here is the idea:

1. The CPU only moves within the CPU, like from one register to the other. So all moves are extremely fast.

2. The CPU is separated in different units that can do work separately. Each unit has different input and output ports. The ports and registers are connected via a bus.

3. The CPU can have multiple buses and thus do several moves at the same time. If an output value is not ready yet, the instruction waits.

Example instruction: OUT1 -> IN1, OUT2 -> IN2. With 32 bits, that gives 8 units with 32 ports each.
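For a concrete picture of the field widths: 8 units take 3 bits and 32 ports take 5 bits, so each source or destination is 8 bits, one move is 16 bits, and two moves fill a 32-bit word. A minimal Python sketch of that packing, where the bit layout (unit bits above port bits, first move in the high half) is an assumption rather than anything from the original design:

```python
UNIT_BITS = 3                          # 8 units -> 3 bits
PORT_BITS = 5                          # 32 ports per unit -> 5 bits
ENDPOINT_BITS = UNIT_BITS + PORT_BITS  # 8 bits per source or destination
MOVE_BITS = 2 * ENDPOINT_BITS          # 16 bits per "SRC -> DST" move


def encode_endpoint(unit: int, port: int) -> int:
    """Pack (unit, port) into 8 bits, unit in the high bits (assumed layout)."""
    assert 0 <= unit < (1 << UNIT_BITS) and 0 <= port < (1 << PORT_BITS)
    return (unit << PORT_BITS) | port


def decode_endpoint(bits: int) -> tuple[int, int]:
    return bits >> PORT_BITS, bits & ((1 << PORT_BITS) - 1)


def encode_instruction(moves: list[tuple[tuple[int, int], tuple[int, int]]]) -> int:
    """Pack exactly two moves into one 32-bit word, first move in the high half."""
    assert len(moves) == 2
    word = 0
    for src, dst in moves:
        move = (encode_endpoint(*src) << ENDPOINT_BITS) | encode_endpoint(*dst)
        word = (word << MOVE_BITS) | move
    return word


def decode_instruction(word: int) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    moves = []
    for shift in (MOVE_BITS, 0):  # high half first, matching encode_instruction
        move = (word >> shift) & ((1 << MOVE_BITS) - 1)
        src = decode_endpoint(move >> ENDPOINT_BITS)
        dst = decode_endpoint(move & ((1 << ENDPOINT_BITS) - 1))
        moves.append((src, dst))
    return moves
```

Because every field sits at a fixed position, decoding is only shifts and masks.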

Example of a set of units and ports:
Control unit: (JUMP_to_address, CALL_to_address, RETURN_with_value, +conditionals)
Memory unit: (STORE_Address, STORE_Value, READ_Address, READ_Value)
Computation unit: (Start_Value, ADD_Value, SUB_Value, MUL_Value, DIV_Value, Result_Value)
Value unit: (Value_from_next_instruction, ZERO, ONE)
Register unit: (R0 ... R31)
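To make the port idea concrete, here is a toy Python model of a few of the units listed above. It is a sketch under assumptions: the port names are borrowed from the comment, but the semantics (writing Start_Value loads an accumulator, writing ADD_Value adds to it, reading Result_Value returns it) are guessed, DIV_Value and Value_from_next_instruction are omitted, and the unit keys "COMP", "REG", "VAL" are hypothetical. Moves in one instruction read all sources before writing any destination, as if each used its own bus in the same cycle.

```python
class ComputationUnit:
    """Accumulator-style unit: writing a trigger port performs the operation (assumed)."""

    def __init__(self) -> None:
        self.acc = 0

    def write(self, port: str, value: int) -> None:
        if port == "Start_Value":
            self.acc = value
        elif port == "ADD_Value":
            self.acc += value
        elif port == "SUB_Value":
            self.acc -= value
        elif port == "MUL_Value":
            self.acc *= value
        else:
            raise KeyError(port)

    def read(self, port: str) -> int:
        if port == "Result_Value":
            return self.acc
        raise KeyError(port)


class RegisterUnit:
    def __init__(self, count: int = 32) -> None:
        self.regs = [0] * count

    def write(self, port: str, value: int) -> None:
        self.regs[int(port[1:])] = value  # "R5" -> index 5

    def read(self, port: str) -> int:
        return self.regs[int(port[1:])]


class ValueUnit:
    """Constant sources only."""

    def read(self, port: str) -> int:
        return {"ZERO": 0, "ONE": 1}[port]


def run(program: list[list[str]], units: dict) -> None:
    """Each instruction is a list of 'UNIT.Port -> UNIT.Port' moves."""
    for instruction in program:
        parsed = [tuple(s.strip().split(".") for s in move.split("->")) for move in instruction]
        # Read every source first, then write every destination.
        values = [units[src_unit].read(src_port) for (src_unit, src_port), _ in parsed]
        for (_, (dst_unit, dst_port)), value in zip(parsed, values):
            units[dst_unit].write(dst_port, value)
```

For example, with R0 = 5 and R1 = 7, the program `["REG.R0 -> COMP.Start_Value"], ["REG.R1 -> COMP.ADD_Value"], ["VAL.ONE -> COMP.ADD_Value"], ["COMP.Result_Value -> REG.R2"]` leaves 13 in R2, and a single instruction `["REG.R0 -> REG.R1", "REG.R1 -> REG.R0"]` swaps two registers without a temporary.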

It is extremely flexible. I also came up with a minimalist 8-bit version. One could even "plug in" different units for different systems. Certain problems could be solved by adding special ports, which would work like special instructions.

I did not continue the project because people did not understand the bus architecture (think of a PCI bus). If you try to present it as a logic-gate architecture (as in the article), the units make the architecture look more complicated than it actually is.

Joker_vD 2 hours ago

Sounds similar to TIS-100, but with even more special-purpose units.

Animats an hour ago

That's actually useful as a minimal machine.

It's possible to have a one-instruction machine where the one instruction does a subtract, store, and branch-if-negative. But it's not very useful. This register-oriented design is something someone might put inside an FPGA.
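The one-instruction machine described here is a known OISC variant, sometimes called SUBNEG ("subtract and branch if negative"); the more commonly cited SUBLEQ branches on less than or equal to zero instead. A minimal Python interpreter for the variant in the comment, assuming three-word instructions `a, b, c` and the (hypothetical) convention that jumping to a negative address halts:

```python
def run_subneg(mem: list[int], pc: int = 0, max_steps: int = 10_000) -> list[int]:
    """Each instruction is three words a, b, c: mem[b] -= mem[a];
    if the result is negative, jump to c, otherwise fall through to pc + 3.
    A negative jump target halts the machine (assumed convention)."""
    for _ in range(max_steps):
        if pc < 0:
            return mem
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] < 0 else pc + 3
    raise RuntimeError("step limit reached")
```

Even adding two numbers takes a scratch cell: with X at address 12, Y at 13, a zeroed scratch cell Z at 14, a constant 1 at 15 and a zeroed cell at 16, the program `[12, 14, 3,  14, 13, 6,  14, 14, 9,  15, 16, -1]` computes Z = -X, then Y = Y - Z = Y + X, clears Z, and finally forces a negative result to jump to -1 and halt. That verbosity is part of why such machines are not very useful in practice.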

This is the device-register mindset, where you do everything by storing into device registers, applied as a CPU architecture.

gsliepen 11 hours ago

The Intel architecture is already Turing complete when you just use MOV instructions: https://github.com/xoreaxeaxeax/movfuscator. Of course, you don't even need instructions at all: https://news.ycombinator.com/item?id=5261598

mk_stjames 9 hours ago

I came back to reply with just this. Christopher Domas's conference talk on the movfuscator is legendary:
https://www.youtube.com/watch?v=R7EEoWg6Ekk

QuadmasterXLII 10 hours ago

While this is true, I suspect a spec-compliant implementation of the x86 mov instruction would use many more transistors than OP's entire CPU.
- crest 9 hours ago
  
  Of course, but you don't have the toy CPU under your desk or in your laptop running at several GHz, nor are you likely to find it in a target that really needs a cute hack to obscure your exploit.

aleph_minus_one 2 hours ago

> The Intel architecture is already Turing complete when you just use MOV instructions
No physically existing architecture is Turing-complete, since every CPU can (by the laws of physics) only access a finite amount of memory. That means its state space is finite, as opposed to the infinite state space of a Turing machine.
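The finiteness argument can be made concrete. A deterministic machine whose whole state fits in n bits has at most 2^n distinct states, so if it has not halted after 2^n steps it has visited 2^n + 1 states, must have repeated one, and will loop forever. Halting is therefore decidable for it, which is exactly what fails for a Turing machine. A small Python sketch, where `step` is a hypothetical stand-in for a CPU's next-state logic:

```python
from typing import Callable, Optional


def halts(step: Callable[[int], Optional[int]], start: int, state_bits: int) -> bool:
    """Decide halting for a deterministic machine whose state fits in `state_bits` bits.
    `step` returns the next state, or None when the machine halts.
    By the pigeonhole principle, surviving 2**state_bits steps implies a repeated
    state, and a deterministic machine that repeats a state loops forever."""
    state: Optional[int] = start
    for _ in range(2 ** state_bits):
        if state is None:
            return True
        state = step(state)
    return state is None
```

For any real CPU, 2^n is astronomically large, so the bound is of theoretical interest only.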
- jdiff 2 hours ago
  
  But that's not a very useful definition, so we usually don't bother enforcing that constraint.

noam_k 2 hours ago

I'm surprised the article doesn't mention OpenASIP [0], which not only helps you define the architecture, but also provides RTL synthesis and a working (if not always useful) compiler.

[0] http://openasip.org/

Lerc 3 hours ago

I have been playing around with my own design, initially inspired by the Gigatron, but it seems to have diverged somewhat. The ALU is the same and the address unit is enhanced, but much of the rest (the program counter and instruction decode) ended up completely different. I shuffled the Harvard architecture to behave more like an instruction cache: only 16 bytes of instruction memory, with long jumps triggering a full instruction-memory load from RAM.

Going for a transport-triggered architecture for additional features seems like a fairly easy win. I kind of started designing one before I realised that's what the design was. The Gigatron has to do some unreasonably hard work for a few operations, like shift right, which can fundamentally be done with just wires once you have a mechanism to provide the input and fetch the output.
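To illustrate the "just wires" point: a logical shift right is pure rewiring, with output bit i connected to input bit i + 1 and the top output bit tied to zero. In a transport-triggered design that becomes a unit with one input port and one output port and no logic in between. A Python sketch that models the wiring bit by bit, assuming an 8-bit datapath; the `ShiftRightUnit` class and its method names are hypothetical:

```python
WIDTH = 8  # assumed datapath width


def shift_right_wired(value: int) -> int:
    """Logical shift right as wiring only: out[i] = in[i + 1], out[WIDTH - 1] = 0."""
    in_bits = [(value >> i) & 1 for i in range(WIDTH)]
    out_bits = [in_bits[i + 1] for i in range(WIDTH - 1)] + [0]
    return sum(bit << i for i, bit in enumerate(out_bits))


class ShiftRightUnit:
    """Hypothetical TTA unit: move the operand into one port, read the result from another."""

    def __init__(self) -> None:
        self.latch = 0

    def write_operand(self, value: int) -> None:
        self.latch = value & ((1 << WIDTH) - 1)

    def read_result(self) -> int:
        return shift_right_wired(self.latch)
```

Writing `0b10110101` to the unit and reading it back yields `0b01011010`; the only hardware cost is the input latch and the port addressing, which is the "mechanism to provide the input and fetch the output" the comment mentions.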

Definitely not knocking the Gigatron, though. Every limitation it has exists because it saved a chip; when it comes to something minimal to build upon, it's pretty cool.

PaulHoule 9 hours ago

If you were interested in co-designing a CPU with software, the TTA is an attractive way to do it, particularly because it is easy to design it so you can do more than one MOV at the same time and thus get explicit parallelism.

The tough part, though, is that memory is usually slow: you have to wait an unpredictable number of cycles for data to come back from DRAM, and while one operation is blocked, all the other operations are blocked too.

I guess you could have something like this with a fancy memory controller that could be programmed explicitly to start fetching ahead of time, so data is available when it is needed, at least most of the time.
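A rough way to see both the cost and the proposed fix: if the whole instruction stream stalls until a load's data arrives, issuing the load early hides the latency behind independent work. A toy Python cycle count, assuming one move per cycle, a fixed (hypothetical) 20-cycle DRAM latency instead of a variable one, and a memory unit with separate READ_Address and READ_Value ports as listed earlier in the thread:

```python
DRAM_LATENCY = 20  # hypothetical fixed latency in cycles


def cycles(schedule: list[str]) -> int:
    """Count cycles for a sequence of moves.
    'load' = write READ_Address (starts the fetch),
    'use'  = read READ_Value (stalls everything until the data has arrived),
    anything else = independent one-cycle work."""
    cycle = 0
    ready_at = None
    for move in schedule:
        if move == "load":
            ready_at = cycle + DRAM_LATENCY
        elif move == "use" and ready_at is not None and cycle < ready_at:
            cycle = ready_at  # stall: every other operation waits too
        cycle += 1
    return cycle
```

With ten cycles of independent work, `["load", "use"] + ["work"] * 10` takes 31 cycles, while hoisting the load ahead of the work, `["load"] + ["work"] * 10 + ["use"]`, takes 21. With 19 or more cycles of work in between, the stall disappears entirely. A programmable prefetching memory controller would amount to the compiler scheduling those early loads.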

mrob 8 hours ago

How would you handle context switching? You've got a whole lot of exposed state scattered throughout the whole CPU.
- PaulHoule 6 hours ago
  
  By not doing it. The ideology here is that, for the purposes of embedded systems, general-purpose computing took numerous wrong turns from the 1950s to the present.
  I thought this through back when I was doing embedded projects with the AVR-8, namely display controllers for persistence-of-vision displays. Something like this doesn't have an OS, so you don't need context switching for the OS's purposes.
  It was practical to write C code for this, but I didn't really like it: code like this doesn't need the stack and the affordances of C calling conventions, the data structures needed to display a scene live only as long as the scene, and you have 32 registers, which is a lot, enough that you can allocate 8 to the interrupt handler and still have plenty left over for the main loop.
  I was wargaming my paths forward in case I needed more power. The obvious route, which I probably would have taken, is the portable C route via ARM or STM32. Yet I liked the AVR-8 a lot and also considered going to an FPGA board, on which you could instantiate an AVR-8 soft core clocked higher than any real hardware AVR-8 and also put an accelerator behind it.
  The FPGA + TTA + co-designed software route came up at this point. Notably, any kind of concurrency, parallelism, and extra context can be baked into the "hardware". Adding a few registers is much cheaper than adding superscalar features, and adding another MOV slot to the instructions is pretty cheap if you want more parallelism, with the caveat that blocking could be hard to prevent. If the requirements change, it's a frickin' FPGA: you can add something to it or take something away.
  What would put the whole idea on wheels is a superoptimizing compiler that could design both the CPU and the code that runs on top of it.
- cmrdporcupine 4 hours ago
  
  I would just have multiple cores, with communication between them happening over a central shared hub, like the Parallax Propeller MCUs. If you want concurrency, push your new task onto a separate core.
  Still, the problem is that writing a compiler for such a system would suck.
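A simplified sketch of the shared-hub idea: each core gets a turn at hub memory in fixed rotation, so the hub itself needs no locking, and a core only ever waits for its own slot. This is loosely in the spirit of the Propeller's round-robin hub, not a model of its actual timing; the `Hub` class, its methods, and the one-slot-per-cycle rule are hypothetical.

```python
from collections import deque


class Hub:
    """Shared memory that services one core per cycle in fixed rotation."""

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        self.mem: dict[int, int] = {}
        self.pending = [deque() for _ in range(num_cores)]  # per-core request queues
        self.cycle = 0

    def request_write(self, core: int, addr: int, value: int) -> None:
        """Queue a write; it lands only when this core's slot comes around."""
        self.pending[core].append((addr, value))

    def tick(self) -> None:
        """Service at most one queued request, from the core that owns this slot."""
        owner = self.cycle % self.num_cores
        if self.pending[owner]:
            addr, value = self.pending[owner].popleft()
            self.mem[addr] = value
        self.cycle += 1
```

Because the order of access is fixed, the worst-case wait for any core is bounded by the number of cores, which makes timing predictable even though the compiler still has to decide what runs where.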
cmrdporcupine 8 hours ago

I played around making a TTA-ish thing as part of learning Verilog some years ago. It's a neat idea: https://github.com/rdaum/simple_tta
- PaulHoule 7 hours ago
  
  Exactly, it's easier than developing a CPU the normal way, and it offers the possibility of making something with unique capabilities, as opposed to the mostly boring option of revisiting the Z-80 [1] or the near certainty of getting bogged down trying to implement a modern high-performance CPU and get pipelining, superscalar execution, and all that to work.
  [1] with the caveat that extending that kind of chip to support a larger address space and simple memory protection is interesting to me

spicybright 8 hours ago

I've always loved quirky CPU designs like this, and having one laid out in logic gates is amazing.

I'm having trouble running the file though, it's missing a chip, "74181.dig". Can you point me to where to download that or add it to the repo?

v9v 7 hours ago

I'm not the author of the post itself, but the 74181 chip seems to be defined in the simulator: https://github.com/hneemann/Digital/blob/master/src/main/dig...
- spicybright 6 hours ago
  
  Unsure why the program wasn't picking up the chips. I just moved the lib folder from the Digital repo into the MOVputer folder and it worked. Thanks!

BertoldVdb 3 hours ago

This architecture is good for data-path applications, but not really for control flow (e.g., think how expensive a context switch would be).
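A sketch of why the context switch gets expensive: on a conventional CPU, a switch saves the architectural registers, but in a transport-triggered design every unit's port latches (accumulators, pending addresses, half-fed operands) are program-visible state that must be saved and restored too, and an in-flight memory read is harder still because it is not yet a plain value. A hypothetical Python model that just snapshots every latch; the unit and latch names are illustrative only:

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    latches: dict[str, int] = field(default_factory=dict)  # exposed port state


def save_context(units: list[Unit]) -> dict[str, dict[str, int]]:
    """Snapshot every latch of every unit, since all of it is visible to the program."""
    return {u.name: dict(u.latches) for u in units}


def restore_context(units: list[Unit], snapshot: dict[str, dict[str, int]]) -> None:
    for u in units:
        u.latches = dict(snapshot[u.name])
```

With a 32-entry register unit plus an accumulator and a memory address latch, a save already captures 34 values instead of 32, and every extra unit added for speed grows the context further.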

drob518 2 hours ago

Yeah, at best this is useful for deep embedded applications where you need a tiny bit of programmability and where implementation size in gates and cost is at a premium. It's something you can stuff into a programmable logic device of some sort that you already have in the design. Beyond that, it's interesting from an academic perspective in terms of studying minimalist computing architectures, but not practical.

psychoslave 10 hours ago

Looks like an interesting read, thank you @v9v.

Just as my night was drifting into a meditative sleep about basing ontological models on change as the fundamental building block. Identity is such a brittle choice as a foundation, even if it's a great tool in many other situations.

pyinstallwoes 9 hours ago

Many ancient cultures use behavior as identity. It certainly has a charm.
- lioeters 9 hours ago
  
  Was it the Navajos whose language doesn't have nouns, only verbs? A noun is a kind of illusion of eternal identity. A chair is only chair-ing for the moment as a configuration of matter that was doing something else before, and will fall apart and transform into doing something else in the future.
  - bradrn 8 hours ago
    
    IIRC Navajo has a pretty robust noun-verb distinction. However there definitely are other languages where nouns and verbs behave very similarly, e.g. most famously Salishan languages. That said, there don’t seem to be any natural languages in which nouns and verbs are completely indistinguishable — there’s always some minor difference in how they behave.
    
    psychoslave 7 hours ago
    
    I don't think nouns as a grammatical class are an issue, all the more so if we take for granted that grammars themselves are mere inferences modeling what happens on average when producing an utterance, or at least a very simplified representation of a leaner version of the utterance.
    It might become more problematic with a term such as "substantive", which can connote some ontological beliefs about the nature of the word or what it refers to, or their relationship.
    In general, English is already very generous with converting words between classes without any morphological change. I've heard Mandarin doesn't even have that kind of strong lexical typing bound to every item in the vocabulary, but, to be transparent, I didn't check the details.
    Thanks anyway for the pointers to the other languages.

neuroelectron 4 hours ago

Seems to me this would entirely eliminate many classes of exploits.

Bratmon 3 hours ago

Why would that be?