OO machines: Rekursiv

Mon Feb 10 21:16:15 UTC 1992

[I am resending this because the original message was delivered to
only part of the mailing list. ]

The Rekursiv architecture happened to be discussed on comp.arch recently,
and the following message might be of interest to you.  The thread started
with a "CISC vs RISC" debate (Nth edition :-) and someone mentioned the
Rekursiv's microcoded GC as an example where CISC design payed off.

-Urs

====================== Begin included message =====================
  In article <hoelzle.697314394 at Xenon.Stanford.EDU> hoelzle at Xenon.Stanford.EDU (Urs Hoelzle) writes:
  >gor at cs.strath.ac.uk (Gordon Russell) writes:
  >
  >>Well....look at the garbage collection performance of the REKURSIV machine.
  >>This collection mechanism, which was implemented in microcode, ran many
  >>times faster than an equally powerful collector written in assembly language.
  >
  >Please excuse my scepticism, but this claim sounds suspicious.  It is
  >hard to believe that the Rekursiv garbage collector would run "many

  This _is_ a suspicious claim. The "equally powerful collector" is where I have
  a problem with the original article. Here's why:

  The Rekursiv was NOT a fully fledged computer, it was a processor, it relied
  on the host platform to provide access to the disk and the "outside world".
  The Rekursiv worked with objects and these were given unique identfiers.
  The mapping from identifier to DRAM address (for the object body) was performed
  through the PAGER TABLE. This was direct mapped and 64K elements deep.
  Now when a new object was created it would be allocated an new object
  identifier, and this would be given a slot in the pager table. Here's the crux:
  if that slot in the pager table had previously been consumed by an object,
  the old object would automatically be flushed to disk.

  When the INTERNAL garbage collector ran, either automatically (when DRAM 
  was exhausted and some needed to be freed) or when the "garbageCollect" 
  primitive was invoked, it only cleaned up the DRAM IN THE REKURSIV.
  I could post the micro-code if there is enough interest; this is why
  the line containing NOPGRFETCH is significant - when an object body was
  scanned, the paging to bring in objects ("automatically") was switched off.

  In other words newly created objects which had been flushed to 
  disk would not be brought in to be scanned (they had  already made it 
  to the "persistent-store"; garbage objects on disk would be cleared 
  up off-line.)

  The garbage collector was a simple mark-scan collector - there was some
  further trickery involved to handle the following circumstance:

  suppose object A contains the only reference to object B and both are NEW.
  Object A is flushed to disk. Given the above scheme B would be discarded
  because only the objects in the DRAM are scanned. To prevent B being
  (incorrectly) discarded, the Konntroller (yes, with a K) would fiddle
  the pager table when A was flushed. The garbage collector then treated B
  as a root.

  Anyone not convinced by all of this should try the following on a REKURSIV:

  create a 100 objects with any relationship to one another;
    time the garbage collect

  create a 10000 objects with any relationship to one another;
    time the garbage collect

  create a 100000 objects with any relationship to one another;
    time the garbage collect

  create a 1000000 objects with any relationship to one another;
    time the garbage collect

  You'll notice that the _garbage_collect_time_ never exceeds about half a second
  - this should tell you something.. (Notice that creating the objects is a
  different matter.) Some care is necessary because the Garbage Collector
  runs automatically; the run-time statistics will tell you how many GC's
  have been performed.

  >times faster" than a *good* (generational) software garbage collector
  >on a general-purpose RISC.  I suspect that it ran faster than the
  >*same* algorithm coded in assembly, but that no high-performance
  >software GC system would use this algorithm because better algortithms
  >exist (and these might not be implementable as easily in microcode).

  On a more telling note, I re-implemented Lingo, the language designed for
  the Rekursiv, on a conventional platform (a sun 3/60) using techniques
  found in the Smalltalk and Self Dynamic Translators (hello Urs we meet at 
  OOPSLA90). Although I didn't put in every optimization, this
  *software_solution* was THREE times faster than the Rekursiv.

  How can this be ? Simple, unless you have LOTS (100M (dollars)) developing
  a custom processor is silly - the REKURSIV was implemented in 2 micron
  ASIC technology, and the processor spanned FOUR chips. Shrinking that
  onto a single custom chip would have helped, perhaps we could have clocked it
  at 40MHz instead of 10MHz, but the cost would have been outrageous.

  >RISC is about engineering tradeoffs: is the added complexity worth the
  >performance gain? 

  No. No. and thrice no. The complexity slowed the machine down, and having
  to go off-chip for micro-code limited the clocked speed. For those of you
  who are curious, the control word was 128bits, and ~4K lines of code were
  in the control-store (for Lingo). Getting 64K of control store onto a
  "single chip solution" would not possible even today.

  Further, the designers of the Rekursiv failed to understand optimization.
  This lack of engineering discipline and hard metrics led to an over
  specified (and mostly unused) machine. For instance, the Control Stack
  was hardly ever used, because there were virtually no recursive micro-code
  calls. However, the control stack consumed several registers, a chunk of
  control store and was 32K X 24Bits of 40ns SRAM.

  Understanding the hardware is only half the story though. Anyone designing
  a processor must also understand the software that is going to run on it;
  having a sound understanding of what modern compilers are capable of is
  essential. The mismatch between what Lingo needed to run efficiently, and
  the Rekursiv, was wider than any "semantic-gap" you care to mention.

  [Incidently, I didn't have anything to do with the design of the REKURSIV.
  It was already shipping when I got involved with Linn.]

  >>I could give you a reference if you like (try the Systems Sciences conference
  >>, Hawaii, Jan 1992, P. Cockshott). 
  >
  >Yes, please.  Is there a PostScript version available somewhere?

  I'd like to see this too. I bet he missed the NOPGRFETCH. If he quoted
  performance figures someone should post them..

  Andrew