[self-interest] Re: self in self - plan B

Jecel Assumpcao Jr. jecel at merlintec.com
Wed Dec 15 19:55:32 UTC 1999


In a message with this subject on November 22, I mentioned some alternatives
for the code generation part of the Self compiler:

> We could:
> 
>   - write a program that would read a description of the CPU and
>     generate a code generator for it in Self automatically. This would
>     be a kind of "mango for the back end".
> or
>   - write the back end in a very abstract style, suitable for any
>     CPU. Then we would create a child object for a specific CPU
>     and implement the abstract methods ('generateJump:') manually.
>     The inlining compiler would optimize most of the implied
>     indirections, so the final code would be similar to that of
>     the first option (I am talking about how fast the code
>     generator runs, not the quality of the code it generates)
> or
>   - write the back end to generate a very primitive and abstract
>     machine code. A pattern matcher in the final pass would
>     translate that into the actual machine language for each CPU

I commented that the first two options (gcc-style) would produce the fastest
code generator, but the third option would allow the best scheduling (which is
critical for high performance on modern processors). I would recommend that
anyone interested in scheduling look at the various papers by Steve Novack
(http://www.ics.uci.edu/~snovack/) and also this paper by Urs Hölzle and his
students: http://www.cs.ucsb.edu/oocsb/papers/dispatch.html

Just a little background: when David Ungar and Randy Smith first defined the
Self language, most people thought that it would be impossible to make it
perform as well as Smalltalk since they hadn't compromised the language model
to make it more efficient. Well, Craig Chambers and friends proved them wrong
by inventing sophisticated type inference systems. They got 50% of C speed on
highly numerical code (the kind that brings Smalltalk to its knees), which is
simply amazing when you consider that C gets its speed partly by ignoring the
numerical errors, addressing errors and so on that are carefully handled in
Self. The problem was that the compiler was slow, which was bad since it ran in
parallel with your applications. So Urs Hölzle used two compilers in order to
replace type analysis (slow but precise) with type feedback (much faster and a
little less precise), making the system both more responsive and simpler. It
was also better on highly object-oriented code, though not as good on numerical
code. Now, both Craig and Urs were more interested in their new ideas, and so
didn't put as much effort into register allocation and code generation as
commercial compilers (or even gcc) do. This makes the 50% of C above even more
impressive, but it has left Self with a performance penalty on more advanced
processors.
This is how I remember it, but I welcome corrections.

In the other thread, John Ars wrote:
> Hmm, intriguing.  How would you go about that?? Could you write the VM
> in Self (a la Squeak)?

and José Baltasar García Perez-Schofield wrote:
>          I will be pleased to study your next mail, in order to know what it
> means to implement Self on Self. I am very intrigued by it.

The Squeak model is very interesting. There is a normal Squeak program that
implements the whole Squeak Virtual Machine. So you can run Squeak inside of
Squeak (it works perfectly, though 600 times slower than normal). You can make
changes and generate the next version of Squeak this way. When it is working
the way you want it to, you can translate this Squeak program to C (it must be
written in a very restricted, non-object-oriented style for this to be
possible) and then compile that C program to get a new executable VM.
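
To give a feel for that restricted style, here is a hypothetical sketch in the
spirit of the Squeak VM sources (not actual code from them):

     fetchNextBytecode
         "restricted style: only integers and direct array indexing,
         so each statement translates directly into a C statement"
         currentBytecode := memory at: instructionPointer.
         instructionPointer := instructionPointer + 1

No polymorphism, no blocks, no real objects - which is exactly why the
translator can turn such methods into plain C.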

The problem with this is that it doesn't help your own Squeak programs (you
have to write parts of your application in the restricted style and then
generate C as new primitives to be integrated into the virtual machine, hardly
easy stuff to do!). Also, the write Smalltalk -> simulate -> translate to
C -> compile -> restart virtual machine development cycle is not very direct.

But what if we did what the Squeak people did but replaced the bytecode
interpreter in the virtual machine with a compiler (or two)? The virtual
machine could now be written in the normal Self style (no restrictions) and it
would include a part that would take bytecodes and generate (for example) X86
machine language code. If the current machine can't execute this X86 code
(which will be the case, since I am using a Sun Ultra 5 as the development
platform), the code can be sent over a network to a machine that can. Since I
explained this in the VM99 workshop position paper, I won't repeat how this is
done here. See http://www.squeak.org/oopsla99_vmworkshop/jecel.txt
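
To make this concrete, the core of such a compiler would just dispatch on each
bytecode and append the corresponding X86 bytes to a buffer. This is a
hypothetical sketch - the bytecode names and the emission interface are made up
for illustration:

     generateFrom: bytecodes
         "append raw X86 bytes to machineCode (an OrderedCollection);
         PushReceiver and ReturnTop are made-up bytecode constants"
         bytecodes do: [ :b |
             b = PushReceiver ifTrue: [ machineCode add: 16r50 ].   "push eax"
             b = ReturnTop ifTrue: [ machineCode add: 16rC3 ] ]     "ret"

The resulting byte buffer is what would be sent over the network to an X86
machine for execution.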

Note that at some point, this Self program will be capable of generating X86
code for any Self program, including itself! It will be used by the whole
system, including all user applications (no more translating to primitives to
make things fast).

Ok, so back to plan B and how to make a good code generator for Self. My
current idea is to combine the three options that I described at the very
beginning:

 - (option 2) the compiler will be written in a very abstract way, with more
specific compilers inheriting from it

 - (option 1) the more specific compilers will override data structures, not
methods, so a processor description will be declarative instead of functional

 - (option 3) the compiler will work with very simple instructions (just moves
between functional units) until the very end, and then translate these into the
actual machine language (a rough sketch combining the three follows this list)

Note that this will favor a move-based CPU (such as I designed for the Merlin
6) even though the first port will be for the X86. Since I am doing the work, I
guess I am entitled to show my biases :-)

The data structures that are used by the compilers are a description of each
CPU in terms of functional units and scheduling. While I have been speaking of
an X86 code generator, the descriptions for the 386, 486, Pentium, Pentium II and
so on could be very different (though the other data structure, the translation
table, would be mostly the same for all of them).
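
As an illustration of what such a description might contain (hypothetical
classes and numbers - real latencies would come from the processor manuals):

     pentiumDescription
         "functional units with their latencies, for the scheduler"
         ^Dictionary new
             at: #integerALU put: (FunctionalUnit latency: 1 count: 2);
             at: #loadStore put: (FunctionalUnit latency: 3 count: 1);
             at: #floatingPoint put: (FunctionalUnit latency: 3 count: 1);
             yourself

A 486 description would have a single integer unit, while a Pentium II
description would list more units.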

What is the problem with this? The way Self compilers are now, this particular
Self program won't be very efficient. But this program takes away processing
time from all the applications, so the system will not be very responsive if
the compiler is slow (the problem we had in Self 2.0).

So here is a simple idea to fix this: optimize immutable data structures. I'll
give an example in Smalltalk, since it has a syntax for declaring constant
vectors while Self does not. Look at this code:

     #('first text is' 'and the second one is' 'finally, this one is')
                      do: [ :s | s print. ' ' print. s size printString print]

Now imagine that the compiler was smart enough to notice that the receiver of
the #do: message is a constant collection and could pre-evaluate some things to
generate the same code as:

    'first text is' print. ' ' print. 13 print.
    'and the second one is' print. ' ' print. 21 print.
    'finally, this one is' print. ' ' print. 20 print

Yes - this is what is called "partial evaluation". No - this particular case is
not hard to implement, nor does it require a Lisp-derived language.
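
A sketch of what the compiler-side check might look like (the helper names -
isConstantCollection, inlineBlock:binding:, compileSend: - are made up):

     compileDo: aSend
         "if the receiver of #do: is a compile-time constant, unroll
         the loop by inlining the block body once per element"
         aSend receiver isConstantCollection
             ifTrue: [ aSend receiver value do: [ :each |
                 self inlineBlock: aSend blockArgument binding: each ] ]
             ifFalse: [ self compileSend: aSend ]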

If the Self compiler in Self can do this (and we have a richer selection of
constant collections than just immutableString!) and it is implemented as
described above, then it would generate efficient code generators when it
compiled itself.

So this is what I am working on. Though it will take longer than a simple port
of the Sun VM, I think the result will be worth it (especially when porting to
a second processor). I don't know if anyone has made it this far, but if so I
would love to have some comments or questions on this.

-- Jecel


