In a message with this subject on November 22, I mentioned some alternatives for the code generation part of the Self compiler:
- write a program that would read a description of the CPU and automatically generate a code generator for it in Self. This would be a kind of "mango for the back end".
- write the back end in a very abstract style, suitable for any CPU. Then we would create a child object for a specific CPU and implement the abstract methods ('generateJump:') manually. The inlining compiler would optimize most of the implied indirections, so the final code would be similar to that of the first option (I am talking about how fast the code generator runs, not the quality of the code it generates)
- write the back end to generate a very primitive and abstract machine code. A pattern matcher in the final pass would translate that into the actual machine language for each CPU
I commented that the first two options (gcc-style) would generate the fastest code generator, but the third option would allow the best scheduling (which is critical for high performance in modern processors). I would recommend that anyone interested in scheduling look at the various papers by Steve Novack (http://www.ics.uci.edu/~snovack/) and also this paper by Urs Hölzle and his students: http://www.cs.ucsb.edu/oocsb/papers/dispatch.html
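To make the third option concrete, here is a minimal sketch (in Python, since it is easy to test; all the instruction names, operand forms and the table contents are invented for the illustration, not taken from the actual Self compiler): the compiler emits only abstract moves between functional units, and a final pattern-matching pass rewrites each one into real machine instructions using a per-CPU table.

```python
# Sketch of option 3: the compiler works with abstract "moves between
# functional units" until the very end, then a pattern matcher translates
# them into concrete machine language. All names here are hypothetical.

# Abstract code for a := a + 1, expressed purely as moves.
ABSTRACT_CODE = [
    ("move", "mem[a]", "alu.in1"),
    ("move", "const 1", "alu.in2"),
    ("move", "alu.add", "mem[a]"),
]

# Per-CPU translation table: (source class, destination class) -> template.
X86_PATTERNS = {
    ("mem", "alu.in1"): "mov eax, {src}",
    ("const", "alu.in2"): "mov ebx, {src}",
    ("alu.add", "mem"): "add eax, ebx\nmov {dst}, eax",
}

def classify(operand):
    """Reduce an operand to the pattern key it matches."""
    if operand.startswith("mem["):
        return "mem"
    if operand.startswith("const"):
        return "const"
    return operand

def translate(abstract_code, patterns):
    """Final pass: match each abstract move against the CPU's table."""
    out = []
    for op, src, dst in abstract_code:
        template = patterns[(classify(src), classify(dst))]
        out.append(template.format(src=src.replace("const ", ""), dst=dst))
    return out

for line in translate(ABSTRACT_CODE, X86_PATTERNS):
    print(line)
```

Because all CPU knowledge lives in the table, retargeting means writing a new table, and the matcher is free to reorder the abstract moves for scheduling before the final translation.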
Just a little background: when David Ungar and Randy Smith first defined the Self language, most people thought it would be impossible to make it perform as well as Smalltalk, since they hadn't compromised the language model to make it more efficient. Well, Craig Chambers and friends proved them wrong by inventing sophisticated type inference systems. They got 50% of C speed on highly numerical code (the kind that brings Smalltalk to its knees), which is simply amazing when you consider that C is fast partly because it ignores numerical errors, addressing errors and so on that are carefully handled in Self. The problem was that the compiler was slow, which was bad since it ran in parallel with your applications. So Urs Hölzle used two compilers in order to replace type analysis (slow but precise) with type feedback (much faster and a little less precise), making the system both more responsive and simpler. It was also better on highly object-oriented code, though not as good on numerical code. Now, both Craig and Urs were interested in their new ideas, and so didn't put as much effort into register allocation and code generation as commercial compilers (or even gcc) do. This makes the 50% of C above even more impressive, but has left Self with a performance penalty on more advanced processors. This is how I remember it, but I welcome corrections.
In the other thread, John Ars wrote:
Hmm, intriguing. How would you go about that?? Could you write the VM in Self (a la Squeak)?
and José Baltasar García Perez-Schofield wrote:
I will be pleased to study your next mail, in order to know what it means to implement Self on Self. I am very intrigued about it.
The Squeak model is very interesting. There is a normal Squeak program that implements the whole Squeak Virtual Machine. So you can run Squeak inside of Squeak (it works perfectly, though 600 times slower than normal). You can make changes and generate the next version of Squeak this way. When it is working the way you want it to, you can translate this Squeak program to C (it must be written in a very restricted, non-object-oriented style for you to be able to do this) and then compile this C program to get a new executable VM.
The problem with this is that it doesn't help your own Squeak programs (you have to write parts of your application in the restricted style and then generate C as new primitives to be integrated into the virtual machine, hardly easy stuff to do!). Also, the write Smalltalk -> simulate -> translate to C -> compile -> restart virtual machine development cycle is not very direct.
But what if we did what the Squeak people did but replaced the bytecode interpreter in the virtual machine with a compiler (or two)? The virtual machine could now be written in the normal Self style (no restrictions) and it would include a part that would take bytecodes and generate (for example) X86 machine language code. If the current machine can't execute this X86 code (which will be the case, since I am using a Sun Ultra 5 as the development platform), the code can be sent over a network to a machine that can. Since I explained this in the VM99 workshop position paper, I won't repeat how this is done here. See http://www.squeak.org/oopsla99_vmworkshop/jecel.txt
Note that at some point, this Self program will be capable of generating X86 code for any Self program, including itself! It will be used by the whole system, including all user applications (no more translating to primitives to make things fast).
Ok, so back to plan B and how to make a good code generator for Self. My current idea is to combine the three options that I described at the very beginning:
- (option 2) the compiler will be written in a very abstract way, with more specific compilers inheriting from it
- (option 1) the more specific compilers will override data structures, not methods, so a processor description will be declarative instead of functional
- (option 3) the compiler will work with very simple instructions (just moves between functional units) until the very end, and then translate these to the actual machine language
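The combination of the three options might look roughly like this (a sketch in Python rather than Self, with all class names, functional units and latencies invented for the illustration): the generic compiler is written abstractly, and a specific CPU "inherits" from it by overriding data, a declarative processor description, rather than methods.

```python
# Sketch of the combined plan (hypothetical names): abstract compiler
# (option 2), specialized by overriding data structures rather than
# methods (option 1), working on simple abstract operations that are
# translated to machine language at the very end (option 3).

class AbstractCompiler:
    # Declarative processor description, overridden per CPU.
    functional_units = {}     # unit name -> latency in cycles
    translation = {}          # abstract op -> (unit, instruction template)

    def generate(self, abstract_ops):
        """Translate abstract ops and report a naive cycle estimate."""
        code, cycles = [], 0
        for op, arg in abstract_ops:
            unit, template = self.translation[op]
            cycles += self.functional_units[unit]
            code.append(template.format(arg=arg))
        return code, cycles

class PentiumCompiler(AbstractCompiler):
    # Only data changes here; all behavior is inherited.
    functional_units = {"load": 1, "alu": 1, "branch": 3}
    translation = {
        "load": ("load",   "mov eax, {arg}"),
        "add":  ("alu",    "add eax, {arg}"),
        "jump": ("branch", "jmp {arg}"),
    }

code, cycles = PentiumCompiler().generate(
    [("load", "[x]"), ("add", "1"), ("jump", "done")])
print(code, cycles)
```

Since the specialization is pure data, a compiler that can fold constant data structures (as described further down) could turn this into a code generator as fast as a hand-written one.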
Note that this will favor a move-based CPU (such as I designed for the Merlin 6) even though the first port will be for the X86. Since I am doing the work, I guess I am entitled to show my biases :-)
The data structures that are used by the compilers are a description of each CPU in terms of functional units and scheduling. While I have been speaking of an X86 code generator, the descriptions for the 386, 486, Pentium, Pentium II and so on could be very different (though the other data structure, the translation table, would be mostly the same for all of them).
What is the problem with this? The way Self compilers are now, this particular Self program won't be very efficient. But this program takes away processing time from all the applications, so the system will not be very responsive if the compiler is slow (the problem we had in Self 2.0).
So here is a simple idea to fix this: optimize immutable data structures. I'll give an example in Smalltalk, since it has a syntax for declaring constant vectors while Self does not. Look at this code:
#('first text is' 'and the second one is' 'finally, this one is') do: [ :s | s print. ' ' print. s size printString print]
Now imagine that the compiler was smart enough to notice that the receiver of the #do: message is a constant collection and could pre-evaluate some things to generate the same code as:
'first text is' print. ' ' print. 13 print. 'and the second one is' print. ' ' print. 21 print. 'finally, this one is' print. ' ' print. 20 print
Yes - this is what is called "partial evaluation". No - this particular case is neither hard to implement nor does it require a Lisp-derived language.
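Here is a minimal sketch of that partial-evaluation step (in Python rather than Smalltalk, with invented names): when the receiver of the iteration is a compile-time constant collection, the loop can be unrolled and every expression that depends only on the constants, such as each string's size, folded down to a literal.

```python
# Minimal partial-evaluation sketch (hypothetical names). The compiler
# sees that the receiver of the iteration is a constant collection, so
# it unrolls the loop and folds the size computations at "compile time".

STRINGS = ('first text is', 'and the second one is', 'finally, this one is')

def residual_program(constants):
    """Unroll the do: loop over a known-constant collection, reducing
    each element's size to a literal print."""
    ops = []
    for s in constants:
        ops.append(("print", s))
        ops.append(("print", " "))
        ops.append(("print", str(len(s))))   # folded: no size send remains
    return ops

# The residual code is a straight line of literal prints -- no loop,
# no block activation, no size computation at run time.
for op, arg in residual_program(STRINGS):
    print(arg, end=" ")
```

The residual operations correspond exactly to the hand-expanded Smalltalk code above: the sizes 13, 21 and 20 have become constants in the generated program.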
If the Self compiler in Self can do this (and we have a richer selection of constant collections than just immutableString!) and it is implemented as described above, then it would generate efficient code generators when it compiled itself.
So this is what I am working on. Though it will take longer than a simple port of the Sun VM, I think the result will be worth it (especially when porting to a second processor). I don't know if anyone made it this far, but if so I would love to have some comments or questions on this.
jecel assumpcao jr. email@example.com wrote (original article: http://www.egroups.com/group/self-interest/?start=492):
In a message with this subject on November 22, I mentioned some alternatives for the code generation part of the Self compiler:
After reading your post, I realize that I REALLY need to hit my textbooks again!
John Ars wrote:
After reading your post, I realize that I REALLY need to hit my textbooks again!
Do you know any textbooks that have information on this kind of thing? The only book on compilers I have ever looked at (didn't actually read it) was the famous "dragon book" (I forgot the author and title) in the early 80s.
I would be surprised to find something like adaptive compilation in any book other than Urs's and Craig's theses (both are available from the http://self.sulabs.com web site).
Of course, the real problem might just be that I was too obscure in my post. In that case, I would be glad to try to make things clearer.
jecel assumpcao jr. firstname.lastname@example.org wrote:
Do you know any textbooks that have information on this kind of thing? The only book on compilers I have ever looked at (didn't actually read it) was the famous "dragon book" (I forgot the author and title) in the early 80s.
Check varsitybooks.com. I found a couple there:
http://www.varsitybooks.com/categorybrowser.asp?sessid=065E8EB672046F26AD0D8B43C84035CE&discID=130&discName=Computer+Science&rollupName=Computer+Science+%26+Engineering&t2=Compiler+Design&t3=&t4=&t5=