Hello, folks! Let me bother you once more with those implementation issues... I am implementing a simple bytecode interpreter for the Self 4.0 VM, which I suppose will make it portable to platforms other than the SPARC. I have been studying the Self bytecodes, mainly in the fast_compiler VM code and in a paper dated 1989 (from Self 1.0 or 2.0). There are eight different codes:
SELF_CODE: push the object whose code is being interpreted onto the stack.
LITERAL_CODE: push a literal onto the stack (its index in the code's literal vector is given).
RETURN_CODE: return.
INDEX_CODE: load the index extension register with the 6th to 10th bits of the extended index. This is used when the index of the next code is greater than 31.
SEND_CODE: send a message to an object. The receiver, the selector and the arguments are given somewhere.
IMPLICIT_SEND_CODE: send a message, just like the previous one, except that the receiver is self.
RESEND_CODE: send a message to one parent (directed resend) or to all parents (undirected resend). This is equivalent to Java's super.---() method invocations or to C++'s overridden non-virtual method invocations. Smalltalk has something similar, I have read.
DELEGATEE_CODE: load the delegatee register with the parent to which the next resend should be directed.
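As a rough illustration of how such an interpreter might be organized, here is a C++ sketch of a dispatch loop for these eight codes. The opcode names follow the list above, but the 3-bit-opcode/5-bit-index encoding and every identifier are assumptions made for this sketch, not the actual VM's declarations.

    // Hypothetical sketch of a dispatch loop for the eight bytecodes above.
    // Encoding assumed here: high 3 bits = opcode, low 5 bits = index (0-31),
    // widened by INDEX_CODE via an extension register.
    enum ByteCodeOp {
      INDEX_CODE, LITERAL_CODE, SEND_CODE, IMPLICIT_SEND_CODE,
      SELF_CODE, RESEND_CODE, DELEGATEE_CODE, RETURN_CODE
    };

    void interpret(const unsigned char* codes, int length) {
      int extension = 0;                               // index extension register
      for (int pc = 0; pc < length; ++pc) {
        int op    = codes[pc] >> 5;                    // opcode in the high 3 bits
        int index = (codes[pc] & 0x1f) | extension;    // index in the low 5 bits
        extension = 0;
        switch (op) {
          case INDEX_CODE:   extension = index << 5; break;       // widen the next index
          case SELF_CODE:    /* push the receiver (self) */ break;
          case LITERAL_CODE: /* push literals[index] */ break;
          case SEND_CODE:    /* pop receiver and args, send literals[index] */ break;
          case RETURN_CODE:  /* return the top of the stack */ return;
          default:           /* IMPLICIT_SEND_CODE, RESEND_CODE, DELEGATEE_CODE ... */ break;
        }
      }
    }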
It all seems fine, but I don't understand something seemingly straightforward: knowing that in a send the receiver and arguments are popped off the stack and the result is pushed onto the stack, how is the number of arguments of a send determined, so that my interpreter knows how many pops to make? It seems that in the fast_compiler, when the machine code for a send is generated, the number of arguments is kept somewhere. Where?
If someone can explain that to me, I would appreciate it.
Cheers to all, Douglas
(P.S.: This is the second version of an e-mail I was writing to the list. The first was lost when Netscape exited with a GPF. This is why you might notice that I have summarized the explanations about the bytecodes.)
------------------------------------------------------------------------
It all seems fine, but I don't understand something seemingly straightforward: knowing that in a send the receiver and arguments are popped off the stack and the result is pushed onto the stack, how is the number of arguments of a send determined, so that my interpreter knows how many pops to make?
You can derive that from the selector symbol. Unary selectors (selectors composed of letters, where in particular the first character must be a lowercase letter, and with no ':' in them) take no arguments at all. Binary selectors (all selectors which start with neither a letter nor an '_') take exactly one argument. For keyword selectors (which are composed of sequences of letters (and digits) that each end with a colon ':'), simply count the number of colons.
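In code, that classification might look like the following C++ sketch (the helper name argumentCountFor is made up; it just applies the rules above):

    // Derive a send's argument count from its selector text:
    // unary -> 0, binary (operator) -> 1, keyword -> one per ':'.
    #include <cctype>
    #include <string>

    int argumentCountFor(const std::string& selector) {
      if (selector.empty()) return 0;
      unsigned char first = selector[0];
      if (std::isalpha(first) || first == '_') {
        int colons = 0;                      // keyword selector: count the colons
        for (char c : selector)
          if (c == ':') ++colons;
        return colons;                       // zero colons means a unary selector
      }
      return 1;                              // binary (operator) selector
    }
    // e.g. argumentCountFor("last") == 0, "+" == 1, "between:And:" == 2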
A different problem is knowing when to pop returned objects from the stack that aren't used. I think you cannot detect that; you simply have to adjust the stack when you leave the method. Here's an example: 3+4. nil
This will generate something like: push 3, push 4, send #+, push nil.
The + method for integers will pop both 3 and 4 from the stack and push the result, 7. However, this object isn't needed and will occupy one stack slot until the method returns (with nil).
There might be a way to notice that the 7 isn't used anywhere in the method, but that's probably too much work for an interpreter. A compiler that creates and analyses a complete parse tree for each method can do this.
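One pragmatic way to do that adjustment, sketched here with an invented frame layout, is to remember the expression-stack height on method entry and cut back to it on return:

    // Sketch: instead of tracking unused results, remember the stack height at
    // method entry and reset to it on return, so the leftover 7 in the example
    // above is discarded along with everything else.
    typedef void* Value;                // stand-in for an object reference

    struct Frame {
      int savedSP;                      // stack height when the method was entered
    };

    void returnFrom(const Frame& frame, Value* stack, int& sp, Value result) {
      sp = frame.savedSP;               // drop any unused intermediate results
      stack[sp++] = result;             // push the method's return value (nil here)
    }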
bye -- Stefan Matthias Aust // Truth Until Paradox!
------------------------------------------------------------------------
jecel@lsi.usp.br wrote:
It all seems fine, but I don't understand something seemingly straightforward: knowing that in a send the receiver and arguments are popped off the stack and the result is pushed onto the stack, how is the number of arguments of a send determined, so that my interpreter knows how many pops to make? It seems that in the fast_compiler, when the machine code for a send is generated, the number of arguments is kept somewhere. Where?
You can just count the number of ':' characters in the selector name, and that is the number of arguments. Except that if the selector name is composed of special characters, then we have a binary selector and there is one argument.
All right, but the parser might already have done that; in fact, the fast_compiler code generation reads this arg_count from somewhere.
Testing this at every message send is *very* inefficient - the Smalltalk bytecodes encode the number of arguments in the send bytecode itself. But since the Self bytecodes were meant to be compiled away, this wasn't considered a problem.
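For comparison, a send instruction that carries its own argument count, Smalltalk-style, might be declared like this (the field widths are invented for illustration and are not the actual Smalltalk encoding):

    // Sketch of a send instruction that carries its own argument count, so the
    // interpreter pops argCount + 1 values (arguments plus receiver) without
    // ever looking at the selector text.
    struct SendInstruction {
      unsigned argCount     : 3;   // 0..7 arguments, known at decode time
      unsigned literalIndex : 13;  // which literal slot holds the selector
    };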
Jecel, what I want to do is to switch off the compiler, assembler and runtime (part of it, that is), plug in an interpreter at every point where this whole subsystem is used, and be able to compile it under Solaris x86 or Linux by defining something like -DPORTABLE in the makefile. I mean, my interpreter is to interface directly with the method objects generated by the parser, not to replace the parser. This would be a temporary solution to the portability problem, at the cost of the interpreter's inefficiency. Perhaps this is because I miss the old times when I ran OS/2 2.1 with 4MB of RAM :-)
For an interpreter, there isn't a good solution if you are going to use a standard Self world. If you don't mind creating your own, slightly different, world you could separate canonical strings representing selectors into different "types". One way to do this would be to add a constant slot indicating the number of arguments when you canonicalize a string. That way, 'last' would have a constant slot with the value 0, while for 'between:And:' that slot's value would be 2. This slot would always be in the same place in the string's map (if you don't make any other changes) so your interpreter can easily access it.
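A sketch of that idea, with an invented object layout and reusing the argumentCountFor helper sketched earlier in the thread:

    // Store the argument count in the canonical selector object once, at
    // canonicalization time; every send afterwards just reads the field.
    #include <string>

    struct CanonicalSelector {
      std::string characters;   // the selector's text
      int         argCount;     // the "constant slot" holding the arity
    };

    CanonicalSelector canonicalize(const std::string& text) {
      return CanonicalSelector{ text, argumentCountFor(text) };
    }
    // canonicalize("last").argCount == 0, canonicalize("between:And:").argCount == 2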
sma@netsurf.de wrote:
It all seems fine, but I don't understand something seemingly straightforward: knowing that in a send the receiver and arguments are popped off the stack and the result is pushed onto the stack, how is the number of arguments of a send determined, so that my interpreter knows how many pops to make?
You can derive that from the selector symbol. Unary selectors (selectors composed of letters, where in particular the first character must be a lowercase letter, and with no ':' in them) take no arguments at all. Binary selectors (all selectors which start with neither a letter nor an '_') take exactly one argument. For keyword selectors (which are composed of sequences of letters (and digits) that each end with a colon ':'), simply count the number of colons.
A different problem is knowing when to pop returned objects from the stack that aren't used. I think you cannot detect that; you simply have to adjust the stack when you leave the method. Here's an example: 3+4. nil
This will generate something like: push 3, push 4, send #+, push nil.
The + method for integers will pop both 3 and 4 from the stack and push the result, 7. However, this object isn't needed and will occupy one stack slot until the method returns (with nil).
There might be a way to notice that the 7 isn't used anywhere in the method, but that's probably too much work for an interpreter. A compiler that creates and analyses a complete parse tree for each method can do this.
Well, Stefan, I am working from the following principle: the Self Group has already done a lot to prove that Self can be efficient. Now someone should make it portable. This won't be straightforward if one wants to keep all the benefits of adaptive compilation, so I want to make Self run slowly on an x86 platform, in such a way that the VM code can also be compiled for SPARC, MIPS, Acorn, etc.

Of course, the FIRST step is to neutralize the processor dependency; after that we'll have code that is still dependent on the operating system. This is why I have chosen Solaris x86 as my development platform (it could have been Linux, but I wanted the least possible difference from Solaris SPARC). It is the same operating system that Self originally ran on, but the processor is different. The next step will be to "make glue" for other systems.

Take the example of Java: they first provided a UI class library that was specially written for each platform (the heavyweight components), then they evolved by providing Swing, which is platform-independent, so the platform-dependent part of the code is reduced to the windows themselves (don't take this too precisely; it isn't quite that simple). We can do the same by rewriting code that is currently implemented as primitives in Self. In fact, before that I would like to study the available primitives more carefully to define a minimal primitive set, perhaps a "microkernel VM" :-) Jecel's ideas about the Squeak Smalltalk system, in which much more is implemented in the language itself than in Self, make a lot of sense for the evolution of Self, in my opinion.
bye
Thanks for your comments and ideas. Regards, Douglas
------------------------------------------------------------------------
This is a great idea! In fact, I have already done some of it. I wrote an interpreter and had it running on my Mac, up to running the scheduler. Are you local? We could get together and talk about this sometime. In fact, the Self bytecodes are quite bad for interpretation: the interpreter had to count colons at every send. You would need a different bytecode set to take this approach very far. (I have chosen instead to retarget the NIC -- still a lot of work.)
- Dave
At 8:58 AM -0300 6/1/99, Douglas Atique wrote:
jecel@lsi.usp.br wrote:
It all seems fine, but I don't understand something seemingly straightforward: knowing that in a send the receiver and arguments are popped off the stack and the result is pushed onto the stack, how is the number of arguments of a send determined, so that my interpreter knows how many pops to make? It seems that in the fast_compiler, when the machine code for a send is generated, the number of arguments is kept somewhere. Where?
You can just count the number of ':' characters in the selector name, and that is the number of arguments. Except that if the selector name is composed of special characters, then we have a binary selector and there is one argument.
All right, but the parser might already have done that; in fact, the fast_compiler code generation reads this arg_count from somewhere.
Testing this at every message send is *very* inefficient - the Smalltalk bytecodes encode the number of arguments in the send bytecode itself. But since the Self bytecodes were meant to be compiled away, this wasn't considered a problem.
Jecel, what I want to do is to switch off the compiler, assembler and runtime (part of it, that is), plug in an interpreter at every point where this whole subsystem is used, and be able to compile it under Solaris x86 or Linux by defining something like -DPORTABLE in the makefile. I mean, my interpreter is to interface directly with the method objects generated by the parser, not to replace the parser. This would be a temporary solution to the portability problem, at the cost of the interpreter's inefficiency. Perhaps this is because I miss the old times when I ran OS/2 2.1 with 4MB of RAM :-)
For an interpreter, there isn't a good solution if you are going to use a standard Self world. If you don't mind creating your own, slightly different, world you could separate canonical strings representing selectors into different "types". One way to do this would be to add a constant slot indicating the number of arguments when you canonicalize a string. That way, 'last' would have a constant slot with the value 0, while for 'between:And:' that slot's value would be 2. This slot would always be in the same place in the string's map (if you don't make any other changes) so your interpreter can easily access it.
sma@netsurf.de wrote:
It all seems fine, but I don't understand something seemingly straightforward: knowing that in a send the receiver and arguments are popped off the stack and the result is pushed onto the stack, how is the number of arguments of a send determined, so that my interpreter knows how many pops to make?
You can derive that from the selector symbol. Unary selectors (selectors composed of letters, where in particular the first character must be a lowercase letter, and with no ':' in them) take no arguments at all. Binary selectors (all selectors which start with neither a letter nor an '_') take exactly one argument. For keyword selectors (which are composed of sequences of letters (and digits) that each end with a colon ':'), simply count the number of colons.
A different problem is knowing when to pop returned objects from the stack that aren't used. I think you cannot detect that; you simply have to adjust the stack when you leave the method. Here's an example: 3+4. nil
This will generate something like: push 3, push 4, send #+, push nil.
The + method for integers will pop both 3 and 4 from the stack and push the result, 7. However, this object isn't needed and will occupy one stack slot until the method returns (with nil).
There might be a way to notice that the 7 isn't used anywhere in the method, but that's probably too much work for an interpreter. A compiler that creates and analyses a complete parse tree for each method can do this.
Well, Stefan, I am working from the following principle: the Self Group has already done a lot to prove that Self can be efficient. Now someone should make it portable. This won't be straightforward if one wants to keep all the benefits of adaptive compilation, so I want to make Self run slowly on an x86 platform, in such a way that the VM code can also be compiled for SPARC, MIPS, Acorn, etc.

Of course, the FIRST step is to neutralize the processor dependency; after that we'll have code that is still dependent on the operating system. This is why I have chosen Solaris x86 as my development platform (it could have been Linux, but I wanted the least possible difference from Solaris SPARC). It is the same operating system that Self originally ran on, but the processor is different. The next step will be to "make glue" for other systems.

Take the example of Java: they first provided a UI class library that was specially written for each platform (the heavyweight components), then they evolved by providing Swing, which is platform-independent, so the platform-dependent part of the code is reduced to the windows themselves (don't take this too precisely; it isn't quite that simple). We can do the same by rewriting code that is currently implemented as primitives in Self. In fact, before that I would like to study the available primitives more carefully to define a minimal primitive set, perhaps a "microkernel VM" :-) Jecel's ideas about the Squeak Smalltalk system, in which much more is implemented in the language itself than in Self, make a lot of sense for the evolution of Self, in my opinion.
bye
Thanks for your comments and ideas. Regards, Douglas
David Ungar Sun Microsystems Laboratories (650) 336-2618
------------------------------------------------------------------------
At 08:58 AM 6/1/99 -0300, Douglas Atique wrote:
The Self Group has already done a lot to prove Self to be efficient. Now someone should make it portable. This won't be straightforward if one
wants to
incorporate all the benefits of the adaptive compilation, so I want to make Self run slowly on an x86-platform in a way such that the VM code can be compiled for SPARC, MIPS, Acorn, etc. Of course, the FIRST step is to neutralize the processor dependency, so after that we'll have code that is still dependent on the operating system.
It's an interesting project and a well-appreciated idea. However, I think it might be easier to start with a new portable VM from scratch. If you then provide the same set of primitives, it should be able to run all the Self code.
Whatever approach you follow, you can internally transform the "official" bytecode set into something that can be interpreted more efficiently. This is similar to Squeak's Jitter technology: that just-in-time compiler takes the original Squeak instructions and transforms them into a new stream of threaded-code instructions that can be interpreted with less overhead and is therefore faster.
Instead of dealing with extension instructions, the unknown number of arguments, the question of whether return values are needed or not, and implicit method returns, a new instruction set could make all of this explicit. I'd suggest using a direct or indirect threaded-code approach here.
As a second step, you could inline conditional and looping instructions, and then you should get performance similar to Squeak's, which is quite acceptable for an interpreted system.
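An indirect-threaded inner loop along those lines could be as simple as the sketch below; all names are invented, and the translation step that builds the ThreadedOp stream (the interesting part) is elided:

    // Indirect threaded code: each translated instruction is a handler address
    // plus a pre-decoded operand, so the inner loop does no bit-fiddling and no
    // colon counting at all.
    #include <cstddef>
    #include <vector>

    struct Interp;                                   // interpreter state (elided)
    typedef void (*Handler)(Interp&, int operand);

    struct ThreadedOp { Handler handler; int operand; };

    void run(Interp& interp, const std::vector<ThreadedOp>& code) {
      for (std::size_t pc = 0; pc < code.size(); ++pc)
        code[pc].handler(interp, code[pc].operand);  // fetch handler, call it
    }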
You can see the example of Java, they have first provided a UI class library that was specially written for each platform (the heavyweight components), then they evolved by providing Swing
I think that was more a political decision than a technical one. Both VisualWorks Smalltalk and Squeak started with an emulated GUI just because that was easier to create. Java could also have started with a non-native GUI from the beginning. IMHO they underestimated the problems of a cross-platform native GUI. With the decline of ParcPlace, a lot of Smalltalk programmers with experience of VW's GUI joined Sun, and they were able to do Swing.
In fact, before that I would like to study more carefully the primitives available to define a minimal primitive set, perhaps a "microkernel VM" :-)
That's a good idea.
Jecel's ideas about the Squeak Smalltalk system, in which much more is implemented in the language itself than in Self, make much sense to the evolution of Self in my opinion.
Yes.
bye -- Stefan Matthias Aust // Truth Until Paradox!
------------------------------------------------------------------------
When I was working on the interpreter, I also added a POP bytecode. If I can release the latest Self VM, folks out there could have it. Would there be any interest?
- Dave
At 4:38 PM -0400 5/31/99, Stefan Matthias Aust wrote:
It all seems fine, but I don't understand something seemingly straightforward: knowing that in a send the receiver and arguments are popped off the stack and the result is pushed onto the stack, how is the number of arguments of a send determined, so that my interpreter knows how many pops to make?
You can derive that from the selector symbol. Unary selectors (selectors composed of letters, where in particular the first character must be a lowercase letter, and with no ':' in them) take no arguments at all. Binary selectors (all selectors which start with neither a letter nor an '_') take exactly one argument. For keyword selectors (which are composed of sequences of letters (and digits) that each end with a colon ':'), simply count the number of colons.
A different problem is knowing when to pop returned objects from the stack that aren't used. I think you cannot detect that; you simply have to adjust the stack when you leave the method. Here's an example: 3+4. nil
This will generate something like: push 3, push 4, send #+, push nil.
The + method for integers will pop both 3 and 4 from the stack and push the result, 7. However, this object isn't needed and will occupy one stack slot until the method returns (with nil).
There might be a way to notice that the 7 isn't used anywhere in the method, but that's probably too much work for an interpreter. A compiler that creates and analyses a complete parse tree for each method can do this.
bye
Stefan Matthias Aust // Truth Until Paradox!
David Ungar Sun Microsystems Laboratories (650) 336-2618
------------------------------------------------------------------------
David Ungar wrote:
If I can release the latest Self VM, folks out there could have it. Would there be any interest?
Sure. (See, no quotes below!)
bye -- Stefan Matthias Aust // Truth Until Paradox!
------------------------------------------------------------------------
It all seems fine, but I don't understand something seemingly straightforward: knowing that in a send the receiver and arguments are popped off the stack and the result is pushed onto the stack, how is the number of arguments of a send determined, so that my interpreter knows how many pops to make? It seems that in the fast_compiler, when the machine code for a send is generated, the number of arguments is kept somewhere. Where?
You can just count the number of ':' characters in the selector name, and that is the number of arguments. Except that if the selector name is composed of special characters, then we have a binary selector and there is one argument.
Testing this at every message send is *very* inefficient - the Smalltalk bytecodes encode the number of arguments in the send bytecode itself. But since the Self bytecodes were meant to be compiled away, this wasn't considered a problem.
For an interpreter, there isn't a good solution if you are going to use a standard Self world. If you don't mind creating your own, slightly different, world you could separate canonical strings representing selectors into different "types". One way to do this would be to add a constant slot indicating the number of arguments when you canonicalize a string. That way, 'last' would have a constant slot with the value 0, while for 'between:And:' that slot's value would be 2. This slot would always be in the same place in the string's map (if you don't make any other changes) so your interpreter can easily access it.
-- Jecel
------------------------------------------------------------------------
Another idea is to compile the Self bytecodes upon method activation into another set of instructions that is a bit more interpreter-friendly. You'd get rid of all EXTEND instructions, add the number of arguments to each message send, and perhaps even introduce more codes for popping the stack, or translate ifTrue:IfFalse: or whileTrue: methods into jumps (even if that shouldn't be done in Self).
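A translation pass along those lines might look like the sketch below. It reuses the invented opcode names and the argumentCountFor helper from earlier sketches in this thread, so everything here is illustrative rather than actual VM code:

    // On activation, fold INDEX (extend) codes into the next instruction's
    // operand and precompute each send's argument count, so the inner loop
    // never has to widen indices or inspect selector text.
    #include <string>
    #include <vector>

    struct WideOp { int op; int index; int argCount; };

    std::vector<WideOp> translate(const unsigned char* codes, int length,
                                  const std::vector<std::string>& literals) {
      std::vector<WideOp> out;
      int extension = 0;
      for (int pc = 0; pc < length; ++pc) {
        int op    = codes[pc] >> 5;
        int index = (codes[pc] & 0x1f) | extension;
        extension = 0;
        if (op == INDEX_CODE) { extension = index << 5; continue; }  // folded away
        bool isSend = (op == SEND_CODE || op == IMPLICIT_SEND_CODE || op == RESEND_CODE);
        int argc = isSend ? argumentCountFor(literals[index]) : 0;   // count colons once
        out.push_back(WideOp{ op, index, argc });
      }
      return out;
    }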
bye -- Stefan Matthias Aust // Truth Until Paradox!
------------------------------------------------------------------------
Yup!
- Dave
At 10:17 PM -0400 5/31/99, Stefan Matthias Aust wrote:
Another idea is to compile the Self bytecodes upon method activation into another set of instructions that is a bit more interpreter-friendly. You'd get rid of all EXTEND instructions, add the number of arguments to each message send, and perhaps even introduce more codes for popping the stack, or translate ifTrue:IfFalse: or whileTrue: methods into jumps (even if that shouldn't be done in Self).
bye
Stefan Matthias Aust // Truth Until Paradox!
David Ungar Sun Microsystems Laboratories (650) 336-2618
------------------------------------------------------------------------
Also a good idea! (Forgive me, I am reading mail backwards today.) It does have the slight drawback that the count is a memory reference away instead of being in the interpreter's I-stream.
- Dave
It all seems fine, but I don't understand something seemingly straightforward: knowing that in a send the receiver and arguments are popped off the stack and the result is pushed onto the stack, how is the number of arguments of a send determined, so that my interpreter knows how many pops to make? It seems that in the fast_compiler, when the machine code for a send is generated, the number of arguments is kept somewhere. Where?
You can just count the number of ':' characters in the selector name, and that is the number of arguments. Except that if the selector name is composed of special characters, then we have a binary selector and there is one argument.
Testing this at every message send is *very* inefficient - the Smalltalk bytecodes encode the number of arguments in the send bytecode itself. But since the Self bytecodes were meant to be compiled away, this wasn't considered a problem.
For an interpreter, there isn't a good solution if you are going to use a standard Self world. If you don't mind creating your own, slightly different, world you could separate canonical strings representing selectors into different "types". One way to do this would be to add a constant slot indicating the number of arguments when you canonicalize a string. That way, 'last' would have a constant slot with the value 0, while for 'between:And:' that slot's value would be 2. This slot would always be in the same place in the string's map (if you don't make any other changes) so your interpreter can easily access it.
-- Jecel
David Ungar Sun Microsystems Laboratories (650) 336-2618
------------------------------------------------------------------------