Where to use SELF - your opinion

Doug Lea dl at g.oswego.edu
Sat Oct 31 16:51:57 UTC 1992

A chance to contribute something possibly useful to this list! The
ratio of C++ to SELF code I've written is about 1000:1...

You might be amused by this (lightly edited) note I once jotted down
about why C++ code is typically faster than Smalltalk, etc.
Not exhaustive or authoritative. The general theme is about how
C++ is built so as to (nearly) require certain optimizations,
counter-optimizations, and safety sacrifices by programmers.



* Embedding

    In C++, objects are laid out as records containing subobjects
    (including embedded superclass objects), not pointers to other
    objects, unless explicitly indicated otherwise.


        * Fewer run-time allocations (Space for the object
            is allocated all at once. Space for other internal
            objects that cannot be embedded (but are explicitly 
            pointed to) must be manually allocated within constructor code.) 
        * Less run-time pointer-chasing. Embedded objects are
            accessed via offsets from the base address of the object.
            Can be (and is) done in a way so that the offsets in
            subclass objects are the same as in their superclasses.
        * Since class membership of embedded objects cannot ever
            change, more method -> code mappings may be resolved at
            compile time. Not that C++ compilers actually do much of
            this except in...
        * Extreme cases: Builtin classes like `int' have their 
            representations and implementations irrevocably tied to their 
            interfaces, so are always embedded.


        * Programmers often forget to allocate non-embedded objects,
            or incorrectly allocate them.
        * Embedded objects cannot be switched at run-time,
            eliminating many delegation-style design idioms.
            This also leads to `chopped copy' errors in those
            cases where programmers forget that assignment 
            operations generate partial copies of subclass objects
            rather than the pointer rebindings they probably had in mind.
        * The exact (maximal) class membership of all embedded objects must be
            specified at class-declaration-time. This eliminates
            the ability for subclasses to contain embedded
            objects that are subclasses of those in the original
            declarations, hence detracting from extensibility.
        * It takes enough extra work to create new classes with the
            same interfaces as builtins but with different
            implementations that programmers hardly ever do it.
            (Actually, it's worse: you cannot quite exactly duplicate
            interfaces of builtins.)  This leads to less safe code,
            for example, when multi-precision arithmetic is only sometimes
            used where it is actually needed.
        * Embedding + the ability to obtain and export the address
            of embedded objects + pointer arithmetic + casts +
            destructors make GC impossible(?)
        * Allocation and access strategies for embedded superclass
            objects under multiple inheritance are arcane, error
            prone, and the source of many C++ compiler bugs.


        * Arrays in C/C++ are defined as consecutively allocated
            collections of embedded objects. This enables
            fast pointer arithmetic for sequential and random access
            but does not mesh AT ALL with the notion of subclassing.
            This leads to mandated(!) code generation errors;
            e.g., arrays of subclass objects whose size differs from
            that of the nominally specified class are incorrectly
            addressed in many situations.

* Stack allocation

    `Local', `direct' objects are stack-allocated at run time.
    This is also a kind of embedding, but in a different sense.


        * Stack frame allocation is usually faster than heap allocation. 
        * Stack frame deallocation is always faster than first-generation 
            heap scavenging.


        * Programmers must decide when to stack-allocate. They
            often decide wrong, creating dangling pointers.
        * Only objects whose exact class is known at compile time
            may be stack allocated. Programmers sometimes hopelessly mangle
            their designs in order to allow this.
        * Distinctions between stack-allocated and heap-allocated
            objects permeate the language (roughly, `direct' versus
            pointed-to objects), adding to complexity and detracting
            from usability and understandability.

* Manual deallocation

    Heap-allocated objects must be manually deallocated.


        * Deleting objects as soon as they are not needed is
            sometimes faster and less space-consumptive than GC.


        * The fact that an object is unreferenceable is often not
            locally decidable by any object or at any program-point.
            Lots of errors.
        * Some C++ constructs automatically generate objects
            (mainly expression temporaries) that compilers themselves 
            cannot know when to deallocate. Without GC, the
            language is forced to prescribe patently unsafe deallocation
            rules. Lots more errors.
        * The C++ construct (destructors) that triggers most
            deallocation may be abused so as to perform actions
            unrelated to resource management.  More errors, usually
            springing from rules about how destructors are supposed
            to cascade up through superclasses.
        * Exception handling, concurrency, GC, and distribution are all 
            horribly complicated by the presence of destructors.
        * Often enough, manual deallocation is a COUNTER-optimization;
            GC would be faster and sometimes even less space-consumptive.

* Vtbls/Vptrs

    C++ enables (mandates?) a particular dispatching scheme:

        -- Each object contains a pointer (vptr) to a vector (vtbl) of
            function pointers to the implementation code shared by all
            objects of its class. The vector is laid out so that the
            same function ptr lies at the same offset in the vtbls for
            all subclasses of a class.

        -- Each message type is mapped to a vtbl offset at compile
            time. At run time, the call is made via an indirect call
            through the function pointer stored at that offset.


        * This is faster than some other schemes.
        * The requirement that offsets be known at compile time
            requires the degree of type-checking enforced by C++.
            Generally, programmers may only invoke a method on an
            object if it is trivially decidable that the recipient
            actually implements it. 
        * There is no need for a run-time system that represents
            class and method characteristics in order to support
            run-time method->code mappings and queries. (This
            is less true if/when C++ compilers support exceptions 
            and run-time type queries.)
        * This scheme is among the few that are compatible with
            separate compilation, which is mandated(?) in C++.


        * This is not always faster than other schemes, especially
            under multiple inheritance. With MI, a second level of
            indirection must sometimes be employed, and always be
            tested for.
        * Vtbl-oriented C++ rules make it difficult to build compilers
            that employ faster strategies in those cases where they
            are possible. For example, there is no way for programmers
            to declare that a class is necessarily a `leaf' class,
            whose methods are all statically determinable (and thus
            may be directly called, inlined, or procedurally
            integrated).
        * Because C++ mandates that method->offset maps be statically
            computable, the language introduces casts by which
            programmers assert that an object has a narrower type than
            trivially determinable from declared type information.
            Programmers sometimes get this wrong, leading to less
            safe code than possible in languages like Smalltalk and
            SELF where the run-time system always correctly performs
            similar mappings.
        * Similarly, there is sometimes no simple way to express the
            notion `if this object supports method m, then call it'.
        * Vtbl-oriented rules do not generalize to multimethods.
            All multimethods are only statically resolved based on
            nominal type information. This leads to errors in the
            common case where the static best match is not the same as
            the dynamic best match. More often, it leads to design errors in
            which programmers attempt to employ single-dispatching
            when they ought to be using multimethods.
        * Similarly, top-level operations are only statically resolved
            in accord with very messy rules (the infamous ARM section
            13.2). More errors. It also leads to programmers defining
            irrelevant one-argument operations as methods in classes
            simply in order to have them dispatch right.
        * Since vtbls are shared by all objects of a class,
            individual objects cannot dynamically change
            method implementations in a simple way. All strategies
            for evading this are error-prone and/or unsafe.
        * The C++ type system appears to be more centered around
            static mapping considerations than true type safety.  For
            example, covariant return types, contravariant arguments,
            closest-match multimethod, top-level, and class-wide
            (`static member function') operation dispatch, and other
            conformance-based policies would be more safe than current
            rules allow.

* Templates

    With parameterized types, related classes (and functions) may
    share source code that is expanded in context rather than
    shared and invoked.


        * Expansion often allows easier and better contextual
            optimization (inlining, constant propagation, etc.).
        * Expansion enables object embeddings (as above) that would
            not be possible otherwise, so saves indirection.
        * The associated type mechanisms improve both expressiveness
            and static safety. 
            Expressiveness: Since C++ has no root class, it is
                otherwise impossible to declare, e.g., a `ListOfAnything'.
                (Although cheating via `void*' is semi-officially sanctioned.)
            Safety: Or rather, more convenient safety. For example, if
                a client needs a ListOfBorderedWindow rather than a
                ListOfWindow in order to be sure it can recolor the
                borders of all elements, it is easy to declare both
                kinds of classes as instantiations of ListOf<T>. These
                share source code even though they do not bear a
                subclass relation to each other.  If programmers had
                to redeclare each kind from scratch, they wouldn't.

        * Expansion is normally, but not always, better. It
            generates bigger executables, which may run more slowly
            because of cache misses, etc.  There is no easy way for
            programmers to specify that two classes bear a
            parameterized class relation to each other but should
            share executable code. It would be extremely hard for
            compilers to discover this on their own. This is exactly
            the opposite problem as mentioned above, where you'd like
            compilers to special-case expand/customize leaf class
            mechanics, etc.
        * As above, C++ PT rules appear more centered around
            guarantees about expansion than safety. 

* Morals

    Embedding, stack allocation, manual deallocation, vtbl dispatching
    and template-based expansion are all (sometimes) OPTIMIZATIONS that C++
    pretty much forces upon programmers. 

    In better languages, these things would at best be `pragmas', and
    mostly performed by a compiler in those cases where the overall
    design allowed it, and in which they are expected to in fact be
    optimizations rather than pessimizations.  This would be pretty
    hard. But as shown by some of the work on the SELF compiler, many
    of these, plus others, are not at all outside the realm of
    technical feasibility.

    Given the lack of a better language, C++ programmers may still
    design programs in ways that completely ignore these (mis)features
    (e.g., use no embedded objects, no direct locals, use generators
    instead of constructors, assume `smart pointers' enabling GC, use
    conformance-based interface design, etc.)  and then perform those
    necessary or desired optimizations and other (non-optimizing)
    transformations from there. Good tools and `design languages'
    would help a lot.

Sender: Urs Hoelzle <urs at otis>
Date: Wed, 4 Nov 92 11:16:24 PST
From: urs at cs.stanford.edu (Urs Hoelzle)
Reply-To: urs at cs.stanford.edu
To: self-interest
Subject: Self video tape

I have been asked by several people whether we have a Self tutorial on
video tape.  Unfortunately, we do not have a recording of the full (3
hour) Self tutorial given at ECOOP, but we do have a tape of a
lecture on Self given in a programming languages course at Stanford.
The lecture is 50 minutes long and gives an introduction into the
basic ideas of Self.  It does not cover some advanced topics like
multiple inheritance or privacy.  If there's enough interest I could
also provide PostScript for a set of student handouts (copies of the
slides used in the lecture).

If you're interested in getting the tape (for around $25), please drop
me a line.  The tape is copyrighted by Stanford, i.e.  duplication or
for-profit use requires written permission.  (Educational use is ok.)


From: jackson at parcplace.com (Frank Jackson)
Message-Id: <9211050041.AA14822 at central>
Date: 4 November 1992 4:41:53 pm
Subject: Re: Weak Arrays
In-Reply-To: eliot at dcs.qmw.ac.uk's letter of: 23 October 1992
To: self-interest at myself.stanford.edu
Cc: eliot at dcs.qmw.ac.uk

Below are some (lengthy) comments on Eliot's message regarding weak references
and finalization. Delete now if you're not interested.

>I've just implemented weak references (WeakArray, WeakOrderedCollection etc)
>and finalization (objects with a finalize method get sent finalize when they're
>about to die) in my deferred reference counting collector.  It took three days
>to do (because I made some stupid mistakes).  If you have a working garbage
>collector that you understand and some spare bits in objects (you can usually
>shrink the size field) it should be very easy to implement.

Congratulations on being able to implement finalization in a mere matter of
days! However, I think that your task may have been simplified somewhat by the
fact that you were working with an embalming collector (i.e., one that directly
manipulates--embalms, so to speak--an object in order to reclaim its memory),
and deferred reference-counting mechanisms are, to a first approximation,
embalming collectors.

For example, reference-counting collectors, traditional mark-sweep collectors,
and other collectors that explicitly recycle objects can implement finalization
by simply marking those objects that are subject to finalization in some manner
(e.g., by turning on a bit in the object's header) and then checking to see if
a "dead" object is subject to finalization prior to actually recycling the
object (the reference count bits can also be used to indicate that an object is
subject to finalization--see the discussion on weak references below). If it
turns out that a "dead" object is subject to finalization, the collector
simply arranges for finalization to occur (e.g., it arranges for a finalization
message to be sent to this object) rather than actually recycling the object on
the spot. Sounds simple enough.

On the other hand, copying collectors (e.g., generation scavengers) have to do
a little more work. This is because they do not explicitly recycle dead
objects. Such collectors are efficient (when managing object populations that
exhibit certain demographic characteristics) precisely because they ignore dead
objects. That is, rather than explicitly recycling dead objects, they simply
copy all live objects to a different location. The memory occupied by the dead
objects is then recycled by virtue of the fact that it is reused for subsequent
object allocations.

However, since copying collectors don't ordinarily touch dead objects, some
additional mechanism must be added to these collectors if they are to identify
those "dead" objects that require finalization. I will argue below that this
additional mechanism is the moral equivalent of maintaining a weak reference to
each object that is subject to finalization.

In any case, it took me personally several months to fully implement and test
the weak-reference/finalization scheme that is currently utilized in all
ParcPlace Smalltalk products (credit for the design of this subsystem should go
to Dave Ungar, Allan Schiffman, and Peter Deutsch as well). In addition, it
took several additional months of heavy use--spread out over roughly a year and
a half of elapsed time--to shake out some subtle bugs. Of course, the task of
implementing and testing the ParcPlace system was complicated by the fact that
the ParcPlace memory manager has three to four different garbage collectors
(depending upon how you count) -- namely, an Ungar-style generation scavenger,
a fully incremental mark-sweep garbage collector, and a fully compacting,
non-incremental mark-sweep garbage collector (the scope of the latter can be
expanded to include all of object memory or shrunk to exclude any quasi
permanent objects). Modifying each of these collectors to support weak
references and finalization took additional time and effort, although, as I
recall, the bulk of the time was spent modifying the scavenger, since, as noted
above, it is a copying collector.

Actually, the vast majority of the low-level effort went into supporting weak
references rather than finalization per se. This is because (as discussed
below) we implemented our finalization mechanism on top of our weak-reference
mechanism (i.e., once we had weak references, we were only a small step away
from a fully general finalization mechanism--more on this particular aspect
below). In any case, I suspect that it would take a similar amount of effort to
add support for finalization to the current Self implementation since it also
utilizes a generation scavenger as its primary reclamation subsystem (although
the task of modifying the Self implementation may be somewhat easier since it
doesn't have quite as many ancillary garbage collection subsystems to support as yet).

>I don't like the fact that the ParcPlace scheme
>	1. indivisibly links WeakArrays & finalization
>	2. only gives you WeakArray
>My scheme keeps weak references and finalization orthogonal, and allows
>any object with indexed fields to have weak references if it uses a 'weakAtPut'
>primitive in place of the standard atPut.

Interestingly enough, early on during the design of our
finalization mechanism, Dave and Allan observed that weak references plus
notification essentially equals finalization. That is, all one needs to
implement finalization is the capability to support weak references, coupled
with the ability to notify interested clients that a weak reference has been broken.

This conclusion was reached independently at Sun when they were building their
NeWS implementation. To finalize an object in NeWS, you establish a weak
reference to the object and then express interest in receiving notification
when this object is only referenced weakly. NeWS keeps two sets of reference
counts for each object--one for strong references and one for weak references.
When an object's strong reference count drops to zero, but that object is still
referenced weakly, then NeWS will use its event-dispatching mechanism to
dispatch an obsolete event to clients who have previously expressed interest in
such events. Upon receiving the obsolete event, clients can then perform the
appropriate finalization actions, including nilling out the weak reference to
the object (which will ultimately cause the object to be reclaimed by the
garbage collector if there are no other references to it).

Similarly, the ParcPlace implementation uses its weak-reference subsystem to
detect the need for finalization and its dependency mechanism to trigger the
appropriate finalization actions. Aside from the fact that the NeWS system
supports weak references via a reference-counting collector (whereas ParcPlace
does so via its scavenger and other assorted collectors), the prime difference
between the two systems is that ParcPlace supports postmortem finalization
whereas NeWS (and every other finalization mechanism that I know of) supports a
last-rites style of finalization. In a postmortem regime, "obsolete" objects
are actually reclaimed prior to performing any user-specified finalization,
whereas in a last-rites regime, the finalization actions are performed prior to
reclaiming the object in question. Both approaches have interesting advantages
and disadvantages that I won't go into here.

In any case, we thought it was both natural and appropriate to layer our
finalization mechanism atop our weak-reference subsystem. In fact, we considered
finalization to be something of an extension of our weak-reference subsystem.
That is, instead of simply nilling out a weak reference to a dead object
(itself a kind of finalization action) we allow the user to perform arbitrary
actions when such weak references are broken, and this mechanism is
sufficiently general to support object finalization as well. This is not to say
that weak references cannot be used without concerning oneself with
finalization--of course, one can. Nor do we believe that we've compromised or
restricted the efficacy of our finalization mechanism by virtue of the fact
that we've chosen to layer our finalization subsystem atop a weak-reference
subsystem. The two subsystems can be used in an orthogonal manner if one so
chooses. The fact that finalization is layered atop weak references can be
considered an implementation detail.

Nevertheless, we think the two notions are fundamentally linked and were quite
pleased that we were able to demonstrate this relationship by implementing one
atop the other. In any event, given the fact that we were implementing
finalization in the face of a copying collector, we knew that this collector
would have to keep what would amount to a weak reference to each object that
was subject to finalization. Rather than implement some limited weak reference
mechanism designed solely for use by our scavenger, we thought we would simply
reuse the more general mechanism that we were planning to offer our users.

As for extending the weakness property to other classes, this is an obvious
next step, one that we haven't gotten around to supporting yet primarily
because no one has requested it (a state of affairs that I anticipate will
change in the near future)!

At any rate, finalization and weak references are fairly nascent technology, so
it seems both appropriate and advantageous for different systems to support and
utilize these capabilites in different manners (i.e., "let a thousand flowers
bloom"), at least until such time that it is clear which type of system is most
appropriate under which circumstances.

Frank Jackson
jackson at parcplace.com

More information about the Self-interest mailing list