[self-interest] self-ish factoring

Thu Jan 11 04:01:25 UTC 2001

On Monday 08 January 2001 12:44, you wrote:
> Kyle,
>
> for the 37380 methods in the Demo snapshot, we have this distribution
> of source length (31886, or 85%, are less than 1000 bytes long):
>
>  0-99         10776
>  100-199       4778
>  200-299       3791
>  300-399       4110
>  400-499       2780
>  500-599       1773
>  600-699       1212
>  700-799       1063
>  800-899        887
>  900-1000       716
>  1000-1999     4173
>  2000-2999      593
>  3000-3999      407
>  4000-4999       99
>  5000-5999       28
>
>  10500            1
>
>  219700         193

Interesting.  Not quite what I was thinking about, but still quite 
interesting.  The methods are a bit shorter than I expected, but these are 
byte counts, not method calls.

> Repeating the same thing but counting in number of lines we have
> (20085, 53%, are less than 10 lines long):
>
>  1             5615
>  2             2101
>  3             2721
>  4             2630
>  5             1728
>  6             1582
>  7             1391
>  8             1323
>  9              994
>  10-19        10441
>  20-29         3274
>  30-39         1860
>  40-49          622
>  50-59          259
>  60-69          268
>  70-79          128
>  80-89          131
>  90-99           91
>  100-109         25
>  120-129          2
>
>  238              1
>  3449           193

The histogram of this is interesting.  About 28% (10441 of 37380) are between 
10 and 19 lines.  That actually seems a bit surprising.  I would have thought 
that the bulge would be in the 5-10 range.

> The 238 line method is one that creates a frameMorph full of little
> icons, all "spelled out" in details. The 193 very large methods seem to
> be code to recreate objects for the tutorial.

They can probably be dropped from the data since they are known exceptions.
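
As a rough check of what dropping those exceptions does to the distribution, 
here is a minimal C sketch.  The counts are copied from the line-count table 
quoted above; the function and constant names are made up for illustration, 
and treating the 238-line method plus the 193 very large methods as the "known 
exceptions" is my reading of the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Method counts per line-length bucket, copied from the quoted table.
   Indices 0-8 are methods of exactly 1..9 lines; the rest are ranges. */
static const long regular_counts[] = {
    5615, 2101, 2721, 2630, 1728, 1582, 1391, 1323, 994,  /* 1..9 lines */
    10441, 3274, 1860, 622, 259, 268, 128, 131, 91,       /* 10-19 .. 90-99 */
    25, 2                                                  /* 100-109, 120-129 */
};

enum {
    N_BUCKETS = sizeof regular_counts / sizeof regular_counts[0],
    /* Assumed exceptions: the 238-line icon method and the 193 tutorial methods. */
    OUTLIER_COUNT = 1 + 193
};

/* Sum the counts in buckets first..last (inclusive indices). */
long sum_buckets(size_t first, size_t last)
{
    assert(first <= last && last < (size_t)N_BUCKETS);
    long total = 0;
    for (size_t i = first; i <= last; i++)
        total += regular_counts[i];
    return total;
}

/* Total number of methods, with or without the assumed exceptions. */
long total_methods(int include_outliers)
{
    long total = sum_buckets(0, N_BUCKETS - 1);
    return include_outliers ? total + OUTLIER_COUNT : total;
}

/* Share of `part` in `whole`, in percent. */
double percent(long part, long whole)
{
    assert(whole > 0);
    return 100.0 * (double)part / (double)whole;
}
```

Calling percent(sum_buckets(0, 8), total_methods(0)) gives the share of 
methods under 10 lines with the exceptions removed, and sum_buckets(9, 9) 
selects the 10-19 bucket.  With these numbers the shares come out at about 54% 
and 28% either way, so the 194 exceptions barely move the distribution.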

> When browsing with an outliner, you can directly see the code for all
> methods which are just one line long when they, plus the method name,
> are short enough to fit in the outliner's width. My impression is that
> from one fourth to one half of the methods I see are in that group,
> which seems to agree with the numbers above.

It is not clear whether lines of code or bytes correspond more closely to 
FORTH-style factoring.  In fact, I wasn't considering methods at all as part 
of Self's factoring, but objects with all their methods.  Perhaps looking at 
methods is a better metric?  It is not clear.  FORTH doesn't really support 
objects directly (though you can extend it to do so easily enough).  It is 
quite inefficient in most OO languages to define objects with just a few 
methods (for instance two).  Generally, objects start "acquiring" more 
methods and grow as time goes on.

A better mapping might be objects in OO and vocabularies/wordlists in FORTH?

> I have written some very large methods in Self, unfortunately. These
> were either due to a lack of experience or the need to do complex
> object initializations. So I would expect future Self programs to be
> nearly as well factored as Forth programs. Note that it was easier to
> deal with longer methods in the old, text based Selfs but with shorter
> methods (you don't have to open them) in Self 4.

Graphics are nice, but there is a loss of semantic density.  The proceedings 
from InterCHI '93 have some stuff on this I think (it was the only one I 
attended).  There are some things that are simply easier to communicate with 
special shorthand symbols.  For instance,

	a[42]->y(x);

(In C).  This is very compact.  In just a few bytes I can represent an array 
operation, a structure field access and a function call with specified 
argument.  How do you show this graphically?
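
To make the density concrete, here is one set of declarations under which 
that expression compiles, followed by the same operation spelled out one step 
at a time.  The struct, the function `twice`, and the array size are 
assumptions chosen only for this sketch.

```c
#include <assert.h>

/* Hypothetical record type: the field y holds a pointer to a function. */
struct handler {
    int (*y)(int);
};

/* Hypothetical function to store in the field. */
static int twice(int x) { return 2 * x; }

static struct handler slot = { twice };

/* Array of pointers to records, so a[42] has type struct handler *. */
static struct handler *a[64];

/* The compact form from the text. */
int demo(int x)
{
    a[42] = &slot;
    assert(a[42]->y != 0);
    return a[42]->y(x);
}

/* The same three operations, one per line. */
int demo_spelled_out(int x)
{
    a[42] = &slot;
    struct handler *record = a[42];   /* array operation */
    int (*fn)(int) = record->y;       /* structure field access */
    return fn(x);                     /* function call with argument */
}
```

The three postfix operators apply left to right, which is why the one-line 
form needs no parentheses.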

> Forth has an advantage - there is little overhead for defining words
> and none at all for using them. But Self has objects, not just words.
> And this is a very, very important point: a truly object oriented
> program has most of its "smarts" in the way the objects are connected
> to each other, not inside each object. This makes them hard to
> understand by reading the code - the methods are so short and mostly
> seem to be delegating the same messages to other objects instead of
> actually doing anything.

Ideally each object's interactions with the objects it uses for 
implementation are clear from the source.  Sometimes they aren't.  FORTH 
definitely has no advantage here.  It is far too easy to write write-only 
code in FORTH.

Actually, your point about OO being somewhat _more_ difficult to understand 
is interesting and particularly relevant.  I'll have to think on this more.  
I have long thought that the manner in which the program was presented was 
much more important to understanding than grammatical issues.  

> The other day I was trying to explain this to a person and comparing
> objects with neural networks. People coming from a procedural language
> tend to create one giant object with a few, large methods and several
> dumb objects (nothing but data slots).

And here I thought that first-time C++ programmers always encapsulate 
a char in an object :-)  Then they start up the steep slope of copy 
constructors etc.

I have seen both extremes.  In one, first-time OO programmers (with a 
procedural background) do as you note.  In the other, they make everything 
in sight an object and then try to use objects as functions in their main 
routine.  It generates... interesting code.  I have had better luck explaining 
that OO is a really nice way of writing clean-looking ADTs.
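
For readers who have not met the term, this is roughly what I mean by an ADT, 
sketched in plain C.  The stack type, its capacity, and the function names are 
all invented for the example; the point is only that callers go through the 
operations and never touch the fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ADT: a fixed-capacity stack of ints. */
enum { STACK_CAPACITY = 16 };

struct int_stack {
    int items[STACK_CAPACITY];
    size_t count;
};

void stack_init(struct int_stack *s) { s->count = 0; }

int stack_is_empty(const struct int_stack *s) { return s->count == 0; }

/* Returns 1 on success, 0 if the stack is full. */
int stack_push(struct int_stack *s, int value)
{
    if (s->count == STACK_CAPACITY)
        return 0;
    s->items[s->count++] = value;
    return 1;
}

/* Removes and returns the top item; the stack must not be empty. */
int stack_pop(struct int_stack *s)
{
    assert(!stack_is_empty(s));
    return s->items[--s->count];
}
```

In an OO language the struct and its four functions fold into one class, and 
the compiler enforces the "operations only" rule that C leaves to convention.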

FORTH is "clean" because it lets you define what to do very easily.  It gives 
you no intrinsic tools to describe what you do it with.  That makes FORTH 
programs all verbs.  There are few nouns in FORTH.  OO programming is about 
noun-verb combinations and seems closer to natural language.

Best,
Kyle