[self-interest] Self CPU usage and file handling

jas jas at cruzio.com
Wed Apr 23 00:47:59 UTC 2014


Hi Russell,

Just guessing here, as I don't have anything to look at
other than this post, but the symptoms you describe fit a familiar pattern.

My guess is that someone changed a contract without noticing it.
As you suggest here:

	Note that os_file's setAsync already does the ownership handling,
	so we shouldn't be redoing it each time anyway
	(Is this a sign that there were previously problems?)

The original contract was 'someone has to set the owner'.

	This is attempting to set the VM as the process
	that will receive SIGIO and SIGURG signals
	for events on the file in question.

It was probably changed (while trying to fix another interrupt-related bug)
in the following way.

     1. Someone has to set the owner.
     2. It almost always works, just this one case doesn't.
     3. Aha - setAsync is assuming the owner will be set somewhere else.
     4. If it is almost always set somewhere else, we'd get what we have here.
     5. Let's set it right away, in setAsync.

It probably didn't solve the problem, but having already reasoned
that it *could* have solved the problem, and noticing that it *appears*
not to make things worse, that 'fix' stayed in.

I'll skip the rest of what I think might be going on,
because it doesn't matter, yet.

I suggest taking that bit out of setAsync (i.e. do not set the owner there)
and seeing what happens.

If my guess is right, the CPU will no longer be pegged.
If my guess is wrong, revert the code to what it was - no harm done.

-Jim




On 4/22/2014 2:07 AM, Russell Allen wrote:
> Hi all,
>
> Forgive the longish email.
>
> I have been looking into Self CPU usage as a side effect of looking into its file handling. Separately, Ben Noordhuis has opened an issue on GitHub, also about CPU usage but with a different cause.
>
> Both of these issues cause the standard Self image running Morphic to use too much CPU. The first issue is masked by a 'fix' I made which, on reflection, turns out to be entirely misconceived. The second issue is in the interaction between the VM-level REPL and the scheduler.
>
> I'm talking about the OS X build here. I haven't checked this out yet with the Linux build.
>
> FIRSTLY
>
> Deep in the heart of the file handling code lurks the method "unixGlobals os while_EINTR_do: block IfFail: errBlk"
>
> In the 4.5 build this looks like this:
>
> unixGlobals os while_EINTR_do: block IfFail: errBlk = ([
>      [ | :exit_inner_and_retry |
>          process this sleep: 1.
>          ^ callBlock value: [ | :error |
>               error = 'EINTR' ifTrue: exit_inner_and_retry.
>               errBlk value: error.
>          ].
>      ] exit.
>      scheduler isRunning ifTrue: [ process this yield ].
> ] loop )
>
> If you remove the "process this sleep: 1", you will see Self peg the CPU at 100% (or as high as the OS allows). That is why I put the sleep in originally. It was wrong, mea culpa: it fixed the symptoms, but not the problem.
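For comparison, a retry-on-EINTR loop normally needs no sleep at all. EINTR only says that a signal arrived before the call finished, so the call can be reissued at once.

Below is a minimal C++ sketch of that convention. It assumes a POSIX-style call that returns -1 and sets errno on failure. retry_on_eintr is a hypothetical helper, not part of the Self VM.

```cpp
#include <cerrno>

// Hypothetical helper: re-issue a POSIX-style call (returns -1 and sets
// errno on failure) for as long as it fails with EINTR. No sleep is needed,
// because EINTR only means a signal interrupted the call before it finished.
template <typename Call>
auto retry_on_eintr(Call call) -> decltype(call()) {
  for (;;) {
    auto result = call();
    if (result != -1 || errno != EINTR) {
      return result;  // success, or a real error for the caller to handle
    }
  }
}
```

It would be used as, for example, `retry_on_eintr([&] { return read(fd, buf, len); })`. Any error other than EINTR is handed straight back to the caller instead of being retried.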
>
> The problem is *not* that too many EINTR errors are being returned. The problem is that when reading or writing files, the read/write loop calls "suspendIfAsync", which (on OS X) is:
>
> traits unixFile osVarients bsd suspendIfAsync = (
>      setOwnerIfFail: [process this yield. ^ self].
>      suspendForIO)
>
> following down:
>
> traits unixFile osVarients bsd setOwnerIfFail: fb = ( fcntl: fcntls f_setown With: os getpid IfFail: fb)
>
> This is attempting to set the VM as the process that will receive SIGIO and SIGURG signals for events on the file in question.
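At the POSIX level, setOwnerIfFail: amounts to roughly the following C++ sketch. set_owner is a hypothetical name. Like the Self code, it passes the VM's own pid to F_SETOWN, and it returns the errno on failure, which is where an error such as ENOTTY would surface.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical equivalent of "fcntl: fcntls f_setown With: os getpid":
// ask the kernel to send SIGIO/SIGURG for events on fd to this process.
// Returns 0 on success, otherwise the errno value from fcntl.
int set_owner(int fd) {
  if (fcntl(fd, F_SETOWN, getpid()) == -1) {
    return errno;
  }
  return 0;
}
```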
>
> On the Mac this always fails with ENOTTY for the standard streams (stdin, stdout, stderr), so the suspend never happens. Instead the read or write on the stream busy-loops, pegging the CPU. This includes reading the prompt from stdin.
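Here is a hypothetical C++ analogue (not Self VM code) that illustrates why skipping the suspend is costly. It reads a non-blocking descriptor. When no data is ready, it sleeps in poll() until the kernel reports the descriptor readable, instead of calling read() again straight away.

This assumes that suspendForIO plays a comparable role at the Self level. Without that wait, the loop spins.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

// Put fd into non-blocking mode; returns 0 on success, -1 on error.
int set_nonblocking(int fd) {
  int flags = fcntl(fd, F_GETFL);
  if (flags == -1) return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

// Read from a non-blocking fd without spinning.
ssize_t read_waiting(int fd, void* buf, size_t len) {
  for (;;) {
    ssize_t n = read(fd, buf, len);
    if (n >= 0) return n;          // data, or 0 at end of file
    if (errno == EINTR) continue;  // interrupted by a signal: retry
    if (errno != EAGAIN && errno != EWOULDBLOCK) return -1;  // real error
    // Nothing ready yet: block until fd is readable (the analogue of
    // suspending) rather than looping straight back into read().
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, -1) == -1 && errno != EINTR) return -1;
  }
}
```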
>
> The Solaris version is: suspendIfAsync = ( scheduler isRunning ifTrue: [suspendForIO] )
>
> Changing the OS X version to be the same as the Solaris version *seems* to work. Note that os_file's setAsync already does the ownership handling, so we shouldn't be redoing it each time anyway. (Is this a sign that there were previously problems?)
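The "set it up once" arrangement can be sketched in C++ as a one-time setup done when the file is opened. The process becomes the SIGIO owner and O_ASYNC is switched on. After that, the per-read path only waits, as the Solaris suspendIfAsync does, rather than calling F_SETOWN again on every read or write.

set_async is a hypothetical name. Whether os_file's setAsync also sets O_ASYNC is an assumption here.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical one-time async setup for fd, done at open time:
// make this process the SIGIO/SIGURG owner, then enable O_ASYNC.
// Returns 0 on success, otherwise the errno of the failing fcntl call.
int set_async(int fd) {
  if (fcntl(fd, F_SETOWN, getpid()) == -1) return errno;
  int flags = fcntl(fd, F_GETFL);
  if (flags == -1) return errno;
  if (fcntl(fd, F_SETFL, flags | O_ASYNC) == -1) return errno;
  return 0;
}
```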
>
> But I don't yet understand why the setOwnerIfFail: is failing...
>
> SECONDLY
>
> Ben Noordhuis has opened Issue #34 on GitHub: https://github.com/russellallen/self/issues/34
>
> He says:
>
> To reproduce: look at top(1) when ./vm/Self is running in another terminal. It always hovers at 5-10% CPU usage on my machine.
>
> Here is what I think happens: The SIGALRM signal handler that's installed by what I suspect is the scheduler (profiler?) seems to run at a 100 Hz frequency. When you start a shell, all those signals cause the fread() that is called by InteractiveScanner::read_next_char() to keep returning from and re-entering the read() system call.
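The effect Ben describes can be reproduced in isolation with a C++ sketch. It installs a SIGALRM handler and arms a periodic ITIMER_REAL timer; a read() blocked on an empty pipe then returns -1 with errno EINTR as soon as the timer fires.

This assumes the handler is installed without SA_RESTART. The Self VM's actual flags are not known here. With SA_RESTART the kernel restarts the read instead, but the process is still woken on every tick. start_alarm_ticks and stop_alarm_ticks are hypothetical names.

```cpp
#include <cerrno>
#include <csignal>
#include <cstring>
#include <sys/time.h>
#include <unistd.h>

// The handler does nothing; its only effect is to interrupt blocking calls.
extern "C" void on_alarm(int) {}

// Install the handler (no SA_RESTART, so interrupted calls fail with EINTR)
// and deliver SIGALRM every interval_ms milliseconds (10 ms = 100 Hz).
int start_alarm_ticks(long interval_ms) {
  struct sigaction sa;
  std::memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  if (sigaction(SIGALRM, &sa, nullptr) == -1) return -1;

  struct itimerval tv;
  tv.it_interval.tv_sec = interval_ms / 1000;
  tv.it_interval.tv_usec = (interval_ms % 1000) * 1000;
  tv.it_value = tv.it_interval;  // first tick after one interval
  return setitimer(ITIMER_REAL, &tv, nullptr);
}

// Disarm the timer again.
void stop_alarm_ticks() {
  struct itimerval off;
  std::memset(&off, 0, sizeof off);
  setitimer(ITIMER_REAL, &off, nullptr);
}
```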
>
> My comment:
>
> Changing "scheduler setRealTimer: ms" either to not set SIGALRM at all, or to set it no more often than at 1-second intervals, drops CPU usage to about 1% on my machine. The timer is not set at 10 ms intervals all the time (it depends on usage), but this seems to be a commonly requested interval.
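The second option (never arming the timer more often than some floor) might look like the following C++ sketch. set_real_timer_ms and its min_ms parameter are hypothetical, and the sketch assumes the timer is periodic. It trades timer resolution for fewer SIGALRM wakeups, while ms == 0 still disarms the timer.

```cpp
#include <sys/time.h>

// Hypothetical clamped variant of a "setRealTimer: ms" primitive: requests
// shorter than min_ms are rounded up to min_ms; ms == 0 disarms the timer.
// Returns setitimer's result (0 on success, -1 on error).
int set_real_timer_ms(long ms, long min_ms) {
  if (ms > 0 && ms < min_ms) ms = min_ms;
  struct itimerval tv;
  tv.it_interval.tv_sec = ms / 1000;
  tv.it_interval.tv_usec = (ms % 1000) * 1000;
  tv.it_value = tv.it_interval;
  return setitimer(ITIMER_REAL, &tv, nullptr);
}
```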
>
> The reading code is in scanner.cpp and is:
>
> fint InteractiveScanner::read_next_char() {
>    char c;
>    while (true) {
>      if (fread(&c, sizeof(char), 1, stdin)) return c;
>      if (feof(stdin)) return EOF;
>    }
> }
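The loop above retries after any failed fread(), including a genuine I/O error. On some C libraries that could re-enter read() indefinitely. It also returns a plain char, so a 0xFF byte may be indistinguishable from EOF.

The following C++ sketch separates the cases. It is a hypothetical free-function variant that takes a FILE*, and fint is assumed here to be a plain int.

```cpp
#include <cerrno>
#include <cstdio>

using fint = int;  // assumption: stands in for the VM's fint type

// Hypothetical variant of InteractiveScanner::read_next_char.
fint read_next_char(FILE* in) {
  unsigned char c;  // unsigned, so byte 0xFF is returned as 255, not EOF
  for (;;) {
    if (std::fread(&c, 1, 1, in) == 1) return c;
    if (std::feof(in)) return EOF;
    if (std::ferror(in) && errno != EINTR) return EOF;  // real I/O error
    std::clearerr(in);  // interrupted by a signal: clear the flag and retry
  }
}
```

This does not reduce how often SIGALRM wakes the reader. It only ensures that a real error ends the loop instead of retrying forever.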
>
> CONCLUSION
>
> Help! No seriously, any comments, suggestions?
> Is your setup giving you completely different answers/results?
> Is Linux the same?
>
> Cheers, Russell
>



