Consider a more complex example along the lines of
grep -R -l foo . - "List any files in this directory or any descendant directory that contain the word 'foo'". This is considerably harder to express, but it can be done.
scandir d = opendir(d) >>= stream_readdir >>= map (\f -> case (type f): when file -> open(f) >>= stream_read_unordered(); when dir -> scandir f)
scandir '.'
I'm not a Haskell expert so I'm taking some liberties with syntax. I read this as follows: open a directory and stream its contents into a message queue, but the message queue "terminates locally" (does not get sent to another process) by being sent to a function that either opens and streams the file, as before, or recursively starts another directory scan.
At a slightly lower level, we might consider creating a local message queue that executes opendir >>= stream_readdir on every message (and all its outputs go to the same message queue), then we seed the queue with an initial '.'. That way there is no recursion, per se, but there is a cycle in our little message queue network. The messages that have no declared destination inside that network are delivered back to the client. In this particular case, we're receiving unordered blocks from files just as before.
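To make the queue version concrete, here is a minimal user-space sketch in Python rather than the pseudo-Haskell above. Everything in it is an assumption for illustration: the name scandir_network, the block_size parameter, and the choice to model "delivered back to the client" as a generator yielding (path, block) pairs. It also reads each file's blocks in order, whereas the real thing would be free to deliver them unordered.

```python
import os
from collections import deque

def scandir_network(root, block_size=4096):
    """Yield (path, block) messages for every regular file under root.

    Hypothetical sketch: the deque plays the role of the internal message
    queue, and yielded values are the messages with no internal destination.
    """
    queue = deque([root])              # seed the queue with the start directory
    while queue:
        directory = queue.popleft()
        try:
            entries = list(os.scandir(directory))
        except OSError:
            continue                   # unreadable directory: drop the message
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # The cycle in the network: directories feed the same queue.
                queue.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                try:
                    with open(entry.path, "rb") as f:
                        while block := f.read(block_size):
                            yield entry.path, block   # delivered to the client
                except OSError:
                    continue           # unreadable file: skip it
```

Note that there is no recursive call anywhere. The only thing that makes the scan "recursive" is that directory messages are routed back into the queue they came from.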
By this point we've pretty much passed an entire monadic Haskell program to the kernel and are expecting it to be okay with spawning its own little queueing network on our behalf. Isn't this crazy?
Well, it definitely is, of course, but it also makes some sense.
To start with, it eliminates unnecessary message passing across privilege boundaries. The program, even though it performs IO, is written with pure functions, like Haskell IO is.
The combinators and function bindings easily reduce to a simple queueing network (or can, anyway, supposing we design it carefully), which fits well with our whole asynchronous system call universe that works by passing messages around. Interestingly, this example demonstrates the case where the kernel is executing a syscall for each message in a stream created by the syscall that you actually asked for. The user program doesn't actually know how many syscalls are being executed on its behalf; it just starts getting a barrage of messages and scans them for 'foo' as fast as it can.
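The client's half of the job is small. As a rough sketch in Python, assuming the messages arrive as (path, block) pairs of a file path and a chunk of bytes (that message shape and the name files_containing are my inventions, not part of any real interface):

```python
def files_containing(messages, needle=b"foo"):
    """Return the set of paths for which some delivered block contains needle.

    Hypothetical sketch: `messages` is any iterable of (path, block) pairs,
    arriving in whatever order the queueing network delivers them.
    """
    matched = set()
    for path, block in messages:
        # Once a file has matched, later blocks from it need no scanning.
        if path not in matched and needle in block:
            matched.add(path)
    return matched
```

One limitation of scanning blocks independently: a 'foo' that straddles two blocks is missed, and with unordered delivery the client cannot simply carry over the tail of the previous block. A real version would need the blocks to carry offsets, or the matching to happen where the data lives.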
Finally, this again carries the data locality advantage. If we have some language that we declare "safe" enough to run "in the kernel"* (even in a controlled eval, which is OK, because the slow stuff is the I/O, and doing a little message passing on behalf of some interpreted code is relatively fast in comparison), then perhaps we could expect our neighbor, who doesn't necessarily trust us at all but does let us access his data, to run it too (sure, sandbox it, whatever).
This is very important because in a distributed system, the "operating system" very likely does not own the resource we are recursively grep'ing. The message clump can have individual message queues (internal ones like one that kicks off an opendir) and data generating nodes (like the stream_readdir or stream_read_unordered) distributed to different physical resources in the distributed system according to the locality of the data being accessed by each.
So, yes, it is crazy, but if the language is suitably restrictive, we can get a limited form of process migration, and I/O migration is probably the most rewarding subset of it, since I/O inherently has such high latency, whether we are considering storage or network I/O.
* Compare what Sun had to do with the DTrace language, D. They had to prevent infinite loops, which we potentially could have here. Ultimately the program would behave the same way if it were not running within the kernel, but here we might be able to actually examine the situation in this limited universe and do something about it.
Further musing: data/codata analysis with structural recursion and so forth might be able to reason about cycles and declare them safe. Like transactional systems, we might optimistically let them run until we have some way of "suspecting" that they are having trouble, then examine the situation (the traditional "hmm, this transaction is taking too long, oh what do you know it is actually deadlocked now that I look at it, zap" approach).
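A toy version of the "let it run, then look" approach might go like this in Python. The names run_optimistically, expand and budget are made up for the sketch, and the cycle test (has any node been handled twice?) is only a heuristic: a perfectly finite graph with shared nodes would trip it too.

```python
from collections import deque

def run_optimistically(seed, expand, budget=1000):
    """Process messages freely; inspect for cycles only when the budget runs out.

    Hypothetical sketch: `expand(node)` returns the messages a node emits
    back into the queue. Returns ("done", n) or ("zapped", n), where n is
    the number of messages processed.
    """
    queue = deque([seed])
    history = []                       # every node handled so far, in order
    while queue:
        if len(history) >= budget:
            # Suspicion triggered: now actually examine the situation.
            if len(set(history)) < len(history):
                return "zapped", len(history)   # a node repeated: likely a cycle
            budget *= 2                # merely large, not looping: keep going
        node = queue.popleft()
        history.append(node)
        queue.extend(expand(node))
    return "done", len(history)
```

For example, with graph = {"a": ["b"], "b": ["a"]} and expand = lambda n: graph.get(n, []), the run bounces between the two nodes until the budget is spent and then gets zapped, while a finite tree that happens to exceed the initial budget just has its budget doubled and runs to completion.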