An Irony in Software Language Design History

Eclipse builds everything in memory, and does so incrementally after the initial build. That’s one of the reasons Maven CLI builds take so much longer. It’s what makes Eclipse at least potentially a live object environment, where you can usually test a change within seconds of making it. It’s also what makes Eclipse a multi-gigabyte text editor and file browser, if that’s all you’re using it as. Eclipse’s predecessor (VisualAge for Java) worked the same way, but it was written in Smalltalk, and as a result didn’t need the vast resources sometimes necessary to give Java the dynamism it basically lacks.

I’m currently working on a queryable object graph as an in-memory cache, using Connected Data Objects (Eclipse EMF/CDO in Java). It mirrors a schema that, while relational on the back end, is exposed as a giant hierarchical XML schema. The schema is so big that the initial build peaked at just over 11 GB of heap, with an otherwise empty workspace.

I was also reminded that Java runs on a stack-based virtual machine whose class-file format has 16-bit limits baked in. I had to split the namespace into multiple namespaces; otherwise the package class, which holds the constants that each namespace’s factory class needs to create objects in that namespace, blows out the 64k limit on constants in a single class, because constant pool entries are addressed with 16-bit indexes. And this is where the irony comes in.
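To make the 64k ceiling concrete, here is a minimal sketch that reads the constant pool count from the header of a compiled class. The class and method names (ConstantPoolPeek, constantPoolCount) are made up for illustration. The only fact it relies on is the class-file layout: a 4-byte magic number, two 2-byte version fields, then a 2-byte constant_pool_count, which is why the value can never exceed 65535.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
final class ConstantPoolPeek {

    /** Returns the constant_pool_count field from the raw bytes of a class file. */
    static int constantPoolCount(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {       // every class file starts with this magic number
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();                  // minor_version
            in.readUnsignedShort();                  // major_version
            return in.readUnsignedShort();           // constant_pool_count: 16 bits, so at most 65535
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Inspect this class's own bytes, if the class loader exposes them as a resource.
        try (InputStream in = ConstantPoolPeek.class.getResourceAsStream("ConstantPoolPeek.class")) {
            if (in == null) {
                System.out.println("class bytes not available from this class loader");
                return;
            }
            byte[] bytes = in.readAllBytes();        // requires Java 9 or later
            System.out.println("constant_pool_count = " + constantPoolCount(bytes) + " (limit 65535)");
        }
    }
}
```

Pointing this at a generated package class before and after splitting a namespace would show how close each class sits to the limit.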

Decades ago, when Smalltalk was relatively new, its designers took a lot of criticism because it’s not stack based. It lazily initializes a virtual stack (it waits until an OS call is within the next few steps of a code path) and keeps it on the heap until the call is made. Most calls that require the stack are calls to the host OS APIs. (Smalltalk has the low-level control of assembler, since it JIT-compiles itself directly to machine code.) When CPUs had a 64k stack, maybe a 4k or 8k cache, and 512k of heap on a big machine, stack-based languages were almost by definition going to be faster. Since most languages written since then have been implemented either in C, in order to use OS libraries efficiently, or on top of the JVM (itself written primarily in C++), nearly all of them are also stack based.
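The idea of keeping activation records on the heap can be sketched in a few lines of Java. This is only an illustration of the concept, not how any Smalltalk VM is actually implemented; the names HeapContexts, Context and sumTo are invented for the example. Each pending "call" becomes an ordinary object linked to its sender, so the depth of the chain is bounded by heap size rather than by the thread’s native stack.

```java
// Hypothetical illustration of heap-allocated activation records.
final class HeapContexts {

    /** One activation record, allocated on the heap and linked to its caller. */
    static final class Context {
        final Context sender; // the calling context, or null for the outermost call
        final long n;         // the argument of this activation

        Context(Context sender, long n) {
            this.sender = sender;
            this.n = n;
        }
    }

    /** Computes 1 + 2 + ... + n with a chain of heap contexts instead of native recursion. */
    static long sumTo(long n) {
        Context top = new Context(null, n);
        // "Call" phase: create one context per pending addition.
        while (top.n > 0) {
            top = new Context(top, top.n - 1);
        }
        // "Return" phase: unwind through the sender links, accumulating the result.
        long result = 0;
        for (Context c = top; c != null; c = c.sender) {
            result += c.n;
        }
        return result;
    }

    public static void main(String[] args) {
        // 100,000 nested activations, none of them on the native thread stack.
        System.out.println(sumTo(100_000));
    }
}
```

Because the contexts are plain objects, they can be inspected, copied, or left for the garbage collector like any other data.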

With multiple cores and multiple threads per core, there is a lot of task switching, since languages where you can really code in parallel are either unavailable or incomprehensible to mere mortals. In that setting, one of the things that hurts performance the most is using the stack, since it has to be copied in and out of the CPU on every task switch. Smalltalk’s virtual stack gets cached in the 4 or 8 megabyte CPU caches common even on small machines such as low-power laptops and tablets. And since it’s on the heap as well, it doesn’t need to be copied out before a task switch; the cache can just be flushed. Of course, today 64k of heap is not even noticeable on a phone. As a result, many Smalltalk apps vastly outperform Java and even C. There are multiple other reasons it’s faster, but that one is a major irony: the designers took so much heat for the design at the time, yet today it’s a more optimal design than virtually any language written since.

For anyone interested in the latest, just-released open source Smalltalk environment and its power and potential, drop by the Pharo site. Or, if you want to see some Smalltalk code in action, download the environment with the code running in it, play with it, and maybe even try modifying some of the code to see what happens, check out the Analysis for Business Processes (A4BP) project.



The Open Source Immersive Programming Environment in Smalltalk
