What compiler? It’s all about abstraction.


With the maturity of general-purpose programming languages (e.g., C/C++, Java) and the availability of open-source compilers (e.g., GCC and LLVM), the traditional role of a compiler expert is becoming increasingly narrow and incremental.

Are we (the “compiler experts”) working ourselves out of relevance? Yes and no. Yes, because hard-core optimizing compiler work is becoming increasingly rare at academic conferences (the frontier of what the field considers hard problems). No, because if we take a fresh look at our field with a more open mind, we’ll discover a much bigger space of productivity and performance optimization roles that are well suited to our expertise.


[Figure: punch card]

Let’s first step back and look at the origin of compilation. The figure shows what programming was like more than half a century ago. At the time, programming was done by punching binary instructions directly onto a card. Out of the pain of punch-card programming arose the need for layers of abstraction. So please remember that compilation originated as a by-product of computing abstractions.







As shown in the picture below, there have been three levels of abstraction in mapping algorithms and applications all the way down to physical hardware.

  • Hardware-centric abstraction. Assembly language emerged first to provide abstractions of the hardware, e.g., mnemonics and symbol resolution.
  • Programming-centric abstraction. High-level programming languages (FORTRAN, C/C++, Java) emerged next to provide abstractions of programming or computing, such as structured control flow (as opposed to branch instructions in assembly), modularity (e.g., function calls and object-oriented programming), and abstractions of data (e.g., type systems, scope).
  • Application-/domain-centric abstraction. More recent trends focus on abstractions of application domains. Such abstractions often take the form of a programming framework (e.g., MapReduce for the big data domain, Django or Ruby on Rails for web servers). Sometimes the abstraction takes the form of a domain-specific language (e.g., R for data analysis, or array languages for dense matrix computation).
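To make the gap between the first two levels concrete, here is a minimal sketch of the same computation (summing 1 through n) written twice: once as a list of branch-style instructions, and once with structured control flow. The instruction set (`set`, `add`, `addi`, `jle`, `halt`) and the `run` interpreter are hypothetical, invented purely for illustration; they do not correspond to any real assembly language.

```python
def run(program):
    """Interpret a list of (opcode, *operands) tuples on a toy register machine.

    The opcodes here are made up for this example, not a real ISA.
    """
    regs = {}
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        if op == "set":            # set reg, constant
            regs[args[0]] = args[1]
        elif op == "add":          # add dst, src   (dst += src)
            regs[args[0]] += regs[args[1]]
        elif op == "addi":         # addi dst, constant   (dst += constant)
            regs[args[0]] += args[1]
        elif op == "jle":          # jle a, b, target   (jump if a <= b)
            if regs[args[0]] <= regs[args[1]]:
                pc = args[2]
                continue
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return regs


def sum_to_n_branches(n):
    """Hardware-centric style: the loop exists only as jumps to line numbers."""
    program = [
        ("set", "total", 0),      # 0
        ("set", "i", 1),          # 1
        ("set", "n", n),          # 2
        ("jle", "i", "n", 5),     # 3: enter the loop body if i <= n
        ("halt",),                # 4
        ("add", "total", "i"),    # 5: loop body starts here
        ("addi", "i", 1),         # 6
        ("jle", "i", "n", 5),     # 7: jump back while i <= n
        ("halt",),                # 8
    ]
    return run(program)["total"]


def sum_to_n_structured(n):
    """Programming-centric style: the loop is a first-class construct."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total
```

Both functions compute the same value, but in the first one the reader has to reconstruct the loop from jump targets, while in the second the language states it directly.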

Let us not forget that compilers, assemblers, frameworks, and runtimes are simply by-products of abstractions. So abstractions come first!

[Figure: programming model pie]


The past 50 years have seen rapid advances and growing maturity in the 1st-level (hardware-centric, especially for general-purpose, homogeneous systems) and 2nd-level (programming-centric, especially for statically typed languages) abstractions. A few challenges remain at the 1st and 2nd abstraction levels:

  • Abstractions for systems of hardware (especially heterogeneous systems) and their implementations. Some of the ways we currently program heterogeneous systems are not much better than the “punch card” programming model of the past. Sometimes the abstractions are defined, but the implementations are nowhere near satisfactory.
  • Implementation of dynamic scripting languages. The abstraction is well developed, but the implementation (in terms of performance) is still lagging. This is still a budding field, with JavaScript JIT compilation as a bright spot.

Most important of all, I believe this is the golden age of domain-specific abstractions (the 3rd level of abstraction). Take big data as an example: the last decade has seen its programming model evolve from “punch card” programming for distributed systems to a rich space of big data programming models (e.g., graph runtimes, map-reduce, key-value stores, and other domain-specific languages).
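As a rough illustration of what such a domain-level abstraction buys, here is a minimal single-process sketch of the map-reduce model. The `map_reduce` helper and the word-count functions are hypothetical names written for this example; a real framework would distribute the map, shuffle, and reduce phases across machines, which this sketch does not attempt.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """In-memory sketch of map-reduce: map, group by key, then reduce.

    This is an illustrative toy, not the API of any real framework.
    """
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # "Shuffle" phase: collect values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    """Sum the per-occurrence counts for one word."""
    return sum(counts)


lines = [
    "the compiler is an abstraction",
    "the abstraction comes first",
]
word_counts = map_reduce(lines, word_mapper, count_reducer)
```

The application author writes only `word_mapper` and `count_reducer`; everything about grouping (and, in a real system, partitioning, scheduling, and fault tolerance) stays behind the abstraction.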


The biggest challenge of application-specific abstractions is first and foremost the abstraction itself. The problem often manifests as a vague problem statement such as “the development cost of my software is too large” or “the software has grown into a beast, hard to maintain and extend”, or as the perennial complaint that “the application is too slow”. The root cause of many of these problems is poor software architecture, which can be cured by introducing layers of abstraction into the architecture.

This is what our team (the SDK & compiler team) is doing with various internal product codes. There is no universal rule for what an application-specific programming model looks like or how to derive it. One has to look at the design document and the implementation of the current software over and over again to come up with a proper abstraction. That is the biggest challenge of this work. Once the abstraction is defined, the implementation (compiler or not) will naturally follow.

When you have a software productivity or performance problem, remember the punch card pictured at the beginning and ask yourself, “Am I doing punch-card programming for my domain or application?” If the answer is yes, think about abstraction first!


