Month: April 2014
With the maturity of general-purpose programming languages (e.g., C/C++, Java) and the availability of open-source compilers (e.g., gcc and LLVM), the traditional role of the compiler expert is becoming increasingly narrow and incremental.
Are we (the “compiler experts”) working ourselves out of relevance? Yes and no. Yes, because hard-core optimizing compiler work is becoming increasingly rare at academic conferences (the frontier of what the field considers hard problems). No, because if we take a fresh look at our field with a more open mind, we’ll discover a much bigger space of productivity and performance optimization roles that are well suited to our expertise.
COMPILATION, A BY-PRODUCT OF COMPUTING ABSTRACTION
Let’s first step back and look at the origin of compilation. The figure to the left shows what programming looked like more than half a century ago. At the time, programming was done by punching binary instructions directly onto cards. Out of the pain of punch-card programming arose the need for layers of abstraction. So remember that compilation originated as a by-product of computing abstractions.
As shown in the picture below, three levels of abstraction have emerged for mapping algorithms and applications all the way down to physical hardware:
- Hardware-centric abstraction. Assembly language emerged first to abstract the hardware, e.g., through mnemonics and symbol resolution.
- Programming-centric abstraction. High-level programming languages (FORTRAN, C/C++, Java) emerged next to provide abstractions of programming or computing: structured control flow (as opposed to branch instructions in assembly), modularity (e.g., function calls and object-oriented programming), and abstractions of data (e.g., type systems, scope).
- Application-/domain-centric abstraction. More recent trends focus on abstractions of application domains. Such abstractions often take the form of a programming framework (e.g., map-reduce for the big data domain, Django or Ruby on Rails for web applications). Sometimes the abstraction takes the form of a domain-specific language (e.g., R for data analysis or array languages for dense matrix computation).
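To make the three levels a little more concrete, here is a toy sketch (not from the original post) of one task, counting words, written in the style of each level. It is plain Python, so "level 1" only simulates branch-style control flow with an explicit program counter; it is not real assembly. The `map_reduce` helper is hypothetical: a single-process stand-in for a map-reduce framework, not the API of Hadoop or any other real system.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


# Level 1 flavor (a toy simulation, not real assembly): control flow is an
# explicit "program counter" plus conditional and unconditional jumps.
def count_words_branch_style(lines: List[str]) -> Dict[str, int]:
    words = " ".join(lines).split()
    counts: Dict[str, int] = {}
    i, pc = 0, "test"
    while pc != "done":
        if pc == "test":
            pc = "body" if i < len(words) else "done"  # branch if i < n
        else:  # pc == "body"
            counts[words[i]] = counts.get(words[i], 0) + 1
            i += 1
            pc = "test"  # jump back to the loop test
    return counts


# Level 2: structured control flow; the language hides the branches.
def count_words_structured(lines: List[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# Level 3: a hypothetical, single-process map-reduce helper. The user
# supplies only a mapper and a reducer; grouping is the "framework's" job.
def map_reduce(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterable[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:  # map phase
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    # reduce phase: one reducer call per distinct key
    return {key: reducer(key, values) for key, values in groups.items()}


def count_words_mapreduce(lines: List[str]) -> Dict[str, int]:
    return map_reduce(
        lines,
        mapper=lambda line: ((word, 1) for word in line.split()),
        reducer=lambda _word, ones: sum(ones),
    )


if __name__ == "__main__":
    sample = ["to be or not to be", "to do"]
    print(count_words_mapreduce(sample))
```

All three functions return the same counts for the sample input. The level-3 version says nothing about loops or grouping, which leaves a real framework free to distribute the map and reduce phases however it likes; that implementation freedom is what the abstraction buys.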
Let us not forget that compilers, assemblers, frameworks, and runtimes are simply by-products of abstractions. Abstractions come first!
THE GOLDEN AGE OF DOMAIN-SPECIFIC ABSTRACTIONS
The past 50 years have seen rapid advances and maturity in the 1st-level (hardware-centric, especially for general-purpose, homogeneous systems) and the 2nd-level (programming-centric, especially for statically typed languages) abstractions. A few remaining challenges at the 1st and 2nd abstraction levels include:
- Abstractions for systems of hardware (especially heterogeneous systems) and their implementations. Some of the ways we currently program heterogeneous systems are not much better than the “punch card” programming model of the past. In other cases, the abstractions are defined but their implementations are far from satisfactory.
Most important of all, I believe this is the golden age of domain-specific abstractions (the 3rd level of abstraction). Take big data as an example: over the last decade, its programming model has evolved from “punch card” programming for distributed systems to a rich space of big data programming models (e.g., graph runtimes, map-reduce, key-value stores, and other domain-specific languages).
ABSTRACTION FIRST, IMPLEMENTATION FOLLOWS
The biggest challenge of application-specific abstractions is first and foremost the abstraction itself. The problem often manifests as a vague complaint such as “the development cost of my software is too high”, “the software has grown into a beast that is hard to maintain and extend”, or the perennial “the application is too slow”. The root cause of many of these problems is poor software architecture, which can be cured by introducing layers of abstraction into that architecture.
This is what our team (the SDK & compiler team) is doing with various internal product codebases. There is no universal rule for what an application-specific programming model looks like or how to derive it. One has to study the design documents and the implementation of the current software over and over again to come up with a proper abstraction. That is the biggest challenge of this work. Once the abstraction is defined, the implementation (compiler or not) will naturally follow.
When you have a software productivity or performance problem, remember the first picture shown above and ask yourself: “Am I doing punch-card programming for my domain/application?” If the answer is yes, think about abstraction first!