Month: January 2014
I saw this quote at Adela’s school the other day, and I couldn’t agree more.
The only way that we can live is if we grow; the only way we can grow is if we change; the only way we can change is if we learn; the only way we can learn is if we are exposed; the only way we can be exposed is if we throw ourselves out into the open.
By C. Joybell C.
2013 has been my year of new exposure and internal changes. The triggers are seemingly trivial: owning a smartphone, WeChat, high school reunion, my first open source project, starting a daily journal, and a good book.
Getting exposed to new sources of stimulus is often the first step of change.
As someone whose Ph.D. work was on auto-parallelization and who spent 5 years developing the auto-SIMDization feature of the xlc compiler for POWER processors, my view on auto-SIMDization has turned 180 degrees over the years. Yes, my 5 years on auto-SIMD were among my most productive, and the best collaboration experience I have ever had. But nowadays, every time someone brings up auto-SIMD as the solution to the programmability difficulty of SIMD, I can’t help but shoot back.
Auto-SIMD is the holy grail of SIMD programming models. Everybody wants it: programmers, executives, program managers, academics, compiler designers (myself included). The problem is that the perceived capability of auto-SIMD is quite different from what an auto-SIMD compiler can realistically do. To put it bluntly, auto-SIMD compilers rarely work when applied to real code. Many times, compiler users came to us with a piece of code that the compiler could not parallelize. Sometimes the code is simple to human eyes but complicated for the compiler because of aliasing and unknown side effects through function calls. Sometimes the loop is parallel but gets “messed up” by internal compiler transformations that confuse the SIMD analysis. This is no fault of the compiler. Extracting a parallel loop from a sequential program is simply the wrong task to give a compiler. Think about it: how many times have you relied on auto-parallelization to produce parallel code? Probably never. The same is true for auto-SIMD.
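To make the aliasing point concrete, here is a minimal sketch in C. The function names are hypothetical and chosen only for illustration. In the first version the compiler cannot prove that the two pointers refer to disjoint memory, so it has to preserve sequential semantics (possibly by adding runtime overlap checks, possibly by not SIMDizing at all). In the second, the programmer makes the no-overlap promise explicit with C99 `restrict`.

```c
#include <stddef.h>

/* Hypothetical kernel: dst and src may overlap as far as the compiler knows,
   so a store to dst[i] might change a value that a later iteration reads. */
void scale_may_alias(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = k * src[i];
}

/* Same loop body, but the restrict qualifiers are the programmer's promise
   that dst and src never overlap. Calling it with overlapping buffers is
   undefined behavior. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = k * src[i];
}
```

A call such as `scale_may_alias(a + 1, a, 2.0f, n)` is perfectly legal and carries a real dependence from one iteration to the next, which is exactly the case the compiler has to assume whenever it cannot rule it out.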
There are times when the compiler can indeed SIMDize a loop, but the loop is often so simple (e.g., matmul with all global side effects known) and the amount of compiler analysis required so large (e.g., inter-procedural analysis) that it is much easier for the programmer to mark the SIMDizable region for the compiler through a programming interface (e.g., OpenMP SIMD directives).
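As a rough illustration of what such a directive looks like, here is a minimal sketch using the OpenMP `simd` construct. The `saxpy` kernel and its name are my own example, not code from any of the projects mentioned here. The pragma is the programmer's assertion that the iterations can safely run in SIMD lanes, so the compiler does not have to prove it.

```c
#include <stddef.h>

/* Hypothetical kernel: y = a * x + y.
   The pragma tells the compiler the loop is safe to SIMDize. If OpenMP
   support is not enabled, the pragma is ignored and the loop runs as
   ordinary sequential code with the same result. */
void saxpy(float *y, const float *x, float a, size_t n) {
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

With GCC or Clang, building with `-fopenmp-simd` should be enough to honor the directive without pulling in the rest of the OpenMP runtime. Whether the loop actually gets vectorized still depends on the compiler and the target.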
I have seen it so many times: decision makers embrace auto-SIMD because it sounds like the best solution to the problem; compiler practitioners declare victory after SIMDizing a few self-selected kernels and publishing the initial results; and users give up on the feature after a few frustrating tries.
I will never forget the three questions that my boss often asks about a new research idea: 1) Does it work? 2) What does it do for me? 3) When is it available? Auto-SIMD fails the very first test.
In my community, it is not unusual to see someone box an entire career into one compiler infrastructure. Some of us are labelled by the infrastructure or the language we work on. I used to be an xlc person and would focus only on solutions that involved xlc. The mantra used to be: “if it’s not an xlc problem, it’s not my problem”, or “if it’s not a compiler problem, it’s not my problem”. Yes, I was boxed in first by a compiler infrastructure, then by the field of compilers.
The reality is that real problems do not often land squarely in one’s specialty domain. Not only do the ultimate consumers of compiler work (application developers) change their programming habits and tool requirements all the time, but for many real problems a compiler may not be the best solution (e.g., sometimes a small change to the algorithm or the code goes a long way, with a much more immediate effect).
This is a reflection on how a compiler person can be trapped by the infrastructures he is associated with and miss out on a much larger scope in which to apply his expertise.
While many of us started from the “common starting point” box, most of us do not have the luxury of staying in that box for a whole career (and would you want to, even if you could?). In this figure, I identify 3 “stretch” roles for a compiler expert in the system compiler research area that are becoming increasingly important:
- Enabling better hardware design. Increasingly I see compiler people deeply involved in the concept phase of processor design: translating application-level requirements to the hardware, proposing new hardware features/instructions, identifying workloads, and evaluating performance benefits.
- Programming interface design. This is really a process of extracting common components from users’ applications, in the form of a middleware, a common runtime, a domain-specific language, or some language features, both to improve productivity and to create a common component for deep, platform-specific optimization (e.g., the graph runtime in System G).
- Performance engineering and analysis. This involves workload on-boarding, deep performance analysis (extremely time-consuming and an art in itself), tuning of OS/machine/compiler configurations and code extraction/rewrite.
These 3 roles are a perfect match for a compiler expert, who has a rare understanding of the entire system stack, across application, OS, runtime, compiler/JIT, and hardware. As the importance of optimization in a traditional compiler/JIT diminishes, we may look more and more to these new areas to expand into.