Multicore

The multicore crisis has hit at the same time as the security crisis. A short while ago we were dealing with isolated serial machines; now our programs must utilize a sea of machines.

Multithreading is hard. Doing it routinely, doing it with large programs, invariably fails.

Intel, NVIDIA, and Google, however, have this crisis well in hand.

Big businesses are attacking the problem with competence and success, and we can leave them to it and not worry too much. Google is pioneering the way, and Intel and NVIDIA are making similar tools available to the masses.

Since massive parallelism is a hard problem, requiring good people, much thought, and much care, the meta solution is to solve that problem as few times as possible, and to re-use the resulting solutions as much as possible. If, for example, one uses the concurrent hash table provided by Intel’s Threading Building Blocks library, the library takes care of the hash-table-related coordination issues that the programmer would otherwise have to take care of, and would probably foul up.

Intel has provided a bunch of utilities that make it a good deal easier: the VTune Performance Analyzer, Thread Checker, OpenMP support, compiler auto-parallelization, and, most importantly, Threading Building Blocks. It is still hard, but no longer damn near impossible.

Back in the days when there was one hardware thread of execution driving multiple software threads, locking worked well. These days, not so well. Rather, it is often more desirable to use a lockless transactional approach to handle any shared state. Shared state is hard, so it is better to share nothing, or to leave any sharing to those utilities that someone else has already written and debugged. If rolling your own, better to use InterlockedXxx than Lock. Note that you can construct your own InterlockedXxx operation for any Xxx using InterlockedCompareExchange.
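To make the compare-exchange point concrete, here is a minimal sketch in portable C++, using std::atomic's compare_exchange_weak in place of the Win32 InterlockedCompareExchange call. The names interlocked_apply and interlocked_max are hypothetical helpers invented for this illustration, not part of any library, and the sketch assumes the supplied function is cheap and has no side effects, since it may run more than once.

```cpp
#include <atomic>

// Hypothetical helper (not a library function): atomically replaces the
// value in `target` with f(old value) and returns the old value.
// Assumes f is pure, because it is re-run on every retry.
template <typename T, typename F>
T interlocked_apply(std::atomic<T>& target, F f) {
    T observed = target.load();
    // If another thread changed `target` since we read it, the exchange
    // fails, `observed` is refreshed with the current value, and we retry.
    while (!target.compare_exchange_weak(observed, f(observed))) {
    }
    return observed;
}

// Example "InterlockedMax": an operation the basic interlocked set lacks.
inline int interlocked_max(std::atomic<int>& target, int candidate) {
    return interlocked_apply(target, [candidate](int old) {
        return old > candidate ? old : candidate;
    });
}
```

The loop is the transactional pattern in miniature: read, compute a new value privately, and commit only if nobody else got there first. No thread ever holds a lock, so a stalled thread cannot block the others.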

The big solution, however, is the one pioneered by Google. Rather than each programmer designing his own multithreading and multicore design, one has a small number of very general massively parallel algorithms embodied in useful software for massaging masses of data. The programmer then calls that software and lets it handle the parallelism. Google’s MapReduce is the classic example of this, but every database servicing a web application is also an example. One typically has many web servers running many processes, all of which might potentially update the same data at the same time, and the database is supposed to sort out any resulting problems. The developers write in single-threaded Python or Ruby on Rails, and let the database handle any problems related to massive parallelism.
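The division of labor can be sketched on a single machine. What follows is a toy illustration of the map-and-reduce idea, not Google's MapReduce API: the names count_words, merge_counts, and parallel_word_count are invented for this example, and it naively assumes one thread per document is acceptable. The programmer writes the first two functions as ordinary single-threaded code, and only the third, the stand-in for the framework, knows that threads exist.

```cpp
#include <future>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Counts = std::map<std::string, int>;

// Map step: plain single-threaded code that touches no shared state.
Counts count_words(const std::string& document) {
    Counts counts;
    std::istringstream words(document);
    std::string word;
    while (words >> word) ++counts[word];
    return counts;
}

// Reduce step: fold one partial result into the running total.
void merge_counts(Counts& total, const Counts& partial) {
    for (const auto& entry : partial) total[entry.first] += entry.second;
}

// Stand-in for the framework: the only place that deals with parallelism.
// Each document is mapped on its own thread; results are merged serially.
Counts parallel_word_count(const std::vector<std::string>& documents) {
    std::vector<std::future<Counts>> pending;
    pending.reserve(documents.size());
    for (const auto& doc : documents) {
        pending.push_back(std::async(std::launch::async,
                                     [&doc] { return count_words(doc); }));
    }
    Counts total;
    for (auto& result : pending) merge_counts(total, result.get());
    return total;
}
```

Because each map call works on its own input and returns its own output, there is nothing to lock. A real framework adds distribution across machines, scheduling, and recovery from failures, which is exactly the part worth solving only once.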

Google’s App Engine allows programmers to write straightforward single-threaded Python code in the easy-to-use Django framework. That code can then be executed in a massively parallel manner, with coordination between the many parallel processes performed by Google’s datastore.

In short, the multicore crisis, unlike the other crises I describe in this group of web pages, is well in hand.

These documents are licensed under the Creative Commons Attribution-Share Alike 3.0 License