
Paul McKenney: Multi-core Linux

paulmck, 2010-08-09T00:06:04Z

"The Trouble with Multicore" by David Patterson in July 2010 IEEE Spectrum

Sun, 08/08/2010 - 19:06
Patterson's article is quite interesting, and brings out some good points. I was of course happy that he mentioned Sequent, my old employer, despite the fact that the mention was on a list of “long-gone parallel hopefuls.” An unflattering mention, perhaps, but undeniably true.

However, I was especially happy to see the following sentence:

So rather than working on general programming languages or computer designs, we are instead trying to create a few important applications that can take advantage of many-core microprocessors.

Focusing on parallelization in the large is a great improvement over the traditional academic focus on parallelization in the small. All else being equal, the larger the software artifact, the larger the units of work, and the smaller the fraction of computational resources spent on communication. The less the communication, the better the performance, and usually the greater the scalability. So Patterson's pronouncement is a welcome change, especially given his group's earlier focus on small-scale computational kernels. I hope that the fact that Patterson has now joined the growing group of academics focused on parallelization in the large will encourage other academics to do the same.

Of course, I could raise a number of quibbles with the paper:

  1. The analogy of parallel processing with journalism (last full paragraph of the last column on page 30) misses the mark. Patterson notwithstanding, most writers do in fact use parallel processing: there will be a reporter, a copy-editor, and so on. Indeed, it is quite common for authors of large works to acknowledge those who did research, fact-checking, and other tasks. Of course, to Patterson's point, there must be a limit to the degree of parallelism that can be achieved. But the success of things like Wikipedia indicates that the potential for parallelism is much larger than has been commonly thought.
  2. Patterson argues that desktop applications rarely have sufficient intellectual horsepower behind them to make good use of multicore systems (last sentence of page 31). History has shown, however, that it is not raw intellectual horsepower that is required, but rather experience and proper training.
  3. Patterson also seems to believe that parallel programmers should start small and work their way up to larger systems (last sentence of first paragraph of page 32). Sequent's experience indicates otherwise: by starting off with 30-CPU systems from the get-go, Sequent avoided the typical parallel-programming experience, which is to rewrite the program from scratch multiple times, first to accommodate parallelism at all, next to scale beyond two CPUs, next to get beyond the 16-32-CPU level, and so on. Diving into the deep end of the parallel-programming pool can be quite a bit cheaper and easier than gingerly paddling out from the shallows.
  4. Patterson complains that large systems (128 cores) are not being manufactured, and that software emulation is painfully slow (middle of third column on page 32). Such large systems have in fact been available for quite some time from a number of manufacturers. Of course, they are still quite expensive, which can certainly render them unavailable to most developers. However, there is little need for universities to fabricate them, unless of course they are conducting research on the hardware itself.

Finally, the box on page 31 entitled “Easy as Pi” deserves special attention. In this box, Patterson contrasts a sequential method for calculating the quantity π/4, namely summing the infinite series for the arctangent of one (whose value, in radians, is π/4), with a parallel Monte Carlo method, which generates pairs of random floating-point numbers between -1 and +1, then counts the fraction of the resulting points that lie within the unit circle.
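For concreteness, here is a minimal single-threaded C sketch of the two methods as described above. The function names and the xorshift64 generator are illustrative assumptions, not code from Patterson's article. A parallel Monte Carlo version would presumably give each thread its own generator state and sum the per-thread counts of points inside the circle.

```c
#include <assert.h>
#include <stdint.h>

/* Sequential method: sum the first n terms of the arctangent series
 * for arctan(1) = pi/4, namely 1 - 1/3 + 1/5 - 1/7 + ... */
double pi_over_4_series(long n)
{
	double sum = 0.0;

	assert(n > 0);
	for (long k = 0; k < n; k++) {
		double term = 1.0 / (2.0 * k + 1.0);

		sum += (k % 2 == 0) ? term : -term;
	}
	return sum;
}

/* Hypothetical helper: Marsaglia-style xorshift64 generator.  Each
 * thread of a parallel version would carry its own nonzero state. */
static uint64_t xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/* Map the top 53 random bits onto a double in [-1, +1). */
static double uniform_pm1(uint64_t *state)
{
	return (double)(xorshift64(state) >> 11) * (2.0 / 9007199254740992.0) - 1.0;
}

/* Monte Carlo method: the fraction of random points in the square
 * [-1,+1] x [-1,+1] that land inside the unit circle approaches
 * (circle area) / (square area) = pi/4. */
double pi_over_4_monte_carlo(long n, uint64_t seed)
{
	uint64_t state = seed ? seed : 1;	/* xorshift state must be nonzero */
	long inside = 0;

	assert(n > 0);
	for (long i = 0; i < n; i++) {
		double x = uniform_pm1(&state);
		double y = uniform_pm1(&state);

		if (x * x + y * y <= 1.0)
			inside++;
	}
	return (double)inside / (double)n;
}
```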

How good are these algorithms?

Stupid RCU Tricks: Holding Off RCU Read-Side Critical Sections

Wed, 07/14/2010 - 00:34
RCU callbacks are registered via call_rcu(). After an RCU grace period elapses, the callback (which is a C-language function) is invoked. RCU's fundamental guarantee states that once an RCU grace period has elapsed, all RCU read-side critical sections that were executing when the grace period began will have completed. (An RCU read-side critical section is a fragment of code enclosed by rcu_read_lock() and rcu_read_unlock().)
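To make the relationship between readers, grace periods, and callbacks concrete, here is a toy user-space sketch assuming C11 atomics. It is not how the Linux kernel implements RCU, and every toy_ name is hypothetical. It keeps a single global count of active readers, so a grace period waits for all readers, including ones that start after the grace period begins. That is stronger (and slower) than RCU's guarantee requires, but it does satisfy it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy, hypothetical RCU: one global count of active readers. */
static atomic_long toy_readers;

static void toy_rcu_read_lock(void)
{
	atomic_fetch_add(&toy_readers, 1);	/* seq_cst by default */
}

static void toy_rcu_read_unlock(void)
{
	atomic_fetch_sub(&toy_readers, 1);
}

/* Spin until no reader is active.  Any reader that was running when this
 * function was called must therefore have finished, which is the
 * guarantee a grace period provides.  A steady stream of new readers can
 * keep this toy spinning indefinitely; real RCU does not have that flaw. */
static void toy_synchronize_rcu(void)
{
	while (atomic_load(&toy_readers) != 0)
		continue;
}

struct toy_rcu_head {
	void (*func)(struct toy_rcu_head *head);
};

/* Blocking stand-in for call_rcu(): wait for a grace period, then invoke
 * the callback.  The kernel's call_rcu() instead queues the callback and
 * returns immediately. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	toy_synchronize_rcu();
	head->func(head);
}

/* Hypothetical RCU-protected element with an embedded callback head. */
struct toy_elem {
	int data;
	int reclaimed;
	struct toy_rcu_head rh;
};

static void toy_elem_reclaim(struct toy_rcu_head *head)
{
	struct toy_elem *ep =
		(struct toy_elem *)((char *)head - offsetof(struct toy_elem, rh));

	ep->reclaimed = 1;	/* a real callback would typically free(ep) */
}
```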

Now, an RCU callback, being a C-language function, has a definite beginning and end. But what about synchronize_rcu(), which blocks until an RCU grace period has elapsed? How does RCU know how long to hold off new RCU read-side critical sections once synchronize_rcu() returns?

Stupid SMP Tricks: Memory Barriers (RWC with rmb)

Thu, 07/01/2010 - 23:22
In the previous posting, a code fragment dealt with memory ordering. Strangely enough, its foo_1() function contained only a pair of reads, yet separated them with a full smp_mb() barrier. So why not use an smp_rmb() instead? Perhaps something like the following:

int x, y; /* shared variables */
int r1, r2, r3; /* semi-private variables */

void foo_0(void)
{
	ACCESS_ONCE(x) = 1;
}

void foo_1(void)
{
	r1 = ACCESS_ONCE(x);
	smp_rmb();  /* The only change. */
	r2 = ACCESS_ONCE(y);
}

void foo_2(void)
{
	ACCESS_ONCE(y) = 1;
	smp_mb();
	r3 = ACCESS_ONCE(x);
}

After these three functions complete, we have an assertion. Please note that by “complete” I mean that all effects of the functions have become globally visible. One way to ensure this level of completion is for the thread that spawned foo_0(), foo_1(), and foo_2() to do pthread_join() on each of them in turn, and only then execute the following assertion:

assert(!(r1 == 1 && r2 == 0 && r3 == 0));

Can this assertion ever trigger?
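The pthread_join() approach described above might be sketched in user space as follows. This is a hypothetical harness, not kernel code: it assumes ACCESS_ONCE() can be approximated by relaxed C11 atomic accesses, smp_mb() by a seq_cst fence, and smp_rmb() by an acquire fence, which is an approximation rather than an exact equivalent. The load_once() and store_once() macros and the harness functions are illustrative names. Note that such a harness can show that an outcome occurs, but a count of zero proves nothing; thread creation is also slow relative to these accesses, so rare reorderings are unlikely to appear, and a tool such as the herd7 simulator is better suited to settling questions like this one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Assumed user-space stand-ins for the kernel primitives. */
#define load_once(v)      atomic_load_explicit(&(v), memory_order_relaxed)
#define store_once(v, n)  atomic_store_explicit(&(v), (n), memory_order_relaxed)
#define smp_mb()          atomic_thread_fence(memory_order_seq_cst)
#define smp_rmb()         atomic_thread_fence(memory_order_acquire) /* approximation */

static atomic_int x, y;		/* shared variables */
static int r1, r2, r3;		/* written by one thread, read after pthread_join() */

static void *foo_0(void *arg)
{
	(void)arg;
	store_once(x, 1);
	return NULL;
}

static void *foo_1(void *arg)
{
	(void)arg;
	r1 = load_once(x);
	smp_rmb();
	r2 = load_once(y);
	return NULL;
}

static void *foo_2(void *arg)
{
	(void)arg;
	store_once(y, 1);
	smp_mb();
	r3 = load_once(x);
	return NULL;
}

/* One trial: reset the shared variables, spawn the three functions as
 * threads, join each in turn, then evaluate the assertion's condition. */
static int trial_assertion_fires(void)
{
	void *(*fn[3])(void *) = { foo_0, foo_1, foo_2 };
	pthread_t tid[3];

	store_once(x, 0);
	store_once(y, 0);
	for (int i = 0; i < 3; i++)
		if (pthread_create(&tid[i], NULL, fn[i], NULL) != 0)
			abort();
	for (int i = 0; i < 3; i++)
		pthread_join(tid[i], NULL);
	return r1 == 1 && r2 == 0 && r3 == 0;
}

/* Run n trials and count how often the assertion would have fired. */
long count_assertion_failures(long n)
{
	long fired = 0;

	for (long i = 0; i < n; i++)
		fired += trial_assertion_fires();
	return fired;
}
```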

Stupid SMP Tricks: Memory Barriers (RWC)

Mon, 06/28/2010 - 22:44
This posting relates to memory barriers in the Linux kernel. We start with smp_mb(), which is a full memory barrier when the kernel is built with CONFIG_SMP=y and is otherwise a compiler barrier that constrains compiler optimizations, but which generates no code.
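For readers who want to experiment outside the kernel, the primitives used in this posting might be approximated in user space roughly as follows. This sketch assumes GCC or Clang. The ACCESS_ONCE() and barrier() definitions mirror the kernel's compiler.h definitions of this era, but using __sync_synchronize() for the CONFIG_SMP=y smp_mb() is an assumption, since the kernel uses architecture-specific instructions. The store_x_then_load_y() function and its flag_x and flag_y variables are hypothetical.

```c
#include <assert.h>

/* Force exactly one access by going through a volatile lvalue. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Compiler barrier: emits no instructions, but the compiler may not
 * move memory accesses across it or keep values cached in registers. */
#define barrier() __asm__ __volatile__("" : : : "memory")

#ifdef CONFIG_SMP
/* SMP build: a full hardware memory barrier (assumed stand-in). */
#define smp_mb() __sync_synchronize()
#else
/* Uniprocessor build: only the compiler needs restraining. */
#define smp_mb() barrier()
#endif

/* Hypothetical example: store, full barrier, then load. */
static int flag_x, flag_y;

int store_x_then_load_y(void)
{
	ACCESS_ONCE(flag_x) = 1;
	smp_mb();
	return ACCESS_ONCE(flag_y);
}
```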

Consider the following code fragment, where each function foo_n() runs on CPU n, all concurrently:

int x, y; /* shared variables */
int r1, r2, r3; /* semi-private variables */

void foo_0(void)
{
        ACCESS_ONCE(x) = 1;
}

void foo_1(void)
{
        r1 = ACCESS_ONCE(x);
        smp_mb();
        r2 = ACCESS_ONCE(y);
}

void foo_2(void)
{
        ACCESS_ONCE(y) = 1;
        smp_mb();
        r3 = ACCESS_ONCE(x);
}


Now suppose that the following assertion runs after all of the preceding functions complete.

assert(!(r1 == 1 && r2 == 0 && r3 == 0));

Can this assertion ever trigger? Why or why not?

Stupid SMP Tricks: Lockless Access to Structure

Fri, 06/18/2010 - 00:06
Suppose that a pointer is compile-time initialized to reference a static instance of a structure, perhaps as follows:

struct foo {
    int a;
    int b;
};
struct foo static_foo = { 42, 17 };
struct foo *foo_p = &static_foo;

Because this is compile-time initialized, readers should not need to use rcu_dereference().

Let's further suppose that at runtime, some CPU, task, or thread might set foo_p to NULL. Because we cannot free the compile-time-allocated static_foo, there is no need for RCU grace periods and RCU read-side critical sections.

These restrictions do simplify things. Readers might be as simple as:

struct foo *p = foo_p;
if (p != NULL)
    do_something_with(p->a, p->b);


Does this work? Why or why not?

Stupid RCU Tricks: Synchronizing With External State

Sun, 06/06/2010 - 23:00
This puzzle came up at a recent ISO SC22 WG14 standards-committee meeting (for the C language).

Suppose you have an array of three RCU-protected structures. At any given time, one of them is the current structure that will be used by RCU readers. This means that there is a global pointer that will be pointing to one of the elements of this array, thus designating it as the current element. (And yes, a grace period must elapse before a given element is reused.)

But suppose that there is another global integer whose value must be kept consistent with the contents of the current element — to keep things trivial, let's assume that the value of this global integer must be twice that of an integer within the current structure.

The data structures might be set up as follows:

struct rcu_protected {
  int a;
};
struct rcu_protected elements[3];
struct rcu_protected *current = &elements[0];
int consistent;  /* must be 2 * current->a */

How can this be accomplished?

Confessions of a Recovering Proprietary Programmer, Part VI

Tue, 05/04/2010 - 11:00
I have been a happy vi user for almost three decades. My initial choice of vi over emacs was quite simple: the machine that was available at the time could run about ten vi sessions, but only one emacs session. Given that this was a shared machine, use of emacs was therefore socially irresponsible. Given a more capable machine, perhaps I would instead be a happy emacs user. But by now, vi is ingrained firmly in my fingers. Or rather, those features of vi that have been around for some decades are ingrained — I still haven't figured out what the g command does, aside from trip me up when I don't quite hit the shift key firmly enough. I am sure that there is some comprehensive vi documentation out there somewhere, but when I search for it, I only find web pages that helpfully offer tutorials on vi features that I have known about for more than a quarter century.

As a result, these days I mostly learn vi by making mistakes. For example, yesterday I downloaded an ASCII-text web page and edited it with vi. The startup message from vi looked very strange, but life was moving especially quickly that morning, and vi did show me the desired page, so I moved on.

This morning, I saw the same strange startup message. I got done with my vi session, and looked at the file I had downloaded to see if there was anything strange about its permissions or content. There was in fact something very strange, namely that it didn't exist at all.

The mistake I had made was to give the URL to vi instead of to wget. Nevertheless, vi happily downloaded the web page to /tmp and let me edit it. So, to see one of my recent patches, instead of using a browser, I can type:

vi http://www.rdrop.com/users/paulmk/patches/2.6.34-rc3-rcu-5.304d8da6.patch

This can be quite nice when testing scripts that automatically download information from the web!

Transactional Memory Everywhere: Follow-Up I

Tue, 04/20/2010 - 23:33
One of the big challenges for any synchronization mechanism is synchronization overhead. The larger the synchronization overhead, the greater the granularity of parallelism required to attain high efficiency. Given that software transactional memory has significant synchronization overhead, one natural reaction might be to increase the granularity in order to increase efficiency.
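A deliberately simple model, assumed here rather than taken from any paper, makes this trade-off concrete. If each unit of work takes time W and incurs a fixed synchronization overhead S, with no other costs, then the efficiency E, and the granularity W needed to attain a given E, are:

```latex
E = \frac{W}{W + S}
\qquad\Longrightarrow\qquad
W = \frac{E}{1 - E}\,S
```

For example, 90% efficiency requires W = 9S and 99% requires W = 99S, so doubling the synchronization overhead doubles the granularity required.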

Unfortunately, for many applications, coarse granularity means that transactions start to encompass system calls, which poses many difficulties. So it was with some interest that I read a paper by Don Porter and others, who propose enlisting the kernel's aid in managing large transactions that span many system calls, with a focus on file I/O. The resulting patch to the Linux kernel is not small, and it does not address things like networking. Unusually, they gain substantial speedups on some workloads, though cynics might wonder if this is more an indictment of the ext2 and ext3 filesystems than a testament to their approach's strength. The ability to roll back a number of system calls is interesting, and might be quite attractive to some people (although I doubt that I would have the foresight to start a transaction immediately before making a mistake).

Nevertheless, it is an interesting approach. I have no idea whether it will pan out, but it does at least show some potential.