When I saw this article in my inbox, I knew I shouldn't bother reading it. I really couldn't help myself though. I'm weak for gossip and my flight was delayed so boredom got the best of me.
I can't blame the tech media for the wild reporting though. The situation surrounding KVM, Xen, and Linux virtualization is pretty confused right now. I'll do my best to clear things up. An extra disclaimer, though: these are purely my own opinions and do not represent any official position of my employer.
I think we can finally admit that we, the Linux community, made a very big mistake with Xen. Xen should never have been included in a Linux distribution. There, I've said it. We've all been thinking it, have whispered it in closed rooms, and have done our best to avoid saying it.
I say this, not because Xen isn't useful technology and certainly not because people shouldn't use it. Xen is a very useful project and can really make a huge impact in an enterprise environment. Quite simply, Xen is not, and will never be, a part of Linux. Therefore, including it in a Linux distribution has only led to massive user confusion about the relationship between Linux and Xen.
Xen is a hypervisor that is based on the Nemesis microkernel. Linux distributions ship Xen today, install a Linux guest (known as domain-0) by default, and do their best to hide the fact that Xen is not a part of Linux. They've done a good job; most users won't even notice that they are running an entirely different operating system. The whole situation is somewhat absurd though. It's as if the distributions shipped a NetBSD kernel and automatically switched to it when you wanted to run a LAMP stack. We don't ship a plethora of purpose-built kernels in a distribution. We ship one kernel and make sure that it works well for all users. That's what makes a Linux distribution Linux. When you take away the Linux kernel, it's not Linux any more.
There is no shortage of purpose-built kernels out there. NetBSD is a purpose-built kernel for networking workloads. QNX is a purpose-built kernel for embedded environments. VxWorks is a purpose-built kernel for real-time environments. Being purpose-built doesn't imply superiority and Linux currently is very competitive in all of these areas.
When the distros first shipped Xen, it was done mostly out of desperation. Virtualization was, and still is, the "hot" thing. Linux did not provide any native hypervisor capability. Most Linux developers didn't even know that much about virtualization. Xen was a fairly easy-to-use purpose-built kernel with a pretty good community. So we made the hasty decision to ship Xen instead of investing in making Linux a proper hypervisor.
This decision has come back to haunt us now in the form of massive confusion. When people talk about Xen not being merged into Linux, I don't think they realize that Xen will *never* be merged into Linux. Xen will always be a separate, purpose-built kernel. There are patches to Linux that enable it to run well as a guest under Xen. These patches are likely to be merged in the future, but Xen will never be a part of the Linux kernel.
As a Linux developer, it's hard for me to be that interested in Xen, for the same reasons I have no interest in NetBSD, QNX, or VxWorks. The same is true for the vast majority of Linux developers. When you think about it, it is really quite silly. We advocate Linux for everything from embedded systems, to systems requiring real-time performance, to high-end mainframes. I trust Linux to run on my DVD player, my laptop, and the servers that manage my 401k. Is virtualization so much harder than every other problem in the industry that Linux is somehow incapable of doing it well on its own? Of course not. Virtualization is actually quite simple compared to things like real-time.
This does not mean that Xen is dead or that we should never have encouraged people to use it in the first place. At the time, it was the best solution available. At this moment, it's still unclear whether Linux as a hypervisor is better than Xen in every scenario. I won't say that all users should switch en masse from Xen to Linux for their virtualization needs. All of the projects I've referenced here are viable projects with large user bases.
I'm a Linux developer though, and just like other Linux hackers who are trying to make Linux run well on everything from mainframes to DVD players, I will continue to work to make Linux work well as a hypervisor. The Linux community will work toward making Linux the best hypervisor out there. The Linux distros will stop shipping a purpose-built kernel for virtualization and instead rely on Linux for it.
Looking at the rest of the industry, I'm surprised that other kernels haven't gone in the direction of Linux in terms of adding hypervisor support directly to the kernel.
Why is Windows not good enough to act as a hypervisor, such that Microsoft had to write a new kernel from scratch (Hyper-V)?
Why is Solaris not good enough to act as a hypervisor requiring Sun to ship Xen in xVM? Solaris is good enough to run enterprise workloads but not good enough to run a Windows VM? Really? Maybe :-)
Forget about all of the "true hypervisor" FUD you may read. The real question to ask yourself is what is so wrong with these other kernels that they aren't capable of running virtual machines well and instead have to rely on a relatively young and untested microkernel to do their heavy lifting?
Update: modified some of the text for clarity. Flight delayed more so another round of editing :-)
During the holiday we had here in Brazil, I spent some time integrating my autotest control files with our Trac testcase management plugin. For those who don't know, we hacked the Trac testcase management plugin [1] to suit our needs and developed some macros that generate proper test bucket [2] state reports. The plugin can generate the test buckets, neat tables and summary tables. Every test is mapped to a Trac ticket.
Now, if our autotest control files could automagically update the test tickets on a pass or a failure, that'd be great, wouldn't it? Yep. And it's possible, since Trac has an XML-RPC server. Python, with its 'batteries included' philosophy, ships the xmlrpclib module, which lets us talk to an XML-RPC server. First, you need to install the Trac XMLRPC plugin; the plugin page is an excellent source on how to install it [3].
After restarting the Trac instance, check that the plugin works by making a few calls from the interactive Python prompt (I definitely recommend the IPython shell). Let's suppose that the Trac instance is running on http://mydomain.foo.com/trac/mytracinstance. The XML-RPC server URL that you'll have to build looks like:
http://trac_user:trac_password@mydomain.foo.com/trac/mytracinstance/login/xmlrpc
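import xmlrpclib

# Credentials and the XML-RPC endpoint of the Trac instance
trac_user = "trac_user"
trac_password = "trac_password"
trac_rpc_root = "mydomain.foo.com/trac/mytracinstance/login/xmlrpc"
trac_url = 'http://%s:%s@%s' % (trac_user, trac_password, trac_rpc_root)

# Proxy object: remote methods are called as attributes, e.g. trac.ticket.get(100)
trac = xmlrpclib.ServerProxy(trac_url)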
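The snippet above is Python 2 (xmlrpclib). A minimal sketch of the same setup in Python 3 follows, using xmlrpc.client. It also percent-encodes the credentials, so a password containing characters such as '@' or ':' cannot corrupt the URL; xmlrpc.client decodes the user:password part again before building the HTTP Basic authentication header. The function names are hypothetical, and the /login/xmlrpc path is the one from the snippet above.

```python
from urllib.parse import quote
import xmlrpc.client


def build_trac_url(user, password, rpc_root):
    """Return an authenticated XML-RPC URL for a Trac instance.

    rpc_root is host + path without the scheme, e.g.
    "mydomain.foo.com/trac/mytracinstance/login/xmlrpc".
    """
    # safe="" makes quote() encode '@', ':' and '/' as well
    return "http://%s:%s@%s" % (quote(user, safe=""),
                                quote(password, safe=""),
                                rpc_root)


def make_trac_proxy(user, password, rpc_root):
    # ServerProxy only contacts the server when a method is called,
    # so constructing it does not touch the network.
    return xmlrpc.client.ServerProxy(build_trac_url(user, password, rpc_root))
```

For example, build_trac_url("trac_user", "p@ss:word", "mydomain.foo.com/trac/mytracinstance/login/xmlrpc") returns http://trac_user:p%40ss%3Aword@mydomain.foo.com/trac/mytracinstance/login/xmlrpc.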
At this point we have a server connection object established with Trac, so we just need to call remote methods as methods of this object. You can find the complete list of available methods on the XMLRPC plugin page. To make sure the connection is working, you can call ticket.get(), which takes a ticket number. Assuming you've given it a valid ticket number, you'll get back a list containing the ticket id, its creation and last-modification timestamps, and a dictionary with the ticket fields, like:
In [12]: trac.ticket.get(100)
Out[12]:
[100,
1185952767,
1186675548,
{'cc': '',
'component': 'Distro',
'description': "''''''Test Details:''''''\n\nA highly configurable stress test utility that calls most of the main file system syscalls. Originally developed by SGI for XFS testing.\n\n''''''Expected result:'''''' \n\n0 as return code",
'keywords': '',
'machine_name': 'lpar_name',
'machine_type': 'HV8',
'milestone': 'DummyDistro',
'owner': 'lucasmr@br.ibm.com',
'priority': 'major',
'reporter': 'lucasmr@br.ibm.com',
'resolution': 'fixed',
'status': 'closed',
'summary': 'TestID: fsstress -- HV8',
'testcase_bugzilla': '',
'testcase_result': 'Pass',
'type': 'testcase'}]
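Because ticket.get() returns a positional list, it can help to unpack it once into named parts. A minimal sketch, assuming the [id, time_created, time_changed, attributes] layout shown in the session above and that the integer times are Unix timestamps in seconds; the function name is hypothetical, and any time value that is not a plain integer is passed through unchanged.

```python
from datetime import datetime, timezone


def unpack_ticket(result):
    """Turn the 4-element list from ticket.get() into a dict."""
    ticket_id, created, changed, attrs = result

    def to_dt(value):
        # Convert plain integers (assumed Unix seconds, UTC); leave others as-is.
        if isinstance(value, int):
            return datetime.fromtimestamp(value, tz=timezone.utc)
        return value

    return {
        "id": ticket_id,
        "created": to_dt(created),
        "changed": to_dt(changed),
        "fields": attrs,
    }


# Trimmed copy of the session output above
sample = [100, 1185952767, 1186675548,
          {"status": "closed", "resolution": "fixed",
           "testcase_result": "Pass", "summary": "TestID: fsstress -- HV8"}]
info = unpack_ticket(sample)
print(info["id"], info["fields"]["testcase_result"], info["created"].year)
```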
If calling this method works as described, we should be good to go. As one may guess, the method that updates the tickets is ticket.update, as follows:
array ticket.update(int id, string comment, struct attributes={}, boolean notify=False)
Update a ticket, returning the new ticket in the same form as getTicket().
So you pass the ticket id, a comment string (which might, for example, contain a URL to the test logs) and a dictionary with the ticket fields to be updated; the optional notify flag defaults to False. Now, on to the autotest control file.
An autotest control file contains fragments of Python code, though it can grow into something quite elaborate. The main interface for running tests from a control file is the job.run_test() method; a simple autotest control file would be something like:
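To see what actually travels over the wire, Python's xmlrpc.client.dumps() can marshal the same arguments a proxy call would send, without contacting any server. This sketch uses Python 3's xmlrpc.client rather than xmlrpclib; the ticket number and comment are the example values used in the control file below.

```python
import xmlrpc.client

# Positional arguments in the order of the signature quoted above:
# ticket.update(int id, string comment, struct attributes={}, boolean notify=False)
params = (150,
          "[http://myhost.foo.com/results/sleeptest Sleeptest log]",
          {"testcase_result": "Pass"},
          False)

# dumps() builds the XML request body that ServerProxy would POST; nothing is sent.
request_xml = xmlrpc.client.dumps(params, methodname="ticket.update")
print(request_xml)
```

The output contains a methodName element for ticket.update followed by one value element per argument (an int, a string, a struct and a boolean).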
job.run_test('sleeptest')
The job.run_test() method takes a test name as its argument and returns True if the test passes and False if it fails, so the logic for Trac reporting is very straightforward (this assumes the trac connection object shown earlier is also created in the control file):
ticket = 150
message = "[http://myhost.foo.com/results/sleeptest Sleeptest log]"
if job.run_test('sleeptest'):
    # Test passed: close the ticket and record the result
    trac.ticket.update(ticket, message,
                       {'status': 'closed', 'resolution': 'fixed',
                        'testcase_result': 'Pass'})
else:
    # Test failed: record the result without closing the ticket
    trac.ticket.update(ticket, message,
                       {'status': '', 'resolution': '',
                        'testcase_result': 'Fail'})
This way, when the test passes, the ticket is closed and marked as Pass with no intervention; when it fails, the ticket is only marked as Fail and left open, so the test engineer can analyze the failure and come up with a conclusion or a bug report.
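When several tests report to Trac, the if/else above can be factored into a helper. A minimal sketch, assuming the same field names and values as the control file; report_result and the recording stand-in are hypothetical, and the stand-in only exists so the logic can run without a Trac server.

```python
def report_result(server, ticket_id, passed, log_url, label):
    """Record a test result on a Trac test ticket (hypothetical helper).

    server is anything exposing ticket.update(), e.g. an XML-RPC proxy.
    """
    # Trac wiki link syntax: [url text]
    message = "[%s %s]" % (log_url, label)
    if passed:
        fields = {"status": "closed", "resolution": "fixed",
                  "testcase_result": "Pass"}
    else:
        # Same values as the control-file example: mark as Fail, do not close
        fields = {"status": "", "resolution": "",
                  "testcase_result": "Fail"}
    return server.ticket.update(ticket_id, message, fields)


class RecordingTicketAPI:
    """Stand-in for server.ticket that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def update(self, *args):
        self.calls.append(args)
        return list(args)


class FakeServer:
    def __init__(self):
        self.ticket = RecordingTicketAPI()


fake = FakeServer()
report_result(fake, 150, True,
              "http://myhost.foo.com/results/sleeptest", "Sleeptest log")
report_result(fake, 151, False,
              "http://myhost.foo.com/results/fsstress", "fsstress log")
```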
It was a fun holiday indeed :)
[1] We plan on integrating those modifications into the plugin, of course.
[2] By test bucket we mean the sequence of testcases that have to be performed for a single project, say, distro testing.
[3] It's packaged as a Python egg, which can be installed using easy_install, or by unpacking it and dropping it into your Python site-packages directory.
Now, as a performance analyst, I'm often asked: "Is it a performance play?" My quick answer: "Nah - not usually..." But for everything I've tried: "It just works" - which in and of itself is pretty cool. You really want the best performance for your app? Re-compile and run it natively. Duh. You want easy access to existing x86 compiled apps? Give this product a shot. And in some cases, the performance of the translated product is just fine for the user's needs.
In essence, this product is the flip-side of Transitive's translator technology for Apple which translates older Apple Power applications to run on the new x86-based Apple systems. Check out these web sites if you missed the technology introduction several years ago:
IBM and Transitive (http://transitive.com/customers/ibm) have already introduced the second release (Ver 1.2) of the IBM PowerVM Lx86 product.
Originally discussed in the press as p-AVE (for example, see an article from http://www.it-analysis.com/), the product then got a preliminary name from IBM's product naming wizards: IBM System p Application Virtual Environment (System p AVE). Later they followed it with a newer official name under the IBM PowerVM umbrella: PowerVM Lx86, for x86 Linux applications. "p-AVE" certainly rolled off the tongue far more easily than the PowerVM Lx86 name, but the PowerVM naming admittedly fits better with the overall virtualization strengths of the Power line.
For a page full of pointers and interesting helpful hints, check out http://www-128.ibm.com/developerworks/linux/lx86/.
For a clever approach to using PowerVM Lx86, a nice demo was created which you can see on YouTube.
Another example of common product usage is in the world of graphing performance results. Users can check out a really nice set of charting libraries from Advanced Software Engineering (http://www.advsofteng.com/) available with the ChartDirector product. The executable run-time libraries are available for a variety of platforms, including Linux on i386, but alas, not for Linux on ppc64 systems. But when the i386 libraries are installed on a Power system running Linux with the additional PowerVM Lx86 product, Power users can use the graphing routines directly. Again, the perceptible performance differences are minimal, and the full function of the i386 routines is available to Power users.
The IBM web site for PowerVM Virtualization Software offerings has a good description of the capabilities of the Linux product and the services available for software vendors to enable their apps for native execution while still exploiting the Power systems running Linux with their existing applications.
Keep in mind there are the normal obligatory footnotes and qualifications on what i386 applications can function under this product - check out the product web sites for that information.
Finally, as a performance team, we always tend to agonize over the corner cases which highlight the performance challenges of translating an application from one system platform to another, and there certainly are some areas where performance can be a challenge. Java is a good example. There are too many translation steps: byte codes are translated into x86 executable code, and that code is then translated again to execute on the Power platform, which can make for a rather poor execution path. If your Java app is a minor piece of a bigger application (the prime example is an application installer), shrug. But whew, if you're thinking about snagging a full, comprehensive Java-based product and running it in translation mode, as opposed to verifying that the Java code runs on the Power platform, I anticipate you may be disappointed with the performance. One would've hoped that the Java world of write once, run anywhere would've panned out better than the write once, test everywhere implementation.
In the meantime, if you need easy access to x86 executables and applications on your Power systems, give this a shot.
In the last article we finished with an SPE-based fractal renderer, but with a limited maximum fractal size of 64 × 64 pixels.
We'd like to generate full-size fractals, but a single DMA (which we use to transfer the fractal image out of the SPE) has a maximum size of 16 KB. The solution is to perform multiple DMAs, each containing a subset of the image's rows.
Each invocation of render_fractal() should render a DMA-able chunk of fractal data, then we perform the DMA. We do this until the SPE has processed the entire image:
We just need to modify the spe-fractal code (spe-fractal.c) a little. At present, we just render the whole fractal in one pass and DMA the data in the main() function:
render_fractal(&args.fractal);
mfc_put(args.fractal.imgbuf, ppe_buf,
args.fractal.rows * args.fractal.cols * sizeof(struct pixel),
0, 0, 0);
/* Wait for the DMA to complete */
mfc_write_tag_mask(1 << 0);
mfc_read_tag_status_all();
First, we need to modify our render_fractal() function to take a starting row and a number of rows to render. This is the new prototype of render_fractal():
static void render_fractal(struct fractal_params *params,
int start_row, int n_rows)
In the SPE program's main(), we just need to set up some convenience variables:
bytes_per_row = sizeof(*buf) * args.fractal.cols;
rows_per_dma = sizeof(buf) / bytes_per_row;
And do the rendering and DMAs in a loop:
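for (row = 0; row < args.fractal.rows; row += rows_per_dma) {
    /* The last chunk may hold fewer than rows_per_dma rows
     * (n_rows is an int declared alongside row) */
    n_rows = args.fractal.rows - row;
    if (n_rows > rows_per_dma)
        n_rows = rows_per_dma;

    render_fractal(&args.fractal, row, n_rows);

    mfc_put(buf, ppe_buf + row * bytes_per_row,
            n_rows * bytes_per_row,
            0, 0, 0);

    /* Wait for the DMA to complete */
    mfc_write_tag_mask(1 << 0);
    mfc_read_tag_status_all();
}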
This loop will render as many image rows as will fit into a single DMA, then DMA the rendered data back to main memory.
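The chunking arithmetic can be checked away from the Cell hardware. This Python model lists the transfers such a loop performs, with the final chunk clipped to the image height. It assumes 4-byte pixels and a 16 KB local-store buffer (the size of buf is not given in the article), and it ignores the alignment rules that real MFC transfers must also follow; plan_dmas is a hypothetical name.

```python
def plan_dmas(rows, cols, bytes_per_pixel=4, buf_bytes=16 * 1024):
    """Yield (start_row, n_rows, byte_offset, byte_count) for each DMA."""
    bytes_per_row = bytes_per_pixel * cols
    rows_per_dma = buf_bytes // bytes_per_row   # whole rows that fit in buf
    if rows_per_dma == 0:
        raise ValueError("a single row does not fit in the DMA buffer")
    for start in range(0, rows, rows_per_dma):
        n = min(rows_per_dma, rows - start)     # clip the last chunk
        yield start, n, start * bytes_per_row, n * bytes_per_row


for chunk in plan_dmas(10, 1024):
    print(chunk)
```

With 1024 columns, each row is 4096 bytes, so four rows fit per transfer and a 10-row image needs three DMAs, the last one carrying only two rows.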
Now, we're able to render fractals larger than 64 × 64 pixels:
The source for the updated fractal renderer is available in .
Now that we can generate full-size fractals, we can compare the running times with the PPE-based fractal renderer. The following table shows running times with a standard fractal (using these fractal parameters).
Running times of fractal renderer:

Implementation    Time (sec)
PPE               55.7
1 SPE             40.7

So moving the fractal generation code to run on an SPE cuts the running time by 27% (a speedup of about 1.37×). We're still some way behind optimal performance, though, and benchmarking on other systems gives better times (for example, generating the same fractal on an Intel Core 2 Duo @ 2.4GHz takes 13.8 seconds).
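The 27% figure is the fraction of running time saved; as a ratio, the same two times give a speedup of roughly 1.37×. The arithmetic, using only the times quoted above:

```python
ppe_time = 55.7     # seconds, PPE-only renderer
spe_time = 40.7     # seconds, one SPE
core2_time = 13.8   # seconds, Core 2 Duo @ 2.4GHz

reduction = (ppe_time - spe_time) / ppe_time   # fraction of run time saved
speedup = ppe_time / spe_time                  # how many times faster
gap_to_core2 = spe_time / core2_time           # how far the SPE version still lags

print("time saved:   %.0f%%" % (reduction * 100))
print("speedup:      %.2fx" % speedup)
print("Core 2 ratio: %.1fx" % gap_to_core2)
```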
We can improve the Cell performance by a large amount - stay tuned for the next article to see how.
Despite being a KDE person, I use Evolution as my mail client, including for reading mailing lists and posting patches. Up until now I’ve been doing the latter by attaching patches instead of including them in the message body, to avoid whitespace mangling and line wrapping.
But this method is sometimes inconvenient: when you want to comment on a patch posted to the list, portions of that patch (which may include long lines that shouldn’t wrap) will show up in the message body and can get mangled. This is especially a problem when the poster also sent their patch as an attachment, because you will most likely need to copy & paste the patch into your reply window, and things will get mangled right there.
Turns out that there is a way to include a patch in the message body in Evolution’s compose window and it will get through unmaimed: select the Preformat option in the paragraph style dropdown, and then paste in your patch or use the Insert -> Text File… option.
Caveat: Older versions of Evolution converted tabs to spaces when pasting text, so you had to insert it from a text file to preserve whitespace. I just tested with version 2.12.3 from Debian (package version 2.12.3-1), and pasting a patch containing tabs worked fine so the bug has been fixed.
Thanks Klaus Kiwi for the tip!
I noticed a very interesting developerWorks article today on the ext4 filesystem.
http://www.ibm.com/developerworks/linux/library/l-ext4/index.html?ca=drs-
There’s a great summary table of the new features. Remember though, ext4 is still labeled experimental.
Ext4 is the latest in a long line of Linux® file systems, and it’s likely to be as important and popular as its predecessors. As a Linux system administrator, you should be aware of the advantages, disadvantages, and basic steps for migrating to ext4. This article explains when to adopt ext4, how to adapt traditional file system maintenance tool usage to ext4, and how to get the most out of the file system.
As I mentioned before, I’ve been working on and off on adding Python scripting support to GDB, with Tom Tromey and Vladimir Prus. We did the work in a git repository, separate from the GDB main repo (which still uses CVS, by the way). Now came the time to get the work we did there, separate it in patches and post them for review on the gdb-patches mailing list.
This is the tale of my patch-producing efforts. I’m sorry, it is most likely boring for everyone but me. I wanted to write it down anyway, so feel free to stop reading here.
Anyway, back to where I was: I (foolishly, perhaps?) volunteered to create the patches. One reason for me to do that is that I actually enjoy working with and learning about source code management and related issues, and this was a good opportunity for me to improve my git-fu (since I thought git could help me do the job in a sane way, which fortunately proved to be the case).
I say foolishly because I thought it wouldn’t take too much time to cut out the patches, since I knew what I would have to do… The task of course took longer than I expected, in part because of unforeseen autotools woes, but of course also because I underestimated the effort (I am an optimist).
Here is the method I used:
First of all, I wanted to update the code to the latest CVS version of GDB, because the work in the git repo was based on a CVS version from February… Nothing to see here, actually: I just created a new branch with the latest CVS update and rebased the commits on top of it. In hindsight, I should have merged the new CVS update into the python branch, which would have left me with fewer conflicts to deal with. I used rebase because I thought I would cherry-pick the commits later, so I wanted each of them refreshed.
Then the real fun began. I started using the interactive mode of git rebase to squash related commits together (to form the patches) and reorder them. I quickly realised that with this approach I would have some difficulty with commits that touched different areas of the code and crossed the borders I had in mind for the patches I wanted to generate. I would first have to split those into smaller, better-behaved commits and only then squash them with other similar changes. That seemed like more work than really necessary.
Also, the older commits did things in ways and places which were later changed, and it looked like I would have some trouble reconciling older and newer code to fit in one patch (maybe not though, maybe that would be taken care of more or less naturally). Also, I would need to take some time to familiarise myself with the commits in the branch, because I only authored some of them. The “shuffle commits around” approach wasn’t looking very promising.
I then turned to a different strategy: I generated a big patch containing all of the code in the branch, and applied it (using the plain old patch command) on top of a clean branch which contained only the CVS HEAD version I was using as a base. All I needed to do now was to selectively add to the index the changes that I wanted to include in a patch and then commit those changes together. And repeat the process for the next patch and so on.
It proved to be a good approach, especially because of the interactive mode of git-add. This mode asks you about each change inside a modified file, letting you add that change to the index or skip it, leaving it in the working directory for a future commit. git add -i streamlined the “change picking” process quite a lot and made the patch-cutting almost mechanical. (By the way, this feature is also available in Mercurial through the Record extension; I believe, though I’m not sure, that the Mercurial extension predates git add -i. I don’t know whether Bazaar has it; it would be nice to know.)
In this phase of the process, git rebase -i was useful. Sometimes I came across a change in the working directory that would fit better in a patch I had already committed. It was simply a matter of committing that change, then shuffling it back and squashing it into the proper patch.
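git add -i (and git add -p) offers changes one hunk at a time. As a toy model of that picking step, not git's implementation, the sketch below splits a single-file unified diff at its @@ lines and rebuilds a patch from the hunks a predicate selects. The function names and the sample diff are hypothetical, and the hunk headers are not renumbered after hunks are dropped, which a real tool has to take care of.

```python
def split_hunks(diff_text):
    """Split a single-file unified diff into (header, [hunk, ...]).

    Every hunk starts with a line beginning with '@@'; lines before the
    first hunk (the '---' / '+++' file lines) form the header.
    """
    header, hunks = [], []
    for line in diff_text.splitlines(keepends=True):
        if line.startswith("@@"):
            hunks.append([line])          # start a new hunk
        elif hunks:
            hunks[-1].append(line)        # body line of the current hunk
        else:
            header.append(line)           # file header
    return "".join(header), ["".join(h) for h in hunks]


def pick_hunks(diff_text, keep):
    """Return a diff with only the hunks for which keep(hunk) is true."""
    header, hunks = split_hunks(diff_text)
    chosen = [h for h in hunks if keep(h)]
    # An empty selection yields an empty patch rather than a bare header.
    return header + "".join(chosen) if chosen else ""


sample = (
    "--- a/file.c\n"
    "+++ b/file.c\n"
    "@@ -1,3 +1,4 @@\n"
    " int a;\n"
    "+int python_enabled;\n"
    " int b;\n"
    " int c;\n"
    "@@ -10,2 +11,3 @@\n"
    " void f(void);\n"
    "+void g(void);\n"
    " void h(void);\n"
)

first_patch = pick_hunks(sample, lambda hunk: "python_enabled" in hunk)
print(first_patch)
```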
At each commit I pushed the changes to a pristine repo which I used to build GDB and guarantee that each patch included all the changes it needed.
Voilà, at the end of the process I had a git branch where each commit corresponded to one patch which I wanted to send to the mailing list.
In our daily work we have to deal with quite a large number of test systems (usually spread across the world) and perform various operations on them. Even though we automate testcase execution, it's still useful to have ssh/console access to a machine when you need it [1].
After spending some time looking up the access data for my systems (hostnames, passwords, users) on wiki pages, I figured I could speed up my work by defining shell aliases for the machines I commonly access at work. Reading a bit more about the subject, it became clear to me that shell functions would be a better solution, since you can do much more with them.
The approach I chose was to write a shell script, kept in my source code folder, and then source it from ~/.bashrc as follows:
source /path/to/code/test-machines.sh
So that when you start a new shell instance, the functions defined on this script get loaded.
Another aspect of software testing is that our test systems need to be re-installed over and over, generating a new ssh fingerprint every time (and making ssh complain loudly when you try to access those machines). To work around this, you can call ssh with options that keep it from caching the fingerprints.
SSH_TEST='/usr/bin/ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
StrictHostKeyChecking=no means that ssh won't ask whether you want to accept the new fingerprint; it will just accept it. UserKnownHostsFile=/dev/null then sends the fingerprint it just got from the re-installed machine to the vast fields of /dev/null land :), so you won't get complaints from ssh every time your test system gets re-installed [2].
With SSH_TEST defined, we can create a base test_system() function that the actual machine access functions will use, like:
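If some of the lab tooling is written in Python rather than shell, the same ssh invocation can be built as an argument list. A minimal sketch with hypothetical function names; it uses exactly the two options from SSH_TEST, and building the list as argv avoids relying on shell word splitting.

```python
import shlex
import subprocess

# Same options as SSH_TEST: accept whatever host key a freshly re-installed
# machine presents, and never record it in a known_hosts file.
SSH_TEST_OPTS = ["-o", "StrictHostKeyChecking=no",
                 "-o", "UserKnownHostsFile=/dev/null"]


def ssh_test_command(user, host, ssh="/usr/bin/ssh"):
    """Build the argv for an ssh session to a lab test machine."""
    return [ssh, *SSH_TEST_OPTS, "%s@%s" % (user, host)]


def connect(user, host):
    # Runs ssh interactively and returns its exit status.
    return subprocess.call(ssh_test_command(user, host))


print(shlex.join(ssh_test_command("user", "dev4e-foobar.domain.com")))
```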
test_system() {
    local host=$1
    local profile=$2
    local user='user'
    local pass='password'

    echo
    echo "Client partition information"
    echo "HMC LPAR profile: $profile"
    echo "Hostname: $host"
    echo "User: $user"
    echo "Password: $pass"
    echo "ssh $user@$host"
    echo
    $SSH_TEST "$user@$host"
}
Notice that since I test on Power platforms, I included information about LPAR profiles. You can obviously customize this base function to better suit your needs. The idea is to show information about your test systems in a handy way (i.e., you can copy and paste it directly from your terminal emulator instead of looking somewhere else) while starting the ssh session at the same time. With a base function like that, you can choose a short name for each system, define some parameters and then just pass them to the base function:
foo() {
    local host='dev4e-foobar.domain.com'
    local profile='foo'
    test_system "$host" "$profile"
}
So when you type foo at a shell prompt, you'll start an ssh connection to the test machine, as well as print handy information like:
lucas@freedom:~$ foo

Client partition information
HMC LPAR profile: foo
Hostname: dev4e-foobar.domain.com
User: user
Password: password
ssh user@dev4e-foobar.domain.com
This is good for bug reports where somebody needs access to your test systems, you can just copy and paste that info right away on the bug report. To summarize, you can assemble your list of test systems, and access those systems (with all the bells and whistles of shell completion) in a faster and more convenient way.
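The same idea can also be kept as data rather than one function per machine. A minimal sketch of such a registry in Python, assuming the fields that test_system() prints; LabSystem and SYSTEMS are hypothetical names, and the default user and password are the placeholder values from the shell example.

```python
from dataclasses import dataclass


@dataclass
class LabSystem:
    """One lab machine, mirroring the fields test_system() prints."""
    host: str
    profile: str              # HMC LPAR profile name
    user: str = "user"
    password: str = "password"

    def info(self):
        # Same lines, in the same order, as the shell function's output
        return "\n".join([
            "Client partition information",
            "HMC LPAR profile: %s" % self.profile,
            "Hostname: %s" % self.host,
            "User: %s" % self.user,
            "Password: %s" % self.password,
            "ssh %s@%s" % (self.user, self.host),
        ])


# Short name -> machine, playing the role of the foo() shell function
SYSTEMS = {
    "foo": LabSystem(host="dev4e-foobar.domain.com", profile="foo"),
}

print(SYSTEMS["foo"].info())
```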
[1] I agree that there might be better ways to do test machine management, but this was a simple and surprisingly effective solution for my personal needs.
[2] It should go without saying, but keep in mind that this approach is useful for test machines you have in your lab, not for production systems. If the fingerprint of your server changed and you didn't know about it, you definitely have a problem :)
If you’re in the NYC area, IBM is hosting a great “Next Generation Linux” event at the Hilton on Church St. It should be a great day of speakers discussing where Linux is heading, what makes Linux unique and “special”, and what workloads are great for running Linux. It’s a packed session from 9-12 (breakfast at 8 if you’re an early riser).
You can register here:
https://www-950.ibm.com/events/wwe/grp/grp017.nsf/agenda?openform&seminar=692H5MES&locale=en_US/
Agenda
Time     Description
8:00 am  Registration & Continental Breakfast
9:00 am  Welcome & Introduction
         Linux and Innovation

Andrew Binstock interviewed Donald Knuth recently, and one of the more amusing tidbits was this:
I currently use Ubuntu Linux, on a standalone laptop—it has no Internet connection. I occasionally carry flash memory drives between this machine and the Macs that I use for network surfing and graphics; but I trust my family jewels only to Linux.
More seriously, I found his comments about multi-core computers to be very interesting:
I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the “Itanium” approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.
Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX….
I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.
This is a very interesting issue, because it raises the question of what next-generation CPUs need to do in order to be successful. Given that it is no longer possible to just double the clock frequency every 18 months, should CPU architects just start doubling the number of cores every 18 months instead? Or should they try to concentrate a lot more computing power into an individual core, and optimize for a fast and dense interconnect between the CPUs? The latter is much more difficult; the advantage of the former is that it’s really easy for marketing types to use some cheesy benchmark such as SPECint to help sell the chip, but then people find out that it’s not very useful in real life.
Why? Because programmers have proven that they have a huge amount of trouble writing programs that take advantage of these very large multicore computers. Ultimately, I suspect that we will need a radically different way of programming in order to take advantage of these systems, and perhaps a totally new programming language before we will be able to use them.
Professor Knuth is highly dubious that the multicore approach will work, and while I hope he’s wrong (since I suspect the hardware designers are starting to run out of ideas, so it’s time software engineers started doing some innovating), he’s a pretty smart guy, and he may well be right. Of course, another question is what we would do with all of that computing power. Whatever happened to the predictions that computers would be able to support voice or visual recognition? And of course, what about the power and cooling issues for these super-high-powered chips? All I can say is, the next couple of years are going to be interesting, as we try to sort out all of these issues.
There’s been some controversy generated over my use of the terminology of “Organic” and “Non-Organic” Open Source. Asa Dotzler noted that it wasn’t Mozilla’s original intent to “make a distinction between how Mozilla does open source and how others do open source”. Nessance complained that he didn’t like the term “Non-Organic”, because it was “raw and vague - is it alien, poison, silicon-based?” and suggested instead the term “Synthetic Open Source”, referencing a paper by Siobhán O’Mahony, “What makes a project open source? Migrating from organic to synthetic communities”. Nessance also referenced a series of questions and answers by Stephen O’Grady from Red Monk, where he claimed the distinction between the two doesn’t matter. (Although given that Sun is a paying customer of Red Monk, Stephen admits that this might have influenced his thinking and so he might be “brainwashed” :-).
So let’s take some of these issues in reverse order. Does the distinction matter? After all, if the distinction doesn’t matter, then there’s no reason to create or define specialized terminology to describe the difference. Certainly, Brian Aker, a senior technologist from MySQL, thinks it does, as do folks like me and Amanda McPherson and Mike Dolan; but does it really? Are we just saying that because we want to take a cheap shot at Sun?
Well, to answer that, let’s go back and ask the question, “Why is Open Source a good thing in the first place?” It’s gotten to the point where people just assume that it’s a good thing, because everybody says it is. But if we go back to first principles, maybe it will become much clearer why this distinction is so important.
Consider the Apache web server; it was able to completely dominate the web server market, easily besting all of its proprietary competitors, including the super-deep-pocketed Microsoft. Why? It won because a large number of volunteers were able to collaborate to create a very full-featured product, using a “stone soup” model where each developer “scratched their own itch”. Many, if not most, of these volunteers were compensated by their employers for their work. Since their employers were not in the web server business, but instead needed a web server as a means (a critical means, to be sure) to pursue their business, there was no economic reason not to let their engineers contribute their improvements back to the Apache project. Indeed, it was cheaper to let their engineers work on Apache collaboratively than it was to purchase a product that would be less suited to their needs. In other words, it was a collective “build vs. buy” decision, with the twist that because a large number of companies were involved in the collaboration, it was far, far cheaper than the traditional “build” option. This is a powerful model, and the fact that Sun originally asked Roy Fielding from the Apache Foundation to assist in forming the Solaris community indicates that at least some people at Sun appreciated why this was so important.
There are other benefits of releasing code under an Open Source license, such as the ability for others to see the implementation details of your operating system, but in truth, Sun had already made the source code for Solaris available for a nominal fee years before. And, of course, there are plenty of arguments over the exact licensing terms that should be used, such as GPLv2, GPLv3, CDDL, CPL, MPL, etc., but sometimes those arguments can be a distraction from the central issue. While the legal issues that arise from the choice of license are important, at the end of the day, the most crucial issue is the development community. It is the strength and the diversity of the development community which is the best indicator of the health and well-being of an Open Source project.
But what about end users, I hear people cry? End users are important to the extent that they provide ego strokes to the developers, to the extent that they provide testing and bug reports, and to the extent that they give companies an economic justification to continue employing open source developers. But ultimately, end users affect an open source project only in a very indirect way.
Moreover, if you ask commercial end users what they value about Open Source, a survey by Computer Economics indicated that the number one reason customers valued open source was “reduced dependence on software vendors”, which end users valued 2 to 1 over “lower total cost of ownership”. (Which is why the Sun salescritters who were sending around TCO analyses comparing 24×7 phone support from Red Hat with support-by-email from Sun totally missed the point.) What’s important to commercial end users is that they be able to avoid the effects of vendor lock-in, which implies that if all of the developers are employed by one vendor, the project doesn’t provide the value the end users were looking for.
This is why it matters so much whether a project’s developers are dominated by employees from a single company. The license under which the code is released is merely part of the outward trappings of an open source project. What’s really critical is the extent to which the development costs are shared across a vast global community of developers who have many different means of support. This saves costs for the companies using a product developed in such a fashion; it gives customers a choice about whether to get their support from company A or company B; and programmers who don’t like the way things are going at one company have an easier time changing jobs while still working on the same project. It’s a win-win-win scenario.
In contrast, if a project decides to release its code under an open source license, but nearly all the developers remain employed by a single company, it doesn’t really change the dynamic compared to when the project was previously under a closed-source license. It is a necessary but not sufficient step towards attracting outside contributors, and eventually migrating towards having a true open source development community. But if those further steps are not taken, the hopes that users will think that some project is “cool” because it is under an open-source license will ultimately be in vain. The “Generation Y”/Millennial Generation in particular are very sensitive indeed to Astroturfing-style marketing tactics.
OK, so this is why the distinction matters. Given that it does, what terms shall we use? I still like “Organic” vs “Non-organic”. While it may not have been intended by the Mozilla Foundation, the description on their web page, “only a small percentage of whom are actual employees [of the Mozilla Foundation]”, is very much what I and others have been trying to describe. And while I originally used the descriptions “Projects which have an Open Source Development Community” vs “Projects with an Open Source License but which are dominated by employees from a single company”, I think we can all agree these are very awkward. We need a better shorthand.
Brian Aker from MySQL suggested “Organic” vs “Non-Organic” Open Source, and I think those terms work well. If some folks think that “Non-Organic” is somehow pejorative (hey, at least we didn’t say “genetically modified Open Source” :-), I suppose we could use Synthetic Open Source. I’m not really convinced that is any more appetizing myself, however.
So what would be better terms to use? Please give me some suggestions, and maybe we can come up with a better set of words that everyone is happy with.
For those reading this blog through Planet LTC, I'll introduce myself:
My name is Lucas Meneghel Rodrigues, and I'm the lead of the Linux on Power test team. My team does testing of the enterprise distros of our partners, to ensure high quality of the final product.
As software engineers, we're always searching for ways to streamline our work, and since we love open source, we want to share as much as we can of the work we're producing.
We took on the challenge of raising the bar for open source testing, helping the community produce high-quality open source test suites and test instrumentation frameworks. I'll get more into that in my next posts. For now, I just wanted to see if the syndication is already working :)
Brian Aker dropped by and replied to my previous essay by making the following comment:
I believe you are hitting the nail on the “organic” vs “nonorganic” open source. I do not believe we have a model for going from one to the other. Linux and Apache both have very different models for contribution… but I don’t believe either are really optimized at this point.
Optimization to me would lead to a system of “less priests” and more inclusion.
I made an initial reply as a comment, and then decided it was so long that I should promote it to a top-level post.
I assume that when Brian talks about “organic open source” what he means is what I was calling an “open source development community”. Some googling turned up the following definition from Mozilla Firefox’s organic software page: “Our most well-known product, Firefox, is created by an international movement of thousands, only a small percentage of whom are actual employees.”
This puts it in contrast with “non-organic” software, where all or nearly all of the developers are employed by one company. (And anyone who proves talented at adding features to that source base soon gets a job offer from that one company.)
By that definition, we can certainly see projects like Wine, MySQL, Ghostscript (at one time), and others as fitting into that model, and being quite successful. There’s nothing really wrong with the non-organic software model, although many of these projects have struggled to make enough money when competing with purely proprietary software competitors, with MySQL perhaps being the exception that proves the rule.
In most of these cases, though, the project started out as organic open source, and then transitioned into the non-organic model when there was a desire to monetize the project, and/or when the open source programmers decided that it would be nice if they could turn their avocation into a vocation, and let their hobby put food on the family table.
Solaris, of course, is doing something quite different, though. Sun is trying to make the transition from a proprietary customer/supplier relationship to an Open Source community. What John’s candidate statement pointed out is that they weren’t really interested in creating an organic open source developer community at all; they wanted the fruits of an open source community, with plenty of application developers, end users, etc., all participating in that community.
We don’t have a lot of precedent for projects that try to go in this direction, but I suspect they are skipping a step when they try to jump straight to the end without bothering to make themselves open to outside developers. And by continuing to act like a corporation, they end up shooting themselves in the foot. For example, the OpenSolaris license still prohibits people from publishing benchmarks or comparisons with other operating systems. That is very common for closed-source operating systems and databases, but it discourages people from even trying to make things better, both within and outside of the OpenSolaris core team. Instead, they respond to posts like David Miller’s with “Have you ever kissed a girl?”. (Thanks, Simon, for that quote; I had seen it before, but not for a while, and it pretty well sums up the sheer arrogance of the OpenSolaris development team.)
So while Linux may not be completely optimized in terms of “less priests” and more inclusion, at least it had over 1,200 developers contribute to 2.6.25 during its development cycle. Compared to that, OpenSolaris is positively dominated by “high priests”, with a “you may not touch the holy-of-holies” attitude; heck, they won’t even allow you to compare them to other religions without branding you a heretic and suing you for licensing violations!
IBM launched its iDataPlex server systems today - think of it as a “Google” for your datacenter. It’s targeted at web workloads and is insanely dense and power efficient compared to traditional server buildouts. And it uses Linux on commodity hardware so it’s also ridiculously cheap. If you’re a web hosting shop or you have your own web farm that could use a serious overhaul, iDataPlex is a very cool solution.
Did I mention it only runs Linux?
Ashlee Vance cracks me up; it’s clear from this article that he’s been talking to vendors for too long (see the last sentence in this quote):
The system itself is quite remarkable. IBM has reworked its approach to rack servers allowing it to place twice as many systems in a single cabinet. This attack centers on delivering the most horsepower possible in a given area while also reducing power consumption. IBM hopes the iDataPlex unit will attract today’s service providers buying thousands and tens of thousands of servers and also big businesses such as oil and gas firms and media companies that will also possibly pursue a grid-ish data center computing model pioneered to some degree by Google.
But the really awe inspiring bit of iDataPlex comes from the fact that IBM is willing to go after this market at all and that it did so without screwing up the hardware design.