- Planet LTC aggregates the collective public bloggings of IBM employees in IBM's Linux Technology Center (LTC).
- This site is for and by members of the Linux community, and is hosted and supported by Brian Warner: Linux user, fan, and member of IBM's Linux strategy team.
My talk for LinuxCon Brazil 2010 (KVM Security)
I’m back from LinuxCon Brazil 2010. After spending two entire days off-line (interesting experience btw), I can finally upload the slide deck for my talk, “KVM Security – Where Are We At, Where Are We Going”, as promised.
I can’t spend time reporting on the event right now, so I’ll just summarize: in my opinion it was the best Linux-related event we have had down here so far, with some good talks from both local and foreign speakers.
The funniest part, however, was seeing Linus having his own Justin Bieber moment, with girls freaking out and everything.
Thanks to everyone who attended. I hope we can all meet again next year for an even better event.
PS: I ended up canceling the Linux Professional Development BoF, due to scheduling confusion and a couple of other things. Sorry to everyone who planned to attend, but keep in touch (comment here or email me at klaus@klauskiwi.com). I still have the idea of at least mapping the Linux professional development industry here in Brazil. We need to get to know each other better, really!
-Klaus
Scanning Package that Doesn't Suck
You've received that email, "Please print, sign, scan, and return." Possibly it said FAX - but like me, you just can't wrap your head around using a FAX in 2010. So you fire up gimp (or xsane), scan a page, crop it, save it; repeat for N pages; then spend 10 minutes reading the absurdly obfuscated ImageMagick man pages to finally stitch the images together into a PDF, and return to sender. You do this once, and the memory of it provides a very significant mental barrier to ever repeating the process.
I had to do this again today for a pair of documents. The Simple Scan tool caught my eye and I gave it a try. It's pure genius, in an "OMFG why did it take 10 years for this to appear?" kind of way. Simple Scan handles the intermediate files behind the scenes, crops all the pages to the same size, displays a thumbnail of each page you scan, and finally saves the document as a PDF. Like, wow. Thanks Simple Scan!
Red Black Trees

Red black trees are a critical data structure in the Linux kernel. I've often wondered what made them different from other trees, but ignored the impulse to dive into it much beyond reading the excellent Wikipedia article on red black trees.
An rbtree achieves O(log n) time complexity for search, insert, and remove. The key properties of an rbtree are as follows:
- A node is either red or black.
- The root is black. (This rule is used in some definitions and not others. Since the root can always be changed from red to black but not necessarily vice-versa this rule has little effect on analysis.)
- All leaves are black.
- Both children of every red node are black.
- Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.
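The properties above can be checked mechanically. What follows is a minimal sketch, not the code from rbtree.py: the `Node` class, its field names, and the use of `None` for the black leaves are all assumptions made for illustration. It checks only the coloring rules, not binary-search-tree ordering.

```python
RED, BLACK = "red", "black"


class Node:
    """Hypothetical node layout; None stands in for a black (NIL) leaf."""

    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def black_height(node):
    """Return the number of black nodes on any path from node to a leaf,
    counting the NIL leaf itself. Raise ValueError on a violated property."""
    if node is None:
        return 1  # all leaves are black
    if node.color not in (RED, BLACK):
        raise ValueError("node %r is neither red nor black" % (node.key,))
    if node.color == RED:
        # Both children of every red node must be black.
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("red node %r has a red child" % (node.key,))
    left = black_height(node.left)
    right = black_height(node.right)
    if left != right:
        # Every path to a leaf must cross the same number of black nodes.
        raise ValueError("black-height mismatch below %r" % (node.key,))
    return left + (1 if node.color == BLACK else 0)


def is_valid_rbtree(root):
    """True if the tree satisfies the coloring properties, root rule included."""
    if root is not None and root.color != BLACK:
        return False
    try:
        black_height(root)
    except ValueError:
        return False
    return True
```

A self test in this style would call `is_valid_rbtree(root)` after every insert and remove.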
I finally manned-up and decided to write a sample red black tree in python. Despite having covered binary trees ad nauseam in college, I was surprised how challenging it was to write a completely functional red black tree. After a few nights of "free time" dedicated to the project, I finally have something to show for it.
$ ./rbtree.py
***** Populating Red Black Tree with 1000 Nodes *****
***** Test Insert Complexity O(log N) *****
PASS
***** Test In-Order Traversal *****
PASS
***** Test Search Complexity O(log N)*****
PASS
***** Test Remove Functionality *****
PASS
The source comes with a built-in self test that inserts the values 1-1000 in random order, locates them all, then removes them in random order. It verifies the 5 properties of the rbtree at each step, and prints the results.
While I am glad to have done it, I am truly embarrassed at how long it took me to complete. On the bright side, the principles I had to dust off to get this done are now painfully fresh in my head. If you'd like to see the source, it's available here: rbtree.py
Lastly, there is a clever interactive demo here (requires java):
Red Black Tree Demonstration
Next up... python metaclasses, and why Guido is an evil bastard.
New opencryptoki release available
I just now found the time to write about the latest opencryptoki version, which was released just over two weeks ago.
Opencryptoki version 2.3.2 was released roughly 6 months after 2.3.1, and brings a series of improvements and bug fixes:
- Improved performance when handling many sessions or many session objects. An inefficient walk through a linked list was part of the validation step for every operation involving session or object handles. While still lacking a more efficient data structure, we were able to use the pointers themselves as handles, which turns each look-up from a linear list walk into a direct pointer dereference. This improvement has a significant impact for scenarios where a single process has more than 4000 sessions at once. Although we are still able to do some verification, this change may also expose buggy applications, which may crash if they try to use invalid handles, so be advised.
- Largely rewritten build scripts. This version went through a much needed refactor for the autoconf/automake build scripts, in the hope of having now a clearer and less error-prone build procedure.
- New SPEC file for building RPM packages. The Opencryptoki binaries are now split into different sub-packages: the main opencryptoki package now brings only the slot daemon (pkcsslotd, initialization script) and administration utilities (pkcsconf, pkcs11_setup). The opencryptoki-libs package brings the PKCS#11 library itself. The packages opencryptoki-swtok, opencryptoki-tpmtok, opencryptoki-icatok and opencryptoki-ccatok bring token-specific plug-ins (aka STDLLs) that enable support for different kinds of crypto hardware. This way, the system administrator can now choose to install only what’s necessary for his/her environment.
- A nice addition by Kent Yoder that allows pkcsconf to display mechanism names instead of only numeric identifiers
- Kent also provided a couple of fixes to the software token (inaccuracies in mechanism list) and testcases
- A couple of useful additions/fixes related to init-scripts and pkcsconf by Dan Horák
- A number of RSA fixes and improvements by Ramon de Carvalho Valle, including an endianness bug in key-pair generation for the software token and improved PKCS#1 v1.5 padding functions.
As for the next version, we’re focusing strongly on making the testsuite better. You can follow the development log here.
-Klaus
Bare Metal Versus Hosted Hypervisor Security
by George Wilson, IBM Linux Technology Center
I was recently reading through the NIST “Draft Guide to Security for Full Virtualization Technologies” (SP 800-125 draft) [http://csrc.nist.gov/publications/drafts/800-125/Draft-SP800-125.pdf]. It discusses various considerations relating to hypervisor security. One section that particularly struck me was the comparison of bare metal vs hosted hypervisors. These are also known as Type I and Type II hypervisors, respectively. The document states that choosing between them is a critical security decision. That started me wondering if it is actually true that Type I hypervisors offer superior security to Type II hypervisors. While a Type I hypervisor may have a small kernel, it relies on and trusts an entire OS instance in the resource-owning partition (Dom0 in Xen parlance) for device access. So while it might at first blush appear that a Type I hypervisor has a much smaller TCB than a Type II, the TCB is really just in a different place. Given imperfect knowledge of the implementations and similar size, complexity, and maturity, it would seem that Type I and Type II hypervisors would in general offer similar security. I can’t find any solid evidence to the contrary. I’d love to hear from someone who can clarify why the Type I vs Type II distinction is in any way a major factor in hypervisor security analysis.
"The Trouble with Multicore" by David Patterson in July 2010 IEEE Spectrum
However, I was especially happy to see the following sentence:
So rather than working on general programming languages or computer designs, we are instead trying to create a few important applications that can take advantage of many-core microprocessors.
Focusing on parallelization in the large is a great improvement over the traditional academic focus on parallelization in the small. All else being equal, the larger the software artifact, the larger the units of work, and the smaller the fraction of computational resources spent on communication. The less the communication, the better the performance, and usually the greater the scalability. So Patterson's pronouncement is a welcome change, especially given his group's earlier focus on small-scale computational kernels. I hope that the fact that Patterson has now joined the growing group of academics focused on parallelization in the large will encourage other academics to do the same.
Of course, I could raise a number of quibbles with the paper:
- The analogy of parallel processing with journalism (last full paragraph of the last column on page 30) misses the mark. Patterson notwithstanding, the fact is that most writers do in fact use parallel processing: there will be a reporter, a copy-editor, and so on. It is in fact quite common for authors of large works to acknowledge those who did research, fact-checking, and other tasks. Of course, to Patterson's point, there must be a limit to the degree of parallelism that can be achieved. But the success of things like Wikipedia indicates that the potential for parallelism is much larger than has been commonly thought.
- Patterson argues that desktop applications rarely have sufficient intellectual horsepower behind them to make good use of multicore systems (last sentence of page 31). History has shown, however, that it is not raw intellectual horsepower that is required, but rather experience and proper training.
- Patterson also seems to believe that parallel programmers should start small and work their way up to larger systems (last sentence of first paragraph of page 32). Sequent's experience indicates otherwise: by starting off with 30-CPU systems from the get-go, Sequent avoided the typical parallel-programming experience, which is to rewrite the program from scratch multiple times, first to accommodate parallelism at all, next to scale beyond two CPUs, next to get beyond the 16-32-CPU level, and so on. Diving into the deep end of the parallel-programming pool can be quite a bit cheaper and easier than gingerly paddling out from the shallows.
- Patterson complains that large systems (128 cores) are not being manufactured, and that software emulation is painfully slow (middle of third column on page 32). Such large systems have in fact been available for quite some time from a number of manufacturers. Of course, they are still quite expensive, which can certainly render them unavailable to most developers. However, there is little need for universities to fabricate them, unless of course they are conducting research on the hardware itself.
Finally, the box on page 31 entitled “Easy as Pi” deserves special attention. In this box, Patterson contrasts a sequential method for calculating the quantity π/4, namely summing the infinite series for the arctangent of one radian, with a parallel Monte Carlo method, which generates pairs of random floating-point numbers between -1 and +1, then counts the fraction that lie within the unit circle.
How good are these algorithms?
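One way to get a feel for the question is to run both methods and measure their error. The following is a minimal sketch, assuming the sequential method is the series 1 - 1/3 + 1/5 - ... and that the Monte Carlo method draws uniform points in the square from -1 to +1. The function names and the fixed seed are illustrative choices, not anything from Patterson's article.

```python
import math
import random


def pi_over_4_series(terms):
    """Sum the first `terms` terms of 1 - 1/3 + 1/5 - ... (arctangent series at 1)."""
    total = 0.0
    for k in range(terms):
        total += (-1.0) ** k / (2 * k + 1)
    return total


def pi_over_4_monte_carlo(samples, seed=0):
    """Fraction of random points in [-1, 1] x [-1, 1] that fall inside the unit circle."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return inside / samples


if __name__ == "__main__":
    exact = math.pi / 4
    for n in (100, 10000, 1000000):
        series_err = abs(pi_over_4_series(n) - exact)
        mc_err = abs(pi_over_4_monte_carlo(n) - exact)
        print("n=%d series error=%.2e monte carlo error=%.2e" % (n, series_err, mc_err))
```

The Monte Carlo loop has no dependency between iterations, which is what makes it easy to split across cores. Comparing the two error columns for the same `n` shows how much accuracy each method buys per unit of work.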
FISL11 lectures presented, time to enjoy the party
FISL 11 presentation: Virtualization Security using KVM
Below is the link to the PDF of the presentation I gave at FISL 11 on “Virtualization Security using KVM”.
Note that I should cover this topic again at LinuxCon Brasil 2010, which takes place on August 31 and September 1 of this year, so keep an eye on the schedule. I would also like to announce in advance that I plan to lead a “Linux professional developers meeting” at the same LinuxCon Brasil 2010. It should be an opportunity to meet colleagues from the various companies that work directly on developing the Linux operating system, and to discuss the job market, education, and achievements. Get in touch (klaus at klauskiwi.com) or leave a comment if you are interested in this mini-summit.
Comments, corrections and questions are always welcome!
-Klaus
From the Why-didn't-I-do-this-before department (vim+cscope)
I finally got tired of the lack of specifiers in "git grep" and of the cscope ncurses interface. I spent a few minutes and set up the vim cscope plugin using this mighty fine tutorial:
http://cscope.sourceforge.net/cscope_vim_tutorial.html
The one gotcha I ran into was having to disable my vim setting that automatically changed the working directory to that of the open file - it broke the cscope plugin's relative filenames.
" automatically switch the cwd to that of the file in the buffer
" This breaks cscope plugin
" autocmd BufEnter * :cd %:p:h
Very, VERY VEERRRYYY nice. Now if only I could get a full call graph out of it...
Stupid RCU Tricks: Holding Off RCU Read-Side Critical Sections
Now, an RCU callback, being a C-language function, has a definite beginning and end. But what about synchronize_rcu(), which blocks until an RCU read-side critical section has elapsed? How does RCU know how long to hold off new RCU read-side critical sections once synchronize_rcu() returns?
New Blueprint available: Securing KVM guests and the host system
IBM recently made available another Blueprint of my authorship: Securing KVM guests and the host system.
The text, which is also available as a PDF, walks through a few steps and some discussion around KVM security for Red Hat Enterprise Linux running on virtualization-capable IBM System x hardware. Topics include remote management, host and guest security, a few suggestions for auditing, and (why not?) some image-at-rest cryptography.
The complete index follows:
- Introduction
- Securing KVM guests and the host system
- Secured KVM remote management
- Setting up secure remote management
- Remote management using SSH tunnels
- Remote management using SASL authentication and encryption
- Remote management using TLS
- Guest virtual network isolation options
- Network port sharing with Ethernet bridges
- Network port sharing using 802.1q VLANs
- Auditing the KVM virtualization host and guests
- Audit rules file
- KVM guest image encryption
- Using encryption in KVM guest images
- Migrating existing guests to encrypted storage
- Installing a new KVM guest
- Storing encrypted guest images
- Appendix A. Sample audit rules file
- Appendix B. Troubleshooting
Feedback, comments, corrections and suggestions are welcome as always, and we now have a way to provide them directly in the text. Questions can be answered in the developerWorks Linux Security Community Forum.
Reviewing patches
I have always struggled with reviewing code.
Especially when the code to be reviewed is really a patch inlined in some e-mail… I hate monospaced fonts in my e-mail reader, and with all the context switches in my daily work, I simply can’t concentrate well enough to follow what’s being proposed in that one patch out of many, in that long, long patch series.
In the past, I used to apply them manually, then go over the code using Source Navigator and later cscope.
I still miss the ability to jump between symbol definition and use, which cscope does best, but I have a much more streamlined way of reviewing patches today, thanks to git, meld, and claws-mail.
The first thing is git. Nowadays I use git in every coding project I work on, even if the upstream project is not using git as its SCM (I simply create a local repository and import). And this is not only for making patch review easier, but for all sorts of things, like fast branching and merging, easy cherry-picking, rebasing, commit amending, modern utilities et al. It’s really the 21st century version control system.
The second thing is meld. Meld is a good example of an intuitive interface that doesn’t get in the way. It can compare, merge and edit files (up to 3-way merge if needed). It supports all the major SCMs such as git, hg, cvs and svn (although I can’t find a reason why anyone would still use the last two, at least locally).
The third thing, and where everything actually comes together, is Claws-mail, which has the very useful (and unique?) ability to create custom actions to process messages.
Guess what happens when you combine Claws-Mail’s actions with a script that uses git and Meld? A very point-and-click way of reviewing patches:
The trick is in configuring an action in Claws-Mail that opens a terminal and calls a script. The script uses git-am to apply the patch contained within the selected mail message to some branch in your local git repository. After applying, it calls git-difftool to show the differences. git-difftool then calls any diff tool you might like (my suggestion stays with Meld).
I’m attaching the script for reference below:
#!/bin/bash
## git-review-step
## (C) Copyright 2010 Klaus Heinrich Kiwi
## Licensed under CreativeCommons Attribution-ShareAlike 3.0 Unported
## http://creativecommons.org/licenses/by-sa/3.0/ for more info.

## dirname is where the git tree is located.
dirname=$HOME/sandbox/ock/sourceforge-git/opencryptoki

if [ "$#" -lt 1 ]; then
    echo "Invalid number of parameters"
    echo "usage: $(basename $0) <patch1> [patch2] [patch3] [...]"
    exit 1
fi

# Keep each patch file name as one array element, even if it contains spaces
messages=("$@")
cd "$dirname"
oldbranch=`git branch | grep -e '^* ' | cut -d " " -f 2`

# Save any uncommitted changes in the working dir or index
if git stash | grep HEAD; then
    savedchanges="yes"
fi

function restore() {
    echo "Reverting to original branch..."
    git checkout --force $oldbranch
    if [ -n "$savedchanges" ]; then
        echo "Restoring un-committed changes..."
        git stash pop
    fi
}

# Get branch to apply to
git branch
echo "Select branch to apply patches:"
echo " Enter \"<branchname>\" to apply to an existing branch"
echo " Enter \"<newname> [origref]\" to create a new branch from \"origref\""
echo " reference (use current branch and HEAD if left blank)"
read -p "Apply patch(es) to branch (default is current):" -e -i $oldbranch newbranch origbranch

if [ -n "$newbranch" ]; then
    if git branch | grep -e "\b${newbranch}$"; then
        echo "Applying to existing branch \"$newbranch\""
        # Checkout
        if ! git checkout $newbranch; then
            echo "Error checking out \"$newbranch\" - Aborting"
            restore
            read -p "Press Enter to continue"
            exit 1
        fi
    else
        if [ -n "$origbranch" ]; then
            echo "Applying to new branch \"$newbranch\" created from \"$origbranch\" branch..."
        else
            echo "Applying to new branch \"$newbranch\" created from \"$oldbranch\" branch..."
        fi
        if ! git checkout -b $newbranch $origbranch; then
            echo "Error creating \"$newbranch\" from \"$oldbranch\" - Aborting"
            restore
            read -p "Press Enter to continue"
            exit 1
        fi # if ! git checkout ...
    fi # if `git branch | grep ...
fi # if [ -n $newbranch ...

# Apply patches to the selected branch using git-am
amparams="--whitespace=error-all"
while ! git am $amparams "${messages[@]}"; do
    git am --abort
    echo "git-am failed. Retry (the whole chunk) with additional parameters?"
    read -p "git-am parameters (empty aborts):" -e -i $amparams amparams
    if [ -z "$amparams" ]; then
        echo "Aborting..."
        restore
        read -p "Press Enter to continue"
        exit 1
    fi
done

# Walk the applied commits from oldest to newest
for (( i=${#messages[@]}; i > 0; i-- )); do
    PAGER='' git log --stat HEAD~${i}..HEAD~$((i-1))
    # git diff --check exits non-zero when it finds whitespace problems
    if ! git diff --check HEAD~${i}..HEAD~$((i-1)); then
        echo "WARNING: Commit introduces whitespace or indenting errors"
    fi
    git difftool HEAD~${i}..HEAD~$((i-1))
done

echo "Restoring working tree to original state"
restore
read -p "Press Enter to continue"

git-review-step by Klaus Heinrich Kiwi is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Based on a work at blog.klauskiwi.com.
Stupid SMP Tricks: Memory Barriers (RWC with rmb)
int x, y; /* shared variables */
int r1, r2, r3; /* private variables */
void foo_0(void)
{
	ACCESS_ONCE(x) = 1;
}
void foo_1(void)
{
	r1 = x;
	smp_rmb(); /* The only change. */
	r2 = y;
}
void foo_2(void)
{
	y = 1;
	smp_mb();
	r3 = x;
}
After these three functions complete, we have an assertion. Please note that by “complete” I mean that all effects of the functions have become globally visible. One way to ensure this level of completion is for the thread that spawned foo_0(), foo_1(), and foo_2() to do pthread_join() on each of them in turn, and only then execute the following assertion:
assert(!(r1 == 1 && r2 == 0 && r3 == 0));
Can this assertion ever trigger?
Stupid SMP Tricks: Memory Barriers (RWC)
Consider the following code fragment, where each function foo_n() runs on CPU n, all concurrently:
int x, y; /* shared variables */
int r1, r2, r3; /* semi-private variables */
void foo_0(void)
{
	ACCESS_ONCE(x) = 1;
}
void foo_1(void)
{
	r1 = ACCESS_ONCE(x);
	smp_mb();
	r2 = ACCESS_ONCE(y);
}
void foo_2(void)
{
	ACCESS_ONCE(y) = 1;
	smp_mb();
	r3 = ACCESS_ONCE(x);
}
Now suppose that the following assertion runs after all of the preceding functions complete.
assert(!(r1 == 1 && r2 == 0 && r3 == 0));
Can this assertion ever trigger? Why or why not?
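As a starting point for reasoning about the question, here is a minimal sketch that enumerates every interleaving of the three functions under an idealized, sequentially consistent memory model. That model is an assumption, and a much stronger one than real hardware provides: every load sees the most recent store in a single global order, so the barriers become no-ops. The sketch therefore says nothing about weakly ordered machines. The operation encoding and all function names are illustrative.

```python
def interleavings(threads):
    """Yield every merge of the per-thread operation lists that keeps each
    thread's own operations in program order."""
    if all(not ops for ops in threads):
        yield []
        return
    for i, ops in enumerate(threads):
        if ops:
            rest = threads[:i] + [ops[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [ops[0]] + tail


# Each operation is ("store", variable, value) or ("load", variable, register).
FOO_0 = [("store", "x", 1)]
FOO_1 = [("load", "x", "r1"), ("load", "y", "r2")]
FOO_2 = [("store", "y", 1), ("load", "x", "r3")]


def run(schedule):
    """Execute one interleaving against a single shared memory; x and y start at 0."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "store":
            memory[var] = arg
        else:
            regs[arg] = memory[var]
    return regs["r1"], regs["r2"], regs["r3"]


def outcomes():
    """Set of all (r1, r2, r3) results reachable under sequential consistency."""
    return {run(s) for s in interleavings([FOO_0, FOO_1, FOO_2])}


if __name__ == "__main__":
    results = outcomes()
    print(sorted(results))
    print("assertion can trigger under this model:", (1, 0, 0) in results)
```

Whether the full barriers in the fragment give the same guarantee on a real weakly ordered CPU is exactly the point of the question, and this model cannot settle it.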
Libvirt-0.8.2 is out. IBM pHyp driver now supports IVM and storage management

Backup FS: Getting Started
I use rdiff-backup to keep a few months' worth of daily backups for my home systems (and those of my parents for that matter). The ability to recover any version of a file is great - although the process still requires a geek (me).
$ rdiff-backup --restore-as-of "7D" user@backupserver::/path/to/backup/file
Wouldn't it be great if you could just mount the backup repository and browse by path or date and then just copy the desired version? Enter BackupFS, a fuse filesystem implemented with rdiff-backup.
I've only just started digging into this, and rdiff-backup's python packages were not intended to be used as libraries (not with all the code buried in rdiff_backup.Main and all the global module variables floating around). Still, I was able to get a server test and a listing of the repository's root increments by using the python modules (and not just making multiple subprocess() calls).
I have a glorified version of the example hello world fuse filesystem able to mount and list a few meta-directories:
dvhart@vin:backupfs.git$ ./backupfs.py mnt && (tree mnt; fusermount -u mnt)
Testing server started by: ssh -C katara rdiff-backup --server
Server OK
mnt
|-- By Date
|   `-- increments.2010-01-07T22:11:42-08:00.dir
|-- By Path
`-- hello

3 directories, 1 file
Fatal Error: Lost connection to the remote system
I still have some basic research to do in order to understand how to operate within the rdiff-backup packages (API isn't quite the right term ;-). After that, it'll be on to a more formal design and then some nicer code.
Stupid SMP Tricks: Lockless Access to Structure
struct foo {
int a;
int b;
};
struct foo static_foo = { 42, 17 };
struct foo *foo_p = &static_foo;
Because this is compile-time initialized, readers should not need to use rcu_dereference().
Let's further suppose that at runtime, some CPU, task, or thread might set foo_p to NULL. Because we cannot free the compile-time-allocated static_foo, there is no need for RCU grace periods and RCU read-side critical sections.
These restrictions do simplify things. Readers might be as simple as:
p = foo_p;
if (p != NULL)
	do_something_with(p->a, p->b);
Does this work? Why or why not?
IBM and the Jeopardy Challenge
This is just one of numerous fun and still seriously challenging projects being worked on these days. The IBM Research teams are amazing. They have found some very interesting performance challenges. The project drives advanced technologies, product improvements, system improvements, and performance tool improvements, but most of the work is in the realm of demonstrating complex natural language processing in a time-constrained answer and question world.
Stupid RCU Tricks: Synchronizing With External State
Suppose you have an array of three RCU-protected structures. At any given time, one of them is the current structure that will be used by RCU readers. This means that there is a global pointer that will be pointing to one of the elements of this array, thus designating it as the current element. (And yes, a grace period must elapse before a given element is reused.)
But suppose that there is another global integer whose value must be kept consistent with the contents of the current element — to keep things trivial, let's assume that the value of this global integer must be twice that of an integer within the current structure.
The data structures might be set up as follows:
struct rcu_protected {
int a;
};
struct rcu_protected elements[3];
struct rcu_protected *current = &elements[0];
int consistent; /* must be 2 * current->a */
How can this be accomplished?
It just worked?
I had Dad ship his old laptop for an upgrade. After installing Ubuntu Lucid (from a USB key) and upgrading the RAM, I was _really_ impressed that Empathy video chat over Google Talk "just worked". I went and picked up a Logitech C120 web-cam for him for $20, took it home, plugged it into the USB slot and guess what - it just worked! We've come a long way in 10 years, Linux! Honestly it felt weird... I kept thinking... isn't there a driver I should have to download, build, fix, patch, build, etc... It's like... having a Mac or something. The only thing that would have made it better would be if the box had a penguin logo on the back.


