A fork() in the road – conversation between Uli Drepper & Orran Krieger
Uli Drepper, Red Hat Distinguished Engineer
Orran Krieger, Lead/PI Mass Open Cloud (MOC) and Red Hat Collaboratory
Next month at the 17th Workshop on Hot Topics in Operating Systems in Bertinoro, Italy, there will be a session on a paper entitled “A fork() in the road” by Andrew Baumann (Microsoft Research), Jonathan Appavoo (Boston University), Orran Krieger (Boston University), and Timothy Roscoe (ETH Zurich). This paper discusses the usage of fork() as a fundamental abstraction for operating systems, detailing issues with fork() in the modern era as well as suggesting alternative solutions.
We have made the paper available here:
A fork() in the road (PDF, 528K)
This paper has quickly generated a lot of discussion: Hacker News, Twitter, LWN, Reddit, and Lobsters all have lengthy ongoing discussion threads on the points raised by the paper.
We would like to share another discussion of the paper, this one between Red Hat DE Uli Drepper and Orran Krieger. Their discussion follows a particular format as agreed ahead of time: Uli wrote an initial response to the paper, Orran provided a rebuttal, and finally Uli was given the opportunity for a final rebuttal. Follow the conversation below!
Uli’s Initial Response
First and foremost, this response reflects exclusively my opinion and must not be understood as Red Hat’s position, and especially not that of the many software developers at Red Hat who probably have a better understanding of the implementation details. I have dealt with the subject matter for many years through my work on the C library, my participation in the POSIX standard committee, and, together with Ingo Molnar, the work on the extension to Linux’s clone() interface that enabled the implementation of Linux’s POSIX thread library (NPTL).
The paper actually deals with two different aspects of fork() which are conflated and used in support of each other. This was pointed out during the review, but the limiting format of a conference paper made it difficult to address this point. The main points that are made are:
fork() was not really designed, it is more an accidental implementation detail
The implementation, as required by historic practice and later by standardization, requires many parts of the OS to have ties to the fork() implementation
These dependencies and ties make it hard to implement an OS differently from a traditional Unix system while maintaining application compatibility at the source code level
Empirical results suggest that fork() is the predominant way to start a new program
Starting a new program through fork() can have a high latency, depending on the parent process’ resource use (mostly memory)
Despite the drawbacks, fork() is taught as good design
I do not disagree with all points:
Yes, fork() is not the most well-thought-out interface
Yes, there are major problems in some situations when a new program has to be started
Whether or not fork() is taught as good design I cannot say; for the longest time I have thought of it as lacking
My main grievances are:
The mixing of arguments about fork() and fork()+exec. These are separate problems but…
… the different situations when fork() is useful without exec are, at best, underrepresented
No realistic general replacement is provided
All arguments about fork() also apply to Linux’s clone() interface (and whatever other OSes use). This is not spelled out but has to be considered in the arguments (as I will do below)
Much of the argumentation is about the standardized semantics of the fork() interface. I would argue that almost no program depends deeply on most of the requirements. Why should a new OS development or research OS therefore be limited? Alternative interfaces with just the right semantics can be used.
The facts include:
Starting a new program through fork()+exec is horrible because all or most of the requirements of the fork() semantics are wasted
The work needed for fork() scales with the number of pages in the parent process’ address space, the number of file descriptors, and some more entities, which can lead to high latencies (e.g., gigantic Java processes starting a program)
There are many reasons to use fork() (or clone(), etc.) without exec. Modern Linux has namespaces which can be used this way, and traditionally the daemon() interface and others are implemented with fork(). And…
… in general multi-process programs can and should be implemented with fork(). Sharing the address space layout leads to major design advantages (and, no, it is not an additional security problem because of the lack of ASLR, because the alternative, threads, all share the same address space layout as well).
While on the topic, threads are just as much, or even more, a design flaw arising from implementation history and cannot possibly be thought of as a valid alternative to multi-process applications.
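The two ways of starting a program contrasted in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python’s os module (3.8 or later) as a thin wrapper over the POSIX calls rather than the C interfaces the discussion is about, and the helper names run_with_fork_exec and run_with_spawn are made up for this sketch. In the first helper the child briefly exists as a duplicate of the parent before exec discards that state; in the second the program to run is named up front.

```python
import os
import sys


def _exit_code(pid):
    """Wait for pid and return its exit status (negative signal number if killed)."""
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


def run_with_fork_exec(argv):
    """Start argv the traditional way: duplicate this process, then replace the copy."""
    pid = os.fork()  # the child starts as a duplicate of the parent
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # the duplicated state is discarded here
        finally:
            os._exit(127)  # reached only if exec failed
    return _exit_code(pid)


def run_with_spawn(argv):
    """Start the same program in one step with posix_spawnp()."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    return _exit_code(pid)


if __name__ == "__main__":
    command = [sys.executable, "-c", "raise SystemExit(3)"]
    print(run_with_fork_exec(command), run_with_spawn(command))
```

Both helpers return the child’s exit status, so they are interchangeable for a caller that only wants to run a program. The latency difference described above would not be visible in a parent this small; it is expected to grow with the size of the parent’s address space.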
In general, the vfork() interface shows that a useful interface does not have to come with a lot of semantics attached. In fact, we removed it from the POSIX standard in the 2008 revision because standardizing it did not serve much of a purpose, and yet it is extremely useful today. Because a runtime for a system like Linux can rely on more than just the least common denominator of the semantics of all systems, vfork() can be used outside the scope of the specification, and it is used to implement posix_spawn() on Linux in many situations. As the paper shows, this eliminates the performance problems, but it does so without removing fork()-like semantics entirely.
I would like to see a native implementation of posix_spawn() in the kernel. In fact, many moons ago I started the discussion but I never took it to the end, since the interface was deemed ill-specified unless it could fulfill the needs of the most prominent users of fork()+exec (including shells). I did and do agree with that, and perhaps this effort should be picked up again.
My most fundamental disagreement is about the future of fork()-like interfaces. While the paper argues that they must be replaced, I think that fork() is too high-level and, just like in the origins of fork() itself, we need to aggregate around some lower-level interfaces, like clone(). I do not suggest that clone() is ideal but it served us well in implementing a whole bunch of new technologies, from a real thread implementation to containers. This is a far cry from the claim that fork() prevents innovation in OS research.
Also, threads are a problem and not a solution. It was ill-advised, not just from the implementation’s perspective, to provide programmers direct access to thread interfaces. Those who participated in the relevant standard committees and who implemented thread libraries will testify to the pain. Threads should be regarded as an implementation detail, perhaps with system-specific semantics reflecting the underlying OS’s capabilities and design decisions. Fortunately we are on the way to providing better alternatives for using concurrency, with extensions to programming languages and extensions like OpenMP. If one really wants to control concurrency directly, then implementing multi-process applications is much better: multiple processes prevent a whole bunch of classes of problems, are more resilient, and are easier to debug since there are fewer points of contact. All that is needed is to share exactly the resources that need to be shared, and that is possible with today’s interfaces (especially when including non-standard ones in Linux and perhaps other OSes).
If it could be shown that we could have a (set of) interfaces to construct a new process from the ground up under control of the parent code this would be indeed a nice development. We all hopefully agree that such a method of process construction can only be judged a success if it can handle all situations. This means such a process composition must be able to:
Share part of the address space with the child
Provide a subset of file descriptors and similar resources from the parent to the child
Do this without high latencies
To be general, this has to be possible for processes with different privileges, be it SUID/SGID binaries or binaries with different execution contexts à la SELinux. How can I trust a program that has been constructed by the parent from pieces? At least so far the security models depend on the semantics of exec as a system call, and if exec is needed, so is fork().
If the purpose of the paper is to point to the inadequacies of fork() and the ill-advised teaching of fork() as a jewel of software engineering, I can concur. But I cannot agree with most of the conclusions and implied solutions, which I do not find suitable.
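Of the requirements listed above, handing the child a chosen subset of descriptors is the one that posix_spawn() can already express through its file actions. The following is a minimal sketch of that one aspect, again assuming Python’s os wrappers (3.8 or later) and a hypothetical helper name, spawn_capturing_stdout. It does not address the address-space sharing or the privilege questions raised above.

```python
import os
import sys


def spawn_capturing_stdout(argv):
    """Spawn argv with stdout wired to a pipe; return (exit_status, output_bytes)."""
    read_end, write_end = os.pipe()
    file_actions = [
        # In the child, descriptor 1 (stdout) becomes the pipe's write end.
        (os.POSIX_SPAWN_DUP2, write_end, 1),
        # The child has no use for the read end. Python already marks pipe
        # descriptors close-on-exec, so this is redundant here; in C it
        # would be the explicit way to withhold the descriptor.
        (os.POSIX_SPAWN_CLOSE, read_end),
    ]
    pid = os.posix_spawn(argv[0], argv, os.environ, file_actions=file_actions)
    os.close(write_end)  # the parent keeps only the read end
    with os.fdopen(read_end, "rb") as reader:
        output = reader.read()  # returns once the child has closed its stdout
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status), output  # assumes the child exited normally
```

A caller might run spawn_capturing_stdout([sys.executable, "-c", "print('hi')"]) and receive the child’s exit status together with whatever it wrote to its standard output.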
Orran’s Rebuttal
Thanks Uli! You raise a good point, that the majority of the paper is fundamentally about the combination of fork+exec as a mechanism for creating new programs. It sounds like we are in strong agreement on the problems with that combination. I think we are also in strong agreement that posix_spawn should be implemented in the kernel and enhanced. You raise a number of other points that we/I are not fully in agreement with.
First, you say “in general multi-process programs can and should be implemented with fork()”. This is likely something we won’t agree on. If isolation is important, then it makes sense to use multiple processes rather than threads/events; however, I would argue that it would be better to create multiple processes using spawn, and then establish any sharing required (e.g., passing fds and establishing common mappings). The memory can be shared copy-on-write using de-duplication between the processes; sharing the memory a priori is problematic for all the reasons discussed. Forcing the programmer to make explicit where they want to share (rather than having it the default) seems much less error prone. Yes, there is an argument that the re-initialization can be substantial, but forcing applications to have a somewhat larger pool of long-lived processes seems a modest cost for not polluting the interface of our OSes with fork(), with all the attendant problems we describe. As Jonathan just reminded me, there are other alternatives that have been developed in the research community to accelerate process startup without fork(), but … that’s probably outside the scope of this discussion.
You state “I do not suggest that clone() is ideal but it served us well in implementing a whole bunch of new technologies, from a real thread implementation to containers. This is a far cry from the claim that fork() prevents innovation in OS research.” Agreed, it doesn’t prevent innovation, it just makes it more difficult (we say “limits innovation”, which I agree is an overstatement). Obviously with the huge open source community of Linux, innovation is going to happen; but the more we remove impediments the better. Every time we add an abstraction to our operating systems, we need to think about what it means if an application forks/clones, or decide that the abstraction will be broken in the child. I won’t repeat the arguments of the paper about multi-threading, DPDK, application-level buffering, and accelerators all not working. For the fundamental OS abstraction for creating new processes to have undefined semantics for a whole series of abstractions is just wrong.
You say “If it could be shown that we could have a (set of) interfaces to construct a new process from the ground up under control of the parent code this would be indeed a nice development.” It has been shown, multiple times, by multiple research operating systems. It has been demonstrated that mechanisms that manipulate child processes from the parent can enable sharing and can have high performance, …
Your main concern appears to be “How can I trust a program [with elevated privileges] that has been constructed by the parent from pieces?” I would agree with that concern. We would argue that this concern applies to fork/exec, where the child is starting with an inherited environment created by the parent (and the chain of parents that created it). My/our view (and my memory of what we did in past research OSes I was involved in) is that the way to create a process with elevated/different privileges is to make the request to create that process via another server/process (my preference) or, if spawn is built into the kernel, to use that, ensuring that all the state of the new process is carefully validated.
Overall, however, I would agree: while we tried to suggest a path to alternatives, we certainly (while barely fitting into the space) didn’t treat the topic in any depth. Our fundamental goal is to raise the problem and point to potential paths. While you may already agree that the combination of fork()/exec() is bad, that conclusion is certainly not shared by the majority of textbooks, and that was the main point of the paper.
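The approach Orran describes, spawning a fresh process and then establishing the required sharing explicitly, might look roughly like the following. It is a sketch under several assumptions that are not from the discussion: Python’s os and mmap modules stand in for the C interfaces, a temporary file provides the shared region, the descriptor number is handed to the child on its command line, and the names spawn_and_share and CHILD_CODE are hypothetical.

```python
import mmap
import os
import sys
import tempfile

# Program run by the child: map the descriptor it was handed and answer the parent.
CHILD_CODE = """
import mmap, sys
fd = int(sys.argv[1])
with mmap.mmap(fd, 0) as region:      # shared mapping of the whole file
    if region[:4] == b"ping":
        region[4:8] = b"pong"
"""


def spawn_and_share(size=4096):
    """Spawn a fresh child and share exactly one memory region with it."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(size)
        fd = backing.fileno()
        # Python descriptors are close-on-exec by default; opt this one in
        # so that the spawned child inherits it.
        os.set_inheritable(fd, True)
        with mmap.mmap(fd, size) as region:
            region[:4] = b"ping"  # written by the parent before the spawn
            argv = [sys.executable, "-c", CHILD_CODE, str(fd)]
            pid = os.posix_spawn(sys.executable, argv, os.environ)
            os.waitpid(pid, 0)
            return bytes(region[:8])  # includes whatever the child wrote
```

Nothing of the parent’s memory reaches the child except the region that was deliberately set up. The cost is the one Uli raises in his reply below: the child maps the region wherever its own layout allows, so raw pointers into it cannot simply be exchanged.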
Uli’s Final Rebuttal
Starting multi-process programs with posix_spawn() makes it pretty much impossible to have identical address space layouts because of ASLR. Aside from possible address range conflicts even for the explicitly shared regions, it is useful to have fixed positions for the rest (including text segments) since this allows pointers to be used much more freely. As an aside, de-duplication is less likely to work when the address space layout differs, because pointers are usually stored throughout the data segment.
Furthermore, “(e)very time we add an abstraction to our operating systems, we need to think about what it means if an application forks/clones” is not really necessary. It is perfectly fine to leave (part of) the state undefined. It might be that after a fork() certain functionality is not available anymore, or that programs using certain functionality cannot fork() in the first place. That is perfectly fine; programmers can react to that. This applies to uses like DPDK as well: I doubt a user of DPDK will argue much if after fork() the stack is not available anymore.
Suggesting to implement program startup using micro-kernel-like services is not really something that I can take seriously. The empty list of widely used, general-purpose micro-kernel OSes speaks for itself. It is not as if you could not have developed something already to sit on top of an existing OS kernel to move some of the functionality to user level. It simply never worked out; the implementation complexity (which you try to reduce!) and the security model, to the best of my knowledge, stopped all such attempts in their tracks.
Overall it is the goal of the OS to enable more productive and efficient development and deployment of applications. It is not the goal of the OS to make the life of OS developers and researchers more pleasant. Unless there is actually functionality which cannot be implemented while preserving the ability to use the fork() functionality, there is no justification to abandon the interface.
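Uli’s point about identical layouts can be illustrated with a small experiment. This is a sketch, assuming Python’s ctypes and os modules and a hypothetical helper name, pointer_survives_fork: the parent records the raw address of a buffer as a plain integer, and the forked child dereferences that integer directly. A separately spawned process would start with its own, typically randomized, layout, so the same integer would not be expected to mean anything there.

```python
import ctypes
import os


def pointer_survives_fork(payload=b"shared layout"):
    """Return what a forked child reads through a raw address taken in the parent."""
    buffer = ctypes.create_string_buffer(payload)
    address = ctypes.addressof(buffer)  # a plain integer from here on
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the address space layout is the parent's, so the integer
        # still refers to the same bytes.
        os.close(read_end)
        try:
            os.write(write_end, ctypes.string_at(address, len(payload)))
        finally:
            os._exit(0)
    os.close(write_end)
    with os.fdopen(read_end, "rb") as reader:
        seen = reader.read()
    os.waitpid(pid, 0)
    return seen
```

The child sees a copy of the buffer rather than shared memory, but at the same address, which is what makes pointer-rich data structures usable across fork() without translation.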
Continuing the Conversation
Uli and Orran agreed to end things there, but what do you think? We welcome your feedback on the paper and this discussion in the comments below!