Abstract
Free and open source software (FOSS) is considered by many, along with Wikipedia, the proof of an ongoing paradigm shift from hierarchically-managed and market-driven production of knowledge to heterarchical, collaborative and commons-based production styles. From this perspective, it has become commonplace to refer to FOSS as a manifestation of collective intelligence where deliverables and artefacts emerge by virtue of mere cooperation, with no need for supervising leadership. We show that this assumption is based on a limited understanding of the software development process, and may lead to wrong conclusions as to the potential of peer production. The development of a less than trivial piece of software, irrespective of whether it be FOSS or proprietary, is a complex cooperative effort requiring the participation of many (often thousands of) individuals. A subset of the participants always plays the role of leading system and subsystem designers, determining architecture and functionality; the rest of the people work “underneath” them in a logical, functional sense. While new and powerful forces, including FOSS, are clearly at work in the post-industrial, networked economy, the currently ingenuous stage of research in the field of collective intelligence and networked cooperation must give way to a deeper level of consciousness, which requires an understanding of the software development process.
Key words: Open source – FOSS – software process – collective intelligence – Wikipedia
Paolo Magrassi 2010 - Creative Commons Attribution-Non-Commercial-Share Alike 3.0
and hardly exhaustive, review of scientific papers published on the subject, reference to the above books, all best sellers revered in many intellectual circles, universities, research centres and symposia, will suffice to provide an account of the ongoing efforts aimed at defining the bottom-up forces that are shaping the post-industrial knowledge economy.

One thing that those publications have in common, regardless of their different flavours and approaches to describing or advocating a world of sharing and peer-production, is the assumption that free and open source software (FOSS) emerges as the product of a collective intelligence phenomenon with no need for top-down coordination and oversight. In Benkler’s words, for example:

“Free software offers a glimpse at a more basic and radical challenge. It suggests that the networked environment makes possible a new modality of organizing production: radically decentralized, collaborative, and non-proprietary; based on sharing resources and outputs among widely distributed, loosely connected individuals who cooperate with each other without relying on either market signals or managerial commands.” (Benkler 2006, p. 60)

“Based on our usual assumptions about volunteer projects and decentralized production processes that have no managers, this was a model that could not succeed. But it did” (Benkler 2006, p. 66).

We contend that this interpretation is flawed because spontaneous participation does not remove the need for leadership in software development. We will show that large software products, whether FOSS or proprietary, are all distributed and cooperative in nature, and do require top-down controls.

2 Free and open-source software development: some metrics

Useful sources concerning both FOSS development metrics and the FOSS methodology and process model can be found within the FOSS community itself.

Concerning development metrics, for example, Kroah-Hartman (2008) provides us with relatively recent data. According to this source, in the 2005-2007 time frame the Linux Kernel was attended to by approximately 3,700 individuals (not all simultaneously), 86% of whom were employed or contracted by enterprises, while 14% were moonlighters working for free.

This statistic tells us two things. The first is that the total of Linux Kernel developers amounted on average to about 1,200 people in any given year between 2005 and 2007. It should be noted that 1,200 developers per year, some of whom (as a minimum, presumably the 14% freelancers) worked only part-time, is not a particularly large number by software development standards. For example, on average each of the top 25 banks in the world mobilizes a development staff of at least that size every year, developing roughly 2 million lines of code (LOC’s) of software (Gartner 2009).

The second thing that the Kroah-Hartman (2008) statistic tells us is that the spontaneous, entirely autonomous participants in Linux Kernel development are a minority: six in every seven developers work as employees or contractors of ordinary businesses.

These numbers will not surprise senior software people: every experienced person who has performed software development or project management in less than trivial projects thinks that a system of 11+ million LOC’s, such as the Linux Kernel in its 2.6.30 version (Christianson 2008, Bos 2007, Leemhuis 2009), cannot be built by thousands of developers “without relying on managerial commands”.

In fact, as we saw, 86% of Linux Kernel developers, the core of FOSS, work for pay in the ranks of a company, with all the usual managerial controls. But this is not even the point. If a team of entirely independent and autonomous developers, working for free in their spare time, wanted to cooperate in building a software system, whether FOSS or proprietary, they would still have to submit to the requirements of design and coordination, as we will see in sections 3 and 4.

3 The software development process

Software development is a labour-intensive activity. Statistics vary greatly, but according to accurate reviews (Magrassi 1996) each “function point” of software needs between 0.5 and 2 person-days of work across the full first-cycle of a product (from conception to first user acceptance test): this implies that software written in C, like Linux, requires in the neighbourhood of 1 person-day to get 15 working LOC’s done (the number of LOC’s per function point depends, among other things, on the programming language under consideration).

A large software system, such as an operating system like Linux or a full-scale business application like a bank’s information system (in order of magnitude, about the size of the Linux Kernel today), amounts to millions of LOC’s: therefore, in order to be delivered within a reasonable time frame, it can only be built by many hundreds, if not thousands, of professionals.

3.1 Integration vs. modularity

This requires coordination, otherwise the system as a whole turns out incoherent. Worse yet: unlike a system such as Wikipedia, which can have low coherence but still perform decently, software must be integrated. In Wikipedia, an article may be very good even if some of the links departing from it point to badly written or inaccurate articles. Over time, those articles will presumably be strengthened and the overall “system” (the collection of all interconnected articles) will
be better: in the meantime though, it will have worked correctly in at least some of its parts. But a collection of software programs can stop working if any of the programs is bad.

One can consult a correct and informative encyclopaedia entry on “William Shakespeare” even if the related (and linked to) entries on “Stratford-upon-Avon”, “Christopher Marlowe” and “Titus Andronicus” are missing or incorrect. However, one cannot run an order entry program if the related inventory- and customer-management programs are not working or nonexistent. In software terminology, this is expressed by saying that Wikipedia is a loosely-coupled system, while software is tightly-coupled.

Maximizing integration (which requires coupling) and modularity (which requires mutual independence) at the same time has been the holy grail of software development since the 1960’s (Böhm 1966, Dijkstra 1968, Constantine 1979, De Marco 1979). Integration favours coherence, consistency and performance, while modularity eases maintainability, increases robustness and resilience, and allows for smoother division of work.

The two goals, however, are conflicting, and only suboptimal solutions can be aimed at. This is a fact that escapes the attention of most non-software authors in the peer-production and networked economy literature, including those, such as Baldwin (2005), who recognize the importance of modularity in software development (although they seem to think of it as an exclusive prerogative of FOSS).

Modern software architectures, conceived specifically for highly-distributed and web-based systems, try to confront the harsh reality of software module interdependence in various ways. Service-based architectures, for example, strive to make systems as loosely-coupled as possible, in order to limit the negative effects of missing modules and corrupted links. Direct references made by programs to one another are reduced by maintaining directories of “services” (programs). When a program needs a service it will issue a request by simply naming the service; the caller program (“consumer”) need not be linked in the same computer memory as the service’s program (“provider”), and the service may reside anywhere on the internet.

3.2 Need for top-down supervision

This approach, however, still leaves two issues open.

To begin with, it only removes the lighter mutual-dependency problems: it does unbundle software and hardware, and it does encourage modularity; however, when a programmer sets out to write a [consumer] program, they will still need to know what provider programs do, and which parameters they must be passed, in order to make any use of their services. This requires stability of provider programs’ specifications: a goal that conflicts with the fact that many programs are providers and consumers at the same time. It follows that modularity is difficult to achieve without top-down coordination, especially when the needed service belongs in a subsystem very far from the one that the programmer is working on.

Secondly, service-based architectures (like any architecture, for that matter) require stiff coordination in building directories and keeping a coherent nomenclature for all implied objects, such as programs and data sets. This is definitely a goal requiring top-down supervision. It does not matter whether such supervision is carried out by individuals or committees, since in either case people ought to be named and assigned to the coordination task: entities at a higher level than the individual programmer need to exist and exert their powers if the software is to behave coherently.

3.3 Feedback loops

Furthermore, while it can be relatively easy to state in plain English the general, global purpose of a piece of software (e.g.: “A new production management system using RFID tags for components tracking and assembly”, or “Adding iPhone support to Linux”), precision becomes paramount as the development project proceeds, because most computers notoriously need detailed instructions to perform even elementary tasks. The Linux Kernel, for example, is written mainly in C, a programming language much closer to hardware assembly than to human language. The C instructions for printing “hello, world” on a computer screen look as follows:

    #include <stdio.h>

    int main(void)
    {
        printf("hello, world\n");
        return 0;
    }

from which the lay reader gets a grasp of how complicated it can get to instruct a computer to do such exoteric things as inventory management or shop-floor components assembly. At these levels of precision, required in the Linux Kernel like in any other software, little can be left to improvisation. It is challenging to cooperate on the creation of even a single program, amounting to a few hundred lines of code: a comma or a bracket omitted or removed by programmer “B” may make the program obscure to programmer “A” and generate a complete misunderstanding on the side of the computer (compiler).

Implementation details are extremely important in computer programming (“coding”), and sometimes they make it impossible to comply with a given design specification, creating feedback loops that reflect backwards from subsequent to antecedent implementation stages of the project: design decisions (including the
naming conventions we alluded to above) must be modified due to issues brought up at coding time and unimagined before.

This fact, referred to in software engineering by saying that the development process has the shape of a spiral (Boehm 1986), clashes with the wish of assigning to each participant developer a clear and defined task once and for all and then simply waiting for the deliverable. Furthermore, and more importantly, it makes it particularly challenging to coordinate mutually-invoking programs.

3.4 The need for top-down design

Consider programmer John writing program P1 and programmer Mary writing P2. If they are coding, it means that some prior decision has been made that there will be a program called P1 performing certain functions, and a program called P2 performing other functions: this is called system design, or sub-system design in case P1 and P2 participate in a larger piece of software. System design decisions, in this case, might have been taken by John and Mary cooperatively, “with no managerial commands”. But what about the design of a system like Linux Kernel, counting programs by the thousands and dozens of different subsystems each requiring its own design, modules split and coordination with other subsystems?

The structure of a large and complicated system can, with some simplification, be depicted as an upturned tree, from a root module (e.g., “Linux”) all the way to leaves corresponding to elementary modules/programs needing no further split. Pjk, with 1 ≤ j ≤ M and 1 ≤ k ≤ N, is the generic program module being attended to by programmer Ai, with 1 ≤ i ≤ R. R is the total number of programmers available and M×N is the dimension of the hierarchical graph representing the system structure. M is the breadth and N the depth: j is the number of peer modules to Pjk (all needing to be coordinated, i.e., co-designed, along with it), while k measures its degree of seniority; the lower k is, the larger the number of modules underneath, because we are moving towards the root. It is easy to see that while a leaf module PjN may only need 2 people to be designed, a very high-level module may require hundreds of co-designers (all individuals who will program the modules underneath), which is obviously unrealistic.

It should by now be clear how the need for hierarchical layers of system design imposes itself: a few designers (or maybe one Linus Torvalds) at the top, then some sub-designers underneath, then sub-sub-designers, and so on. No meaningful piece of software, much less one of millions of LOC’s, has ever come together as a working computer program without some “higher-level” intelligence controlling the system’s overall integration.

The modules attended to by individual programmers may only work together if one entity above (person, team, committee, but in any case a defined subset of the entire development team) takes care of the overall design and integration. Tales of Lego-like software componentry assembly, while attractive and suggestive, belong in the realm of software-tool vendors’ marketing and are absent from the software engineering literature. In fact, the systems software domain, i.e. that of Linux Kernel, is in a very fortunate situation in that sense: because of the relative requirements stability, some decent degree of modularity can be achieved. But in business applications software, for example, modularization is still little more than a dream.

4 Organizational models in software development, FOSS or not

While explicit FOSS design supervision goes overlooked in the software-naïve literature, where coherence and design are considered as properties emerging out of a “complex system” of individuals, it is of course a very well known fact in the FOSS community.

    Developer            Share
    Al Viro              1.9%
    David S. Miller      1.8%
    Adrian Bunk          1.7%
    Ralf Baechle         1.6%
    Andrew Morton        1.5%
    Andi Kleen           1.2%
    Takashi Iwai         1.2%
    Tejun Heo            1.1%
    Russell King         1.1%
    Stephen Hemminger    1.1%

Table 1: The top ten Linux Kernel developers in 2005-2007. Source: Kroah-Hartman (2008)

Kroah-Hartman (2008) again provides us with some information: relatively few individuals determine and even produce directly a substantial amount of the work. In 2005-2007, for example, the ten persons listed in Table 1 produced 14% of everything that was done in Linux Kernel. The top 30 individuals produced 30%. These are the people who, along with Torvalds, made the top design decisions. (The vast majority of the other 3,700 people or so mainly, although not exclusively, did bug fixing: an activity, according to Raymond (1999), at which “crowds” excel).

This does not mean that design proposals cannot be made by anyone else; it does not mean that Torvalds et al. exert managerial controls such as assigning tasks to specific people; it does not mean that participants cannot often pick their preferred tasks from a to-do list; it does not negate that many design decisions are made by committee rather than by an individual alone (exactly as it happens with proprietary software products): but it does say that most Linux (or any other FOSS
will appear on the list of software and projects owners, by a genuine sense of sharing and participation, by homo ludens payoffs, and sometimes by the urge to counter dominant software players and “monopolies” like Microsoft (an urge carefully cultivated and fomented by Microsoft’s adversaries, including IBM, Red Hat, Intel, Novell, Sun/Oracle, HP and many others, who are the employers of that 86% of developers working on Linux as well as of those working on all other FOSS products, and are the “hidden” market forces behind such products).

The second fact that separates FOSS from proprietary software is, of course, the novel ownership models reflected by the original General Public Licence and the many others that have been developed from it since its inception in 1989.

5.2 What is not in FOSS, and in proprietary software either

The notion of a coherent and performing system emerging from a crowd of spontaneous contributors without top-down direction and supervision is unfounded: it does not correspond to the way software of any kind is designed and built.

6 Further research

Spontaneous/voluntary participation of contributors and the new intellectual property scheme, i.e. the quintessence of FOSS, are formidable drivers of change and innovation in the post-industrial economy, with consequences and implications extending far beyond the reach of software. Economists, sociologists, jurists, political scientists and psychologists are finding and will increasingly find in “open content” a myriad of research motivations.

In the case of software, the naïve notion of systemic emergence should be abandoned and, to investigate further what makes the FOSS production model different, research should be carried out on, among other things, the following:

• What are the relationships and the interplay between hierarchical and functional dependencies in software development organizations?
• Is it easier in FOSS (than in proprietary-software development) to achieve [sub]system designer status irrespective of one’s hierarchical position?
• Do bottom-up design proposals occur more frequently in FOSS than in proprietary-software development?
• Are FOSS products more modular than proprietary products, when both are considered at the same level of abstraction with respect to hardware architecture?

Finally, concerning an issue which this paper only touched upon quickly but which is strictly connected to the division of labour discussion, there is a need to study what, if any, are the market drivers behind FOSS products. Most of the literature we have referred to seems to assume, and often explicitly states, that FOSS products are built by “loosely connected individuals who cooperate with each other without relying on […] market signals” (Benkler 2006, p. 60). The role played in FOSS by the many ordinary businesses (sometimes huge software or hardware vendors) who directly or indirectly employ most of the developers must be understood better. This will provide a sharper insight into the dynamics of peer production, the culture of sharing, and collective intelligence.

References

Baldwin, C. and Clark, K. (2005) “The Architecture of Participation: Does Code Architecture Mitigate Free Riding in the Open Source Development Model?”, Management Science, 52(7):1116-1127
Baldwin, C. and von Hippel, E. (2009) “Modeling a Paradigm Shift: From Producer Innovation to User and Open Collaborative Innovation”, MIT Sloan School of Management Working Paper # 4764-09, November 2009
Bauwens, M. (2005) “The political economy of peer production”, Ctheory.Net 1000 Days of Theory. Accessed 2010.05.08. Website: http://www.ctheory.net/articles.aspx?id=499
Benkler, Y. (2006) The Wealth of Networks, Yale University Press
Berkman Center for Internet & Society (2009) Harvard University, http://cyber.law.harvard.edu/node/5373
Böhm, C. and Jacopini, G. (1966) “Flow diagrams, Turing Machines and Languages with only Two Formation Rules”, Communications of the ACM, 9(5):366-371
Boehm, B. (1986) “A Spiral Model of Software Development and Enhancement”, ACM SIGSOFT Software Engineering Notes, 11(4):14-24, August 1986
Bos, H. (2007) “If you cannot beat them, sue them!”, Landelijk Architectuur Congres, NBC Nieuwegein, 21-22 November 2007
Christianson, C. and Harguess, J. (2008) “Linux, Linus and Netville – The birth of open source”, Department of Electrical and Computer Engineering, The University of Texas at Austin
Constantine, L. and Yourdon, E. (1979) Structured Design, Prentice Hall
De Marco, T. (1979) Structured Analysis and System Specification, Prentice Hall