For a broader coverage related to this topic, see Open-source software movement.
Open-source software shares similarities with free software and is now part of the broader term free and open-source software.
Open-source software (OSS) is computer software with its source code made available with a license in which the copyright holder provides the rights to study, change, and distribute the software to anyone and for any purpose. Open-source software may be developed in a collaborative public manner. According to scientists who studied it, open-source software is a prominent example of open collaboration. The term is often written without a hyphen as "open source software".
Open-source software development, or collaborative development from multiple independent sources, generates a more diverse range of design perspectives than any one company is capable of developing and sustaining long term. A 2008 report by the Standish Group states that adoption of open-source software models has resulted in savings of about $60 billion (£48 billion) per year to consumers.
Further information: History of free and open source software
End of 1990s: Foundation of the Open Source Initiative
In the early days of computing, programmers and developers shared software in order to learn from each other and evolve the field of computing. Eventually, the open-source notion fell by the wayside as software was commercialized in the 1970s and 1980s. However, academics still often developed software collaboratively; for example, Donald Knuth in 1979 with the TeX typesetting system and Richard Stallman in 1983 with the GNU operating system. In 1997, Eric Raymond published The Cathedral and the Bazaar, a reflective analysis of the hacker community and free software principles. The paper received significant attention in early 1998, and was one factor in motivating Netscape Communications Corporation to release their popular Netscape Communicator Internet suite as free software. This source code subsequently became the basis behind SeaMonkey, Mozilla Firefox, Thunderbird and KompoZer.
Netscape's act prompted Raymond and others to look into how to bring the Free Software Foundation's free software ideas and perceived benefits to the commercial software industry. They concluded that FSF's social activism was not appealing to companies like Netscape, and looked for a way to rebrand the free software movement to emphasize the business potential of sharing and collaborating on software source code. The new term they chose was "open source", which was soon adopted by Bruce Perens, publisher Tim O'Reilly, Linus Torvalds, and others. The Open Source Initiative was founded in February 1998 to encourage use of the new term and evangelize open-source principles.
While the Open Source Initiative sought to encourage the use of the new term and evangelize the principles it adhered to, commercial software vendors found themselves increasingly threatened by the concept of freely distributed software and universal access to an application's source code. A Microsoft executive publicly stated in 2001 that "open source is an intellectual property destroyer. I can't imagine something that could be worse than this for the software business and the intellectual-property business." However, while Free and open-source software has historically played a role outside of the mainstream of private software development, companies as large as Microsoft have begun to develop official open-source presences on the Internet. IBM, Oracle, Google and State Farm are just a few of the companies with a serious public stake in today's competitive open-source market. There has been a significant shift in the corporate philosophy concerning the development of FOSS.
The free software movement was launched in 1983. In 1998, a group of individuals advocated that the term free software should be replaced by open-source software (OSS) as an expression which is less ambiguous and more comfortable for the corporate world. Software developers may want to publish their software with an open-source license, so that anybody may also develop the same software or understand its internal functioning. With open-source software, generally anyone is allowed to create modifications of it, port it to new operating systems and instruction set architectures, share it with others or, in some cases, market it. Scholars Casson and Ryan have pointed out several policy-based reasons for adoption of open source – in particular, the heightened value proposition from open source (when compared to most proprietary formats) in the following categories:
- Localization—particularly in the context of local governments (who make software decisions). Casson and Ryan argue that "governments have an inherent responsibility and fiduciary duty to taxpayers" which includes the careful analysis of these factors when deciding to purchase proprietary software or implement an open-source option.
The open source label came out of a strategy session held on April 7, 1998 in Palo Alto in reaction to Netscape's January 1998 announcement of a source code release for Navigator (as Mozilla). A group of individuals at the session included Tim O'Reilly, Linus Torvalds, Tom Paquin, Jamie Zawinski, Larry Wall, Brian Behlendorf, Sameer Parekh, Eric Allman, Greg Olson, Paul Vixie, John Ousterhout, Guido van Rossum, Philip Zimmermann, John Gilmore and Eric S. Raymond. They used the opportunity before the release of Navigator's source code to clarify a potential confusion caused by the ambiguity of the word "free" in English.
Many people have claimed that the birth of the Internet in 1969 started the open-source movement, while others do not distinguish between the open-source and free software movements.
The Free Software Foundation (FSF), started in 1985, intended the word "free" to mean freedom to distribute (or "free as in free speech") and not freedom from cost (or "free as in free beer"). Since a great deal of free software already was (and still is) free of charge, such free software became associated with zero cost, which seemed anti-commercial.
The Open Source Initiative (OSI) was formed in February 1998 by Eric Raymond and Bruce Perens. With at least 20 years of evidence from case histories of closed software development versus open development already provided by the Internet developer community, the OSI presented the "open source" case to commercial businesses, like Netscape. The OSI hoped that the use of the label "open source", a term suggested by Christine Peterson of the Foresight Institute at the strategy session, would eliminate ambiguity, particularly for individuals who perceive "free software" as anti-commercial. They sought to bring a higher profile to the practical benefits of freely available source code, and they wanted to bring major software businesses and other high-tech industries into open source. Perens attempted to register "open source" as a service mark for the OSI, but that attempt was impractical by trademark standards. Meanwhile, Raymond's paper had been presented to the upper management at Netscape (something Raymond only discovered when he read the press release and received a call from Netscape CEO Jim Barksdale's PA later in the day), and Netscape released its Navigator source code as open source, with favorable results.
The Open Source Initiative's (OSI) definition is recognized by governments internationally as the standard or de facto definition. In addition, many of the world's largest open-source software projects and contributors, including Debian, the Drupal Association, the FreeBSD Foundation, the Linux Foundation, the Mozilla Foundation, the Wikimedia Foundation, and the WordPress Foundation, have committed to upholding the OSI's mission and Open Source Definition through the OSI Affiliate Agreement.
OSI uses The Open Source Definition to determine whether it considers a software license open source. The definition was based on the Debian Free Software Guidelines, written and adapted primarily by Perens. Perens did not base his writing on the "four freedoms" from the Free Software Foundation (FSF), which were only widely available later.
Under Perens' definition, open source describes a broad general type of software license that makes source code available to the general public with relaxed or non-existent restrictions on the use and modification of the code. It is an explicit "feature" of open source that it puts very few restrictions on the use or distribution by any organization or user, in order to enable the rapid evolution of the software.
Despite initially accepting it, Richard Stallman of the FSF now flatly opposes the term "Open Source" being applied to what they refer to as "free software". Although he agrees that the two terms describe "almost the same category of software", Stallman considers equating the terms incorrect and misleading. Stallman also opposes the professed pragmatism of the Open Source Initiative, as he fears that the free software ideals of freedom and community are threatened by compromising on the FSF's idealistic standards for software freedom. The FSF considers free software to be a subset of open-source software, and Richard Stallman explained that DRM software, for example, can be developed as open source even though it does not give its users freedom (it restricts them), and thus does not qualify as free software.
Open-source software licensing
Main article: Open-source license
Further information: Free software license
See also: Free and open-source software § Licensing, and Software license
When an author contributes code to an open-source project (e.g., Apache.org) they do so under an explicit license (e.g., the Apache Contributor License Agreement) or an implicit license (e.g. the open-source license under which the project is already licensing code). Some open-source projects do not take contributed code under a license, but actually require joint assignment of the author's copyright in order to accept code contributions into the project.
Examples of free-software/open-source licenses include the Apache License, BSD licenses, the GNU General Public License, the GNU Lesser General Public License, the MIT License, the Eclipse Public License and the Mozilla Public License.
The proliferation of open-source licenses is a negative aspect of the open-source movement because it is often difficult to understand the legal implications of the differences between licenses. With more than 180,000 open-source projects available and more than 1,400 unique licenses, the complexity of deciding how to manage open-source use within "closed-source" commercial enterprises has dramatically increased. Some licenses are home-grown, while others are modeled after mainstream FOSS licenses such as the Berkeley Software Distribution ("BSD") license, the Apache License, the MIT License (Massachusetts Institute of Technology), or the GNU General Public License ("GPL"). In view of this, open-source practitioners are starting to use classification schemes in which FOSS licenses are grouped, typically by the existence and strength of their copyleft provisions and the obligations those provisions impose.
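A classification scheme of this kind can be sketched as a simple lookup table. The license-to-category mapping below is a simplified illustration (using SPDX-style identifiers), not legal guidance:

```python
# Illustrative sketch of a FOSS license classification scheme, grouped by
# copyleft strength. The mapping is a simplification for demonstration only.
COPYLEFT_STRENGTH = {
    "GPL-3.0": "strong",       # copyleft extends to derivative works
    "LGPL-3.0": "weak",        # copyleft limited to the library itself
    "MPL-2.0": "weak",         # file-level copyleft
    "Apache-2.0": "permissive",
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
}

def classify(license_id: str) -> str:
    """Return the copyleft category for a license, or 'unknown'."""
    return COPYLEFT_STRENGTH.get(license_id, "unknown")
```

An enterprise compliance tool could use such a grouping to flag strong-copyleft dependencies for additional legal review before they are combined with proprietary code.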
An important legal milestone for the open source / free software movement was passed in 2008, when the US federal appeals court ruled that free software licenses definitely do set legally binding conditions on the use of copyrighted work, and they are therefore enforceable under existing copyright law. As a result, if end-users violate the licensing conditions, their license disappears, meaning they are infringing copyright. Despite this licensing risk, most commercial software vendors are using open source software in commercial products while fulfilling the license terms, e.g. leveraging the Apache license.
Certification can help to build user confidence. It can be applied to anything from the simplest component to a whole software system. The United Nations University International Institute for Software Technology (UNU/IIST) initiated a project known as "The Global Desktop Project". This project aims to build a desktop interface that every end-user is able to understand and interact with, thus crossing language and cultural barriers. The project would improve developing nations' access to information systems. UNU/IIST hopes to achieve this without any compromise in the quality of the software by introducing certifications.
Open-source software development
Main article: Open-source software development model
In his 1997 essay The Cathedral and the Bazaar, open-source evangelist Eric S. Raymond suggests a model for developing OSS known as the bazaar model. Raymond likens the development of software by traditional methodologies to building a cathedral, "carefully crafted by individual wizards or small bands of mages working in splendid isolation". He suggests that all software should be developed using the bazaar style, which he described as "a great babbling bazaar of differing agendas and approaches."
In the traditional model of development, which he called the cathedral model, development takes place in a centralized way. Roles are clearly defined. Roles include people dedicated to designing (the architects), people responsible for managing the project, and people responsible for implementation. Traditional software engineering follows the cathedral model.
The bazaar model, however, is different. In this model, roles are not clearly defined. Gregorio Robles suggests that software developed using the bazaar model should exhibit the following patterns:
- Users should be treated as co-developers
- The users are treated like co-developers and so they should have access to the source code of the software. Furthermore, users are encouraged to submit additions to the software, code fixes for the software, bug reports, documentation, etc. Having more co-developers increases the rate at which the software evolves. Linus's law states, "Given enough eyeballs, all bugs are shallow." This means that if many users view the source code, they will eventually find all bugs and suggest how to fix them. Note that some users have advanced programming skills, and furthermore, each user's machine provides an additional testing environment. Each new testing environment offers the chance to find and fix new bugs.
- Early releases
- The first version of the software should be released as early as possible so as to increase one's chances of finding co-developers early.
- Frequent integration
- Code changes should be integrated (merged into a shared code base) as often as possible so as to avoid the overhead of fixing a large number of bugs at the end of the project life cycle. Some open source projects have nightly builds where integration is done automatically on a daily basis.
- Several versions
- There should be at least two versions of the software. There should be a buggier version with more features and a more stable version with fewer features. The buggy version (also called the development version) is for users who want the immediate use of the latest features, and are willing to accept the risk of using code that is not yet thoroughly tested. The users can then act as co-developers, reporting bugs and providing bug fixes.
- High modularization
- The general structure of the software should be modular allowing for parallel development on independent components.
- Dynamic decision making structure
- There is a need for a decision making structure, whether formal or informal, that makes strategic decisions depending on changing user requirements and other factors. Compare with extreme programming.
Data suggests, however, that OSS is not quite as democratic as the bazaar model suggests. An analysis of five billion bytes of free/open source code by 31,999 developers shows that 74% of the code was written by the most active 10% of authors. The average number of authors involved in a project was 5.1, with the median at 2.
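Concentration figures like these can be computed directly from per-author and per-project contribution counts. The data below is made up for illustration; it is not the study's dataset:

```python
from statistics import mean, median

# Hypothetical sample data, for illustration only (not the study's dataset):
# lines contributed per author on one project, and author counts across projects.
lines_per_author = [5000, 1200, 300, 150, 90, 60, 40, 30, 20, 10]
authors_per_project = [12, 2, 1, 3, 2, 7, 1, 2, 4, 2]

# Share of code written by the most active 10% of authors.
top_n = max(1, len(lines_per_author) // 10)
top_share = sum(sorted(lines_per_author, reverse=True)[:top_n]) / sum(lines_per_author)

avg_authors = mean(authors_per_project)
med_authors = median(authors_per_project)
```

With this sample, the single most active author accounts for about 72% of the code, the mean project has 3.6 authors, and the median 2, mirroring the shape (though not the exact values) of the reported statistics.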
Advantages and disadvantages
Open source software is usually easier to obtain than proprietary software, often resulting in increased use. Additionally, the availability of an open source implementation of a standard can increase adoption of that standard. It has also helped to build developer loyalty as developers feel empowered and have a sense of ownership of the end product.
Moreover, OSS requires lower costs for marketing and logistical services. OSS also helps companies keep abreast of technology developments. It is a good tool to promote a company's image, including its commercial products. The OSS development approach has helped produce reliable, high-quality software quickly and inexpensively.
Open source development offers the potential for a more flexible technology and quicker innovation. It is said to be more reliable since it typically has thousands of independent programmers testing and fixing bugs of the software. Open source is not dependent on the company or author that originally created it. Even if the company fails, the code continues to exist and be developed by its users. Also, it uses open standards accessible to everyone; thus, it does not have the problem of incompatible formats that exist in proprietary software.
It is flexible because modular systems allow programmers to build custom interfaces or add new abilities, and it is innovative since open-source programs are the product of collaboration among a large number of different programmers. The mix of divergent perspectives, corporate objectives, and personal goals speeds up innovation.
Moreover, free software can be developed in accord with purely technical requirements. It does not require thinking about commercial pressure that often degrades the quality of the software. Commercial pressures make traditional software developers pay more attention to customers' requirements than to security requirements, since such features are somewhat invisible to the customer.
It is sometimes said that the open-source development process may not be well defined and that stages in the development process, such as system testing and documentation, may be ignored. However, this is only true for small (mostly single-programmer) projects. Larger, successful projects do define and enforce at least some rules, as they need them to make the teamwork possible. In the most complex projects these rules may be as strict as reviewing even a minor change by two independent developers.
Not all OSS initiatives have been successful; for example, SourceXchange and Eazel. Software experts and researchers who are not convinced by open source's ability to produce quality systems identify the unclear process, the late defect discovery and the lack of empirical evidence (collected data concerning productivity and quality) as the most important problems. It is also difficult to design a commercially sound business model around the open-source paradigm. Consequently, only technical requirements may be satisfied and not the ones of the market. In terms of security, open source may allow hackers to know about the weaknesses or loopholes of the software more easily than closed-source software. It depends on control mechanisms in order to create effective performance of autonomous agents who participate in virtual organizations.
In OSS development, tools are used to support the development of the product and the development process itself.
Revision control systems such as Concurrent Versions System (CVS) and later Subversion (SVN) and Git are examples of tools, often themselves open source, that help manage the source code files and the changes to those files for a software project. The projects are frequently hosted and published on sites like Launchpad, Bitbucket, and GitHub.
Open source projects are often loosely organized with "little formalised process modelling or support", but utilities such as issue trackers are often used to organize open source software development. Commonly used bugtrackers include Bugzilla and Redmine.
Tools such as mailing lists and IRC provide means of coordination among developers. Centralized code hosting sites also have social features that allow developers to communicate.
Some of the "more prominent organizations" involved in OSS development include the Apache Software Foundation, creators of the Apache web server; the Linux Foundation, a nonprofit which as of 2012 employed Linus Torvalds, the creator of the Linux operating system kernel; the Eclipse Foundation, home of the Eclipse software development platform; the Debian Project, creators of the influential Debian GNU/Linux distribution; the Mozilla Foundation, home of the Firefox web browser; and OW2, a European-born community developing open-source middleware. New organizations tend to have a more sophisticated governance model and their membership is often formed by legal-entity members.
The Open Source Software Institute is a membership-based, non-profit (501(c)(6)) organization established in 2001 that promotes the development and implementation of open-source software solutions within US federal, state and local government agencies. OSSI's efforts have focused on promoting adoption of open-source software programs and policies within the Federal Government and the Defense and Homeland Security communities.
Open Source for America is a group created to raise awareness in the United States Federal Government about the benefits of open source software. Their stated goals are to encourage the government's use of open source software, participation in open source software projects, and incorporation of open source community dynamics to increase government transparency.
Mil-OSS is a group dedicated to the advancement of OSS use and creation in the military.
Main article: Business models for open-source software
Open-source software is widely used both as independent applications and as components in non-open-source applications. Many independent software vendors (ISVs), value-added resellers (VARs), and hardware vendors (OEMs or ODMs) use open-source frameworks, modules, and libraries inside their proprietary, for-profit products and services. From a customer's perspective, the ability to use open technology under standard commercial terms and support is valuable. They are willing to pay for the legal protection (e.g., indemnification from copyright or patent infringement), "commercial-grade QA", and professional support/training/consulting that are typical of commercial software, while also receiving the benefits of fine-grained control and lack of lock-in that comes with open-source.
Comparisons with other software licensing/development models
Closed source / proprietary software
Main article: Comparison of open source and closed source
The debate over open source vs. closed source (alternatively called proprietary software) is sometimes heated.
The top four reasons (as provided by Open Source Business Conference survey) individuals or organizations choose open source software are:
- lower cost
- security
- no vendor 'lock in'
- better quality
Since innovative companies no longer rely heavily on software sales, proprietary software has become less of a necessity. As such, things like open-source content management system (CMS) deployments are becoming more commonplace. In 2009, the US White House switched its CMS from a proprietary system to the open-source Drupal CMS. Further, companies like Novell (who traditionally sold software the old-fashioned way) continually debate the benefits of switching to open-source availability, having already switched part of their product offering to open-source code. In this way, open-source software provides solutions to unique or specific problems. As such, it is reported that 98% of enterprise-level companies use open-source software offerings in some capacity.
With this market shift, more critical systems are beginning to rely on open-source offerings, allowing greater funding (such as US Department of Homeland Security grants) to help "hunt for security bugs." According to a pilot study of organisations adopting (or not adopting) OSS, several factors of statistical significance were observed in managers' beliefs in relation to (a) attitudes toward outcomes, (b) the influences and behaviours of others and (c) their ability to act.
Proprietary source distributors have started to develop and contribute to the open-source community due to the market-share shift, driven by the need to reinvent their models in order to remain competitive.
Many advocates argue that open-source software is inherently safer because any person can view, edit, and change code. A study of the Linux source code found 0.17 bugs per 1,000 lines of code, while proprietary software generally scores 20–30 bugs per 1,000 lines.
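Defect density, the metric behind this comparison, is simply the bug count normalized per thousand lines of code (KLOC); a minimal sketch, with an illustrative function name:

```python
def defects_per_kloc(bug_count: int, lines_of_code: int) -> float:
    """Defect density: bugs per 1,000 lines of code."""
    return bug_count / (lines_of_code / 1000)

# At the quoted densities, a hypothetical 2-million-line code base would
# carry roughly 0.17 * 2000 = 340 bugs at the Linux rate, versus
# 20 * 2000 = 40,000 at the low end of the quoted proprietary range.
```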
Main article: Alternative terms for free software
See also: Comparison of free and open-source software licenses
According to the Free software movement's leader, Richard Stallman, the main difference is that by choosing one term over the other (i.e. either "open source" or "free software") one lets others know about what one's goals are: "Open source is a development methodology; free software is a social movement." Nevertheless, there is significant overlap between open source software and free software.
The FSF said that the term "open source" fosters an ambiguity of a different kind such that it confuses the mere availability of the source with the freedom to use, modify, and redistribute it. On the other hand, the "free software" term was criticized for the ambiguity of the word "free" as "available at no cost", which was seen as discouraging for business adoption, and for the historical ambiguous usage of the term.
Developers have used the alternative terms Free and Open Source Software (FOSS), or Free/Libre and Open Source Software (FLOSS), consequently, to describe open-source software that is also free software. While the definition of open-source software is very similar to the FSF's free software definition, it was based on the Debian Free Software Guidelines, written and adapted primarily by Bruce Perens with input from Eric S. Raymond and others.
The term "open source" was originally intended to be trademarkable; however, the term was deemed too descriptive, so no trademark exists. The OSI would prefer that people treat open source as if it were a trademark, and use it only to describe software licensed under an OSI approved license.
OSI Certified is a trademark licensed only to people who are distributing software licensed under a license listed on the Open Source Initiative's list.
Open-source versus source-available
Although the OSI definition of "open source software" is widely accepted, a small number of people and organizations use the term to refer to software where the source is available for viewing, but which may not legally be modified or redistributed. Such software is more often referred to as source-available, or as shared source, a term coined by Microsoft in 2001. While in 2007 two shared source licenses were certified by the OSI, most of the shared source licenses are still source-available only.
In 2007, Michael Tiemann, president of the OSI, criticized companies such as SugarCRM for promoting their software as "open source" when in fact it did not have an OSI-approved license. In SugarCRM's case, it was because the software is so-called "badgeware", since it specified a "badge" that must be displayed in the user interface (SugarCRM has since switched to GPLv3). Another example was Scilab prior to version 5, which called itself "the open source platform for numerical computation" but had a license that forbade commercial redistribution of modified versions.
Open-sourcing is the act of propagating the open-source movement, most often referring to releasing previously proprietary software under an open-source/free-software license, but it may also refer to programming open-source software or installing open-source software.
Notable software packages, previously proprietary, which have been open-sourced include Netscape Communicator (the basis of Mozilla) and Sun Microsystems' StarOffice (the basis of OpenOffice.org).
Before changing the license of software, distributors usually audit the source code for third party licensed code which they would have to remove or obtain permission for its relicense. Backdoors and other malware should also be removed as they may easily be discovered after release of the code.
Current applications and adoption
Main article: Free and open-source software § Adoption
See also: Linux adoption and Free software § Adoption
"We migrated key functions from Windows to Linux because we needed an operating system that was stable and reliable – one that would give us in-house control. So if we needed to patch, adjust, or adapt, we could."
Widely used open-source software
Main article: List of free and open-source software packages
Open-source software projects are built and maintained by a network of volunteer programmers and are widely used in free as well as commercial products. Prime examples of open-source products are the Apache HTTP Server, the e-commerce platform osCommerce, the internet browsers Mozilla Firefox and Chromium (the project where the vast majority of development of the freeware Google Chrome is done), and the full office suite LibreOffice. One of the most successful open-source products is the GNU/Linux operating system, an open-source Unix-like operating system, and its derivative Android, an operating system for mobile devices. In some industries, open-source software is the norm.
Extensions for non-software use
Main article: Open source model
See also: Open content and Open collaboration
While the term "open source" applied originally only to the source code of software, it is now being applied to many other areas such as Open source ecology, a movement to decentralize technologies so that any human can use them. However, it is often misapplied to other areas which have different and competing principles, which overlap only partially.
The same principles that underlie open source software can be found in many other ventures, such as open-source hardware, Wikipedia, and open-access publishing. Collectively, these principles are known as open source, open content, and open collaboration: "any system of innovation or production that relies on goal-oriented yet loosely coordinated participants, who interact to create a product (or service) of economic value, which they make available to contributors and non-contributors alike."
This "culture" or ideology takes the view that the principles apply more generally to facilitate concurrent input of different agendas, approaches and priorities, in contrast with more centralized models of development such as those typically used in commercial companies.
Brazil has always stood out on the global scene for its advanced know-how in the production of biofuels: it was the second-largest producer of biodiesel in 2010 and the biggest global consumer in 2011. The first experiments on the use of ethanol in Otto cycle engines date back to the beginning of the 20th century. Although studies on biofuels in Brazil started long ago, it was only in the 21st century that the country put into action a plan to produce biodiesel on a large scale, taking advantage of the experience acquired with the Pro-Alcohol Program. With the intent to broaden the Brazilian energy matrix, the Federal Government launched the National Program of Biodiesel Production and Use (PNPB) in 2004.
Biodiesel is defined by the National Petroleum Agency (ANP), through Government Directive No. 255 of 15 September 2003, as a compound fuel derived from vegetable oils or animal fats, called B100. It can be used in compression-ignition internal combustion engines or for other types of energy generation, and can partially or totally replace fossil fuels. There are therefore wide possibilities for using biodiesel in urban, road and rail transportation, in energy generation, in stationary engines, and elsewhere.
Brazil enjoys a privileged position compared to other countries due to its biodiversity and vast territory, which allow distinct species to be cultivated in every region. Consequently, the raw materials for the production of biodiesel can be selected according to their availability in each region of the country. Among these sources, oilseeds stand out, such as cotton, peanut, dendê (palm oil), sunflower, castor bean, barbados nut and soybean [4–6]. Besides its privileged location, two other factors drive Brazil's biodiesel production: the amount of arable land available and the abundance of water resources. According to the Ministry of Agriculture, the new areas that could be destined for the production of oilseeds would, on their own, amount to approximately 200 million hectares.
Currently, soybean oil is the vegetable raw material most used for making biodiesel in Brazil, with an average share of 78% of the production of this fuel, followed by cotton oil with approximately a 4% share; the remainder includes animal fats and other oily materials. Notwithstanding soybean oil's status as the most important raw material by volume, the Federal Government has been encouraging the development of other oilseed crops, particularly those linked with family farming operations. Furthermore, depending on a single crop as the major supplier of raw material for an important national energy-autonomy project could make that project unsustainable: it would promote economic development only (or mainly) in regions whose climate and geological characteristics favor that crop, while leaving the project at the mercy of economic pressures from a single production chain. Similar problems surfaced in the development of the Pro-Alcohol Program in the 1970s.
To this end, the Ministry of Agriculture, Livestock and Food Supply (MAPA) has been assisting farmers with crop management practices, providing them with cultivars for the production of biodiesel. In line with this work, the Brazilian government encourages the production of biodiesel from different oilseeds and production technologies, inviting the participation of both agribusiness and family farming operations. Likewise, federal decrees define the taxation rules, which can vary according to planting region, raw material or production category, with distinct tax rates levied on agribusiness and family farming, the latter being a priority of the program. Other factors that encourage the cultivation of several oilseed crops are easy access to bank loans at reduced interest rates and the obligation of biodiesel-producing companies to acquire 5% of their raw material from family farmers. Besides these incentives for the production of biofuels, aligned with the economic development brought about by oilseed production, the adoption of a quality control program is essential for identifying the vegetable oil sources of these biofuels.
This need becomes even more relevant as the financial incentives for producing alternative biofuels from renewable sources grow, with a diversity of fuel formulations already (or potentially) available in the market. Source identification would also inhibit the use of raw materials, and the production of biodiesel, without the authorization of the regulatory agency.
Nevertheless, few studies exist that aim to identify the vegetable oil source used in the production of biofuels. With the federal government now encouraging the use of new raw materials for the production of biodiesel, it becomes necessary to identify their source, which requires methodologies capable of distinguishing vegetable oil sources. Chemically, vegetable oils of distinct sources present different fatty acid compositions: they differ in chain length, degree of saturation and the presence of other chemical functions, properties that can all be identified through spectrometric techniques [9–14].
A major reason for characterizing the source is inspection, as some countries apply different policies depending on the raw material. Another reason is that each vegetable oil has specific physicochemical properties that bear on its correct application. Within this context, besides research towards making other raw materials technically and economically viable for biodiesel production, it is evidently necessary to develop analytical techniques that make it possible to identify the vegetable oil source used in the production of biodiesel.
Multivariate analyses have recently made it possible to model chemical and physical properties of simple and complex systems from spectroscopic data. Recent works have investigated near-infrared (NIR) spectroscopy combined with multivariate analysis to identify which vegetable oils were used in biodiesel production. In one study, principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used for unsupervised pattern recognition, while soft independent modelling of class analogy (SIMCA) was used for supervised pattern recognition. In another work, four different multivariate data analysis techniques were used to solve the classification problem: regularized discriminant analysis (RDA), partial least squares discriminant analysis (PLS-DA), the K-nearest neighbors (KNN) technique, and support vector machines (SVMs), showing that classifying biodiesel by feedstock (base stock) type can be solved successfully with modern machine learning techniques and NIR spectroscopy data. A further study compared two classification methods, namely full-spectrum soft independent modelling of class analogy (SIMCA) and linear discriminant analysis with variables selected by the successive projections algorithm (SPA-LDA).
Meanwhile, qualitative and quantitative analysis using spectroscopy in the infrared region has expanded since the data generated by FT-IR spectrophotometers became available for computation, enabling statistical methods to solve problems of chemical analysis [17–21].

In HCA, the spectral data matrix is reduced by successively merging the most similar pairs until all points belong to a single group. The goal of HCA is to display the data in a two-dimensional form that emphasizes their natural groupings and patterns. The distance between points (samples or variables) reflects the similarity of their properties: the closer the points in the sample space, the more similar they are. Results are presented as dendrograms, in which samples or variables are grouped according to similarity.

In PCA, the n-dimensional data are projected into a low-dimensional space, usually of two or three dimensions. This is done by calculating the principal components, obtained as linear combinations of the original variables. In a principal component analysis, the clustering of samples defines the structure of the data through plots of scores and loadings, whose axes are the principal components (PCs) onto which the data are projected [22–24].

The iPCA analysis consists of dividing the data set into a number of equidistant intervals. For each interval a PCA is performed, and the results are shown in score plots. This method is intended to give an overview of the data and can help identify which signals of the spectrum are most representative for building a good multivariate calibration model [25–27].

In SIMCA, a training set is modeled by principal component analysis (PCA); subsequently, new samples are fitted to the model and classified as similar or dissimilar [23,28].
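As a concrete illustration of the two unsupervised methods described above, the following sketch computes PCA scores and explained variance via a singular value decomposition of a mean-centered data matrix, and builds an HCA tree from Euclidean distances. The data are random stand-ins for real spectra, and Ward's linkage is used merely as one common agglomerative criterion, not as the software's exact algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Stand-in data: 12 samples x 300 wavenumber channels (random, for illustration).
rng = np.random.default_rng(0)
spectra = rng.random((12, 300))

# --- PCA via SVD of the mean-centered data matrix ---
centered = spectra - spectra.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * s                            # sample coordinates on the PCs
loadings = Vt                             # contribution of each variable to each PC
explained = 100.0 * s**2 / np.sum(s**2)   # % variance captured by each PC

# --- HCA: Euclidean distances, agglomerative (Ward) linkage ---
Z = linkage(spectra, method="ward", metric="euclidean")
clusters = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into 2 groups
```

A score plot is then just a scatter of `scores[:, 0]` against `scores[:, 1]`, and `Z` is the structure a dendrogram visualizes.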
2. Experimental Section
2.1. Materials and Methods
Six different vegetable oil sources were used: canola, cotton, corn, palm, sunflower and soybean. For the latter two, two samples of each oil from different origins were acquired. A two-letter code was used to identify the samples: the first letter specifies whether the sample is degummed oil (O) or biodiesel (B), the second letter specifies the vegetable oil source (for example, C = canola), and the number that follows is the replicate number of the analysis. Finally, a lowercase letter (a or b) identifies the origin of the sample. The biodiesel samples were produced from the samples of degummed oils. Two batches were produced from the cotton oil sample and three batches of biodiesel were produced from soybean sample (b); this procedure was adopted to guarantee the reproducibility of the method. The canola and sunflower biodiesel batches were acquired from the biodiesel pilot plant of the University of Santa Cruz do Sul (UNISC), in Rio Grande do Sul, Brazil.
The methylic route was used to produce the biodiesel via transesterification, with sodium methoxide (Rhodia) as catalyst and methyl alcohol (Vetec, P.A.) as reagent, at a 1:6 molar ratio. The biodiesel samples were characterized by methods standardized by the AOCS (Physical and Chemical Characteristics of Oils, Fats, and Waxes) and by the European Norm (EN), using the following parameters and respective methods: moisture (AOCS Ca2e-84), acidity rate (AOCS Ca5a-40), total glycerol (EN 14105), free glycerol (AOCS Ca14-5) and methanol (EN 14110).
2.2. Acquisition of Spectra in the Medium Infrared
The infrared spectra were acquired on a Perkin Elmer Spectrum 400 FT-IR spectrometer equipped with a Universal Attenuated Total Reflectance accessory (UATR-FTIR). The range from 4,000 to 650 cm−1 was scanned, with a resolution of 4 cm−1 and 32 scans. The crystal used in this technique contains diamond in its upper layer and a zinc selenide focusing element. The spectra of each sample were acquired in six replicates. They were then normalized in order to eliminate differences in intensity stemming from concentration variations, reducing external effects to the same order of magnitude, with all spectra varying within an intensity range from 0 to 1.
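The 0-to-1 normalization described above can be sketched as a simple min-max scaling applied to each spectrum individually (a hypothetical helper, not the instrument software):

```python
import numpy as np

def minmax_normalize(spectrum):
    """Rescale one spectrum so its intensities span exactly 0 to 1,
    removing intensity differences caused by concentration variations."""
    lo, hi = spectrum.min(), spectrum.max()
    return (spectrum - lo) / (hi - lo)

spec = np.array([0.2, 0.8, 1.4, 0.5])   # toy intensities
norm = minmax_normalize(spec)           # 0.2 maps to 0.0, 1.4 maps to 1.0
```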
2.3. Multivariate Data Analysis
All obtained spectra were treated with multivariate analysis tools: Hierarchical Cluster Analysis (HCA), Principal Component Analysis (PCA) and Soft Independent Modeling of Class Analogy (SIMCA) were carried out in the computer program Pirouette® 3.11 by Infometrix (Bothell, WA, USA). Interval Principal Component Analysis (iPCA) was also employed, in Matlab® 7.11.0 (The MathWorks, Natick, MA, USA) with the iToolbox package (http://www.models.kvl.dk, Copenhagen, Denmark).
2.4. Modeling of Biodiesel Batches in the Medium Infrared
The set of raw spectra of the biodiesel samples is shown in Figure 1. To remove noise, the spectra were treated with the Savitzky–Golay first-derivative procedure, using a second-order polynomial and a 15-point window. Mean centering and Standard Normal Variate (SNV) were used as pre-processing tools for the multivariate analysis.
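A sketch of this pre-processing chain, assuming `scipy.signal.savgol_filter` with the parameters quoted above (15-point window, second-order polynomial, first derivative) followed by row-wise SNV:

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess(spectra):
    """Savitzky-Golay first derivative (2nd-order polynomial, 15-point
    window) followed by Standard Normal Variate (SNV) scaling of each row."""
    deriv = savgol_filter(spectra, window_length=15, polyorder=2, deriv=1, axis=1)
    # SNV: center each spectrum on its own mean, scale by its own std. dev.
    mean = deriv.mean(axis=1, keepdims=True)
    std = deriv.std(axis=1, keepdims=True)
    return (deriv - mean) / std

treated = preprocess(np.random.default_rng(1).random((5, 100)))  # stand-in spectra
```

Mean centering for PCA would then additionally subtract the column means of `treated` across samples.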
2.4.1. PCA and HCA
In the PCA and HCA, the 735–1,783 and 2,810–3,035 cm−1 regions were selected, because the other regions contained no spectral information or were polluted by water vapor or carbon dioxide bands due to poor compensation. For the HCA dendrogram, the Euclidean distance and the incremental linkage method were used. Figure 2 shows the spectra of the biodiesel samples after application of the first derivative and SNV, with the excluded regions highlighted.
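Selecting those two windows amounts to masking the wavenumber axis; a sketch assuming a 4 cm−1 grid running from 4,000 down to 650 cm−1, as in the acquisition settings:

```python
import numpy as np

# Wavenumber axis: 4,000 down to 650 cm-1 in 4 cm-1 steps (as acquired).
wavenumbers = np.arange(4000, 649, -4)
spectra = np.random.default_rng(2).random((6, wavenumbers.size))  # stand-in data

# Keep only the informative regions used for PCA/HCA.
mask = ((wavenumbers >= 735) & (wavenumbers <= 1783)) | \
       ((wavenumbers >= 2810) & (wavenumbers <= 3035))
selected = spectra[:, mask]
```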
2.4.2. Interval Principal Component Analysis (iPCA)
The objective of the Interval Principal Component Analysis (iPCA) was to detect the spectral region providing the best separation of the different biodiesel samples, with the intent of using it later in the SIMCA classification method. The spectra were split into 8, 16, 32 and 64 equidistant regions, and the combinations of principal components PC1 versus PC2, PC1 versus PC3 and PC2 versus PC3 were evaluated.
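A minimal sketch of the iPCA idea, splitting the spectral axis into equidistant intervals and recording the variance captured by PC1 and PC2 of each interval; `np.array_split` stands in for the iToolbox implementation:

```python
import numpy as np

def ipca_variances(spectra, n_intervals):
    """Split the spectral axis into equidistant intervals, run a PCA on
    each, and return the % variance captured by PC1 and PC2 per interval."""
    out = []
    for block in np.array_split(spectra, n_intervals, axis=1):
        centered = block - block.mean(axis=0)
        s = np.linalg.svd(centered, compute_uv=False)  # singular values
        var = 100.0 * s**2 / np.sum(s**2)              # % variance per PC
        out.append(var[:2])
    return np.array(out)

# Stand-in data: 6 samples x 320 channels, split into 16 intervals.
per_interval = ipca_variances(np.random.default_rng(3).random((6, 320)), 16)
```

The interval whose first two PCs accumulate the most variance (and whose score plot best separates the classes) is the candidate region for SIMCA.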
2.4.3. Soft Independent Modeling of Class Analogy (SIMCA)
Once the best spectral region had been obtained with the iPCA algorithm, the SIMCA model was built using the biodiesel spectral data, in accordance with the data in Table 1.
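SIMCA itself was run in Pirouette; as a hedged sketch of the underlying idea, each class is modeled by its own PCA, and a new sample is accepted by a class when its reconstruction residual falls below a threshold learned from the training residuals. The 3-sigma cutoff below is an illustrative choice, not the software's actual criterion.

```python
import numpy as np

class SimcaClassModel:
    """One-class PCA model in the spirit of SIMCA: fit PCs on a training
    class, then accept test samples whose residual distance to the PC
    subspace is below a threshold derived from the training residuals."""

    def __init__(self, X, n_pcs=2):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.V = Vt[:n_pcs].T                        # retained loadings
        r = self._residuals(X)
        self.threshold = r.mean() + 3.0 * r.std()    # illustrative 3-sigma cut

    def _residuals(self, X):
        Xc = X - self.mean
        recon = Xc @ self.V @ self.V.T               # projection onto PC subspace
        return np.linalg.norm(Xc - recon, axis=1)

    def accepts(self, X):
        return self._residuals(X) < self.threshold

# Stand-in "class": noisy copies of one base spectrum.
rng = np.random.default_rng(4)
base = np.sin(np.linspace(0.0, 6.0, 50))
train = base + 0.01 * rng.standard_normal((20, 50))
model = SimcaClassModel(train, n_pcs=2)

# A very different spectrum should be rejected by this class model.
outlier = 5.0 * np.cos(np.linspace(0.0, 6.0, 50))[None, :]
```

In a full SIMCA classification, one such model is fitted per vegetable oil source and each test spectrum is checked against all of them.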
3. Results and Discussion
3.1. Characterization of the Biodiesel Batches
The results of the characterization of the biodiesel samples are shown in Table 2.
3.2. Joint Analysis between the Biodiesel and the Degummed Oil Samples
Through the PCA, it was observed that 93.73% of the data variance was explained by the first two principal components. Figure 3 shows the PCA score plot (PC1 versus PC2) obtained from the UATR-FTIR data. PC1 separates the biodiesel samples, with positive values, from the degummed oil samples, with negative values, on the score plot. PC2, in turn, separates the biodiesel and degummed oil samples of palm and cotton, with positive values, from the biodiesel and degummed oil samples of soybean, sunflower, canola and corn, with negative values.
Although the degummed oil samples and the biodiesel samples lie on opposite sides in Figure 3, it is clear that the vegetable oil source influences the PC2 coordinate of these samples. For example, the biodiesel samples and the degummed palm and cotton oil samples are located at approximately the same height on the PC2 axis, though on opposite sides; the same occurs with the other samples. The trends observed in the principal component analysis were confirmed by the dendrogram obtained by HCA (Figure 4).
In this dendrogram one can observe two clusters, one associated with the biodiesel samples and the other with the degummed oil samples. The results of the dendrogram are fully in line with those of the PCA score plot (PC1 versus PC2).
3.3. Interval Principal Component Analysis (iPCA)
The best iPCA results were achieved with the first two principal components (PC1 versus PC2) and with the spectrum split into 16 equidistant intervals. Figure 5 shows the percentage variance chart for every region studied: for each interval, that is, for each region of the spectrum, the height of the bars gives the percentage of variance captured by each principal component. In this figure, interval 14 accumulates 99.54% of the information in the first two principal components for the UATR-FTIR spectral data.
The spectral region from 1,300–900 cm−1 is referred to as the fingerprint region, as it confirms the identity of compounds. Within this range, the most important absorptions are those stemming from the stretching of the C–O bond of the esters. These ester C–O absorption bands actually correspond to two asymmetric vibrations involving the C–C and C–O bonds. In the case of saturated aliphatic esters, the two bands observed appear at 1,275–1,185 cm−1 and at 1,160–1,050 cm−1. The first involves the stretching of the bond between the oxygen and the carbonyl carbon, coupled with C–C stretching; the second involves the stretching of the bond between the oxygen atom and a carbon atom. The band at the higher wavenumber is usually the more intense of the two.
The spectral region where the best separation of the biodiesel samples in the UATR-FTIR spectra was achieved comprises the range from 1,276 to 1,068 cm−1, corresponding to interval 14, which can be visualized in Figure 6.
Figure 6 presents differences between the soybean and sunflower samples. It can be observed that the batches of soybean a and b are not in the same group and consequently present differences in their chemical composition. This is justified by the characterization data of the biodiesel samples shown in Table 2.