Note (translated from Portuguese): Here is the interview with Mr. Josh Berkus in the original English. See the version translated into pt_BR here.
Here is my first interview with PostgreSQL developers. I want to talk with many core developers, major developers, developers from Brazil, and developers of interesting PostgreSQL-related projects. This is just the first one; you can expect at least one per month.
Josh Berkus is one of the seven members of the PostgreSQL Core Team, which controls releases for the PostgreSQL Project. He is also PostgreSQL Lead for Sun Microsystems, and has been a database consultant for over a decade. Josh is also a very good cook. Please note that Josh speaks only for himself and not for Sun Microsystems or the PostgreSQL project.
1 – When did you start working with computers, and when did you start using FLOSS?
Oh, I started working with computers back in 1981 … with the Commodore Vic20. In high school I had an Atari 8-bit computer, and hacked the hardware extensively, as well as doing some game programming in Basic and assembler.
I started with Free Software in 1998. At the time I was a small business and non-profit consultant building database applications, and was unable to tolerate Microsoft’s rising prices and increasing number of bugs — I realized if I continued to use Microsoft, I’d go bankrupt, crazy, or both. So I switched from MS SQL Server and Visual Basic to PostgreSQL 7.0, Linux 2.2 and PHP 3. Later I picked up some Perl.
One of the clients running my first PostgreSQL application is still running it (an employment agency). One of my Microsoft platform clients is, too … but they had to hire a full-time employee to maintain the Visual Basic software.
That’s also when I got involved in the community. During the same month, I discovered bugs in both MS SQL Server 7 and PostgreSQL 7.1. My posting to comp.databases.microsoft.sqlserver was silently deleted from the MSDN forums, and the bug was never fixed. In PostgreSQL, Tom Lane (who’d joined the project only three months before) confirmed the bug within hours and provided me with an experimental patch. From then on, I was hooked.
2 – How do you see the market for free and proprietary databases?
All SQL databases will be “commoditized” within the next 10 years. That is, I expect by 2015 that there will be four kinds of databases in use: open source, shareware and freeware (like DB2 express), low-cost and specialized (like Filemaker and Pervasive), and legacy databases no longer being maintained.
This is mostly because the requirements of SQL-relational database servers are well understood and well-defined, and most existing products already satisfy more than 50% of the market in terms of features. That means that the only real competition is downward pressure on prices, and we’ve seen that a lot lately, both in the growing popularity of MySQL and PostgreSQL, and the free versions from the major proprietary vendors.
3 – What reading do you recommend for those who are starting to use databases?
“Database Design for Mere Mortals” and “SQL Queries for Mere Mortals” are good for just starting out. After that, I’d suggest reading these books in any order:
- Korry Douglas’ “PostgreSQL”
- “Practical Issues in Database Design”
- “SQL for Smarties”
And, of course, don’t forget to read the PostgreSQL documentation!
I’ve also heard good things about O’Reilly’s “SQL Cookbook” and “SQL Hacks”, as well as “The Practical SQL Handbook”. But I haven’t read them.
4 – You were an OpenOffice.org developer and became a PostgreSQL developer in 2002. What motivated you to switch projects?
I got involved with OpenOffice.org because I felt that, in 2000, it was critical for the success of open source and Linux that we have good office applications. Otherwise people could never switch from Windows — including me!
By 2002, OpenOffice.org had become very popular and had millions of downloads. But I was a consultant and did not earn any money using OpenOffice.org, while I was doing a lot of PostgreSQL work. Also, the OpenOffice.org project had a lot of politics, which were burning out many of the volunteers. So I switched projects.
5 – You became treasurer of SPI in 2006. What kind of relationship do you expect between SPI and PostgreSQL?
It’s really just a way for us to raise funds. SPI is actually only one of five non-profits which hold funds for PostgreSQL. The others are the Japanese PostgreSQL Users’ Group, PostgreSQLFr.org, FFIS (Germany), and PostgreSQL Turkey.
Of course, my personal involvement with SPI is much greater than that, but that’s really separate from PostgreSQL.
6 – SPI recommends the use of free licenses like the GPL. For historical reasons, PostgreSQL adopted the BSD license. Would you recommend another license for projects like PostgreSQL? For which projects is it advisable to adopt copyleft?
Actually, I personally prefer the BSD license, and have no intention to advocate a change for PostgreSQL.
In my personal opinion, the GPL is best for projects where the danger of companies appropriating code without contributing back is significant, and the desire to create a new standard is limited. BSD-like licenses (including MIT and Apache) are more appropriate where getting the software distributed as far as possible is paramount, and the project already enjoys substantial loyal corporate support from large companies. I wouldn’t want Linux to switch to BSD, either, because there’s already a demonstrated danger of companies forking and closing off the code.
7 – Sun has been supporting PostgreSQL development heavily. Do you think Solaris may become one of the preferred platforms for PostgreSQL?
Well, of course it would be very good for my job if it did!
In the immediate future, I’m just shooting to put Solaris and OpenSolaris on a par with Linux and BSD in the PostgreSQL community. There’s a lot still to be done before that can happen: we need to regularly get Solaris releases out within 72 hours of the community release, there needs to be more documentation on running PostgreSQL on Solaris, and more optional tools need to be available as Solaris binaries. Finally, the community needs to build up the same performance tuning knowledge about Solaris that we have for other OSes.
In the longer term, OpenSolaris offers some tools and a development orientation that I believe will greatly expand the number of PostgreSQL+Solaris users once we have the routine stuff happening automatically. DTrace, with probes included in PostgreSQL 8.2, is a paramount example of these tools; it has already given us information on lock conflicts that we didn’t have any easy way to get before. Also, the Solaris community, like PostgreSQL, is dedicated to security, sustainability, and doing the correct thing rather than just the expedient one. So I think there’s a lot of opportunity for synergy once Solaris and PostgreSQL developers get talking to each other.
8 – Do you believe that tools supported in version 8.2, like DTrace, may be adopted by other free OSes, like Linux and the BSDs?
DTrace is already in FreeBSD 6.2, and Apple has stated it will make it into a future version of OSX. The Linux developers are working on a parallel utility called “systemtap”, and we created the Generic Monitoring Framework feature of PostgreSQL 8.2 so that it will be compatible with systemtap and other trace frameworks when they are available.
9 – You worked on the development of Bizgres. What is the project’s current status, and what can we expect from it in the coming years?
Bizgres is pretty much a testing ground for data warehousing features for PostgreSQL. A lot of stuff planned for later versions of PostgreSQL will go into Bizgres first: bitmap-on-disk indexes, for example, which will appear in PostgreSQL 8.3. I know that the Bizgres hackers (mostly Greenplum staff) are also working on resource management, external attached tables, and index-only access, but there’s no definite schedule for those things … yet.
10 – We noticed that your site, “Power PostgreSQL”, hasn’t had a new article in a long time. At the same time, we noticed that the site has a new layout. Can we expect new features in the coming months?
By the time this article is published, I expect that all of my talks from the last year of conferences will be up on PowerPostgreSQL. I’m blogging on ITToolBox (http://blogs.ittoolbox.com/database/soup/) instead of on PowerPostgreSQL because ITToolBox helps me reach a wider audience.
11 – Does the idea of writing a book from this still stand?
Well, Joe and I have both found that, after submitting the first four chapters, finding the time to write a book is very, very hard. I’d estimate that the completed book would require about 800 hours more writing. Unfortunately, the only people capable of taking over the book from us are as busy as we are, so there’s no delivery date in sight.
12 – PostgreSQL has been maturing several replication solutions. Do you believe that in the near future there will be clustering tools for PostgreSQL, like those in Oracle, DB2 and Teradata? What difficulties does this kind of project face?
Well, we already have a Teradata replacement, which is Greenplum Database. While it’s not open source, it is based on PostgreSQL and is syntax-compatible. There are lower-performance parallel-query solutions already available such as PostgresForest and Octopus.
OLTP clustering is a much, much harder problem and not one which *any* database project or company has solved satisfactorily: not Oracle, not IBM, not MySQL, not anybody. So putting any kind of a date on when any of our OLTP clustering projects will be suitable for a majority of users would be very ambitious at this point.
13 – What kind of knowledge should a wannabe PostgreSQL developer have?
For hacking PostgreSQL: some database design background and a good knowledge of C programming. For add-in projects on pgFoundry you can use whatever programming language you like. More importantly, you need to have some humility about your code and a desire to work with a community of hackers and learn how the project does things; a programmer who thinks that they are smarter than everyone else will not be very successful.
14 – What features do you imagine will have been implemented in PostgreSQL by 2012?
Heck, I couldn’t tell you what we’ll have in 2009, let alone 2012. The project moves very, very fast. We do a major release every year and it’s hard to see accurately more than one release ahead, if that.
First, I can tell you some of the stuff I’m working on at Sun. This includes a second generation of pgMigrator, which will make for easy in-place upgrades to later versions of PostgreSQL (pgMigrator, contributed by EnterpriseDB, currently handles only 8.1 to 8.2). We’re working on that with the programmers from Greenplum. Second, I’ll be working on autotuning/auto-configuration tools to help users get good PostgreSQL performance without having to be expert DBAs. That’s liable to be a long-term project which will eventually link into our new DTrace probes. Finally, we’ve been sponsoring Dave Cramer’s work on performance improvements to the JDBC driver, which has had some really gratifying results.
In the medium run, here are some things I’m relatively sure we’ll get in the next three versions or so, because there are people actively working on them:
- Update-in-place optimizations which enhance OLTP performance from 30% to 200%.
- More SQL99 and SQL03 features, like WITH RECURSIVE, ROLLUP and RANK.
- Polished external tables interfaces (possibly SQL/MED compliant), both for other PostgreSQL servers/versions and for other DBMSes.
- More exotic datatypes, especially more powerful SQL:XML features, and probably biology types as well.
- Yet more query optimizer improvements.
- The elimination of vacuum. I’m not sure how we’re going to do it, but we must.
- Both asynchronous and synchronous multi-master replication, from different projects.
- Publication of TPC and SPEC benchmarks.