[ossig] FOSS.in Conference Day 2

Ditesh Kumar ditesh at gathani.org
Tue Dec 18 12:31:01 MYT 2007


(for full linkage, please visit:
http://ditesh.gathani.org/blog/2007/12/13/fossin-conference-day-2/ )


--- Union Mount: VFS based Filesystem Namespace Unification for Linux

Bharata B. Rao from IBM spoke on union mount, a form of filesystem
namespace unification (i.e., merging the contents of two or more
directories/filesystems to present a unified view). Some uses of union
mount include:

      * live CD systems (writable RAM based FS combined with a read only
        FS on CD, thus allowing a writable disk-less system)
      * server consolidation, many servers sharing a common RO
        installation
      * disk-less NFS-root clients (set of machines sharing a single RO
        NFS root filesystem)
      * sandboxing - simulation of software updates, testing of OS
        updates

The Union file system also offers unification at the filesystem layer,
so ext3 and ReiserFS can be abstracted through a Union file system. File
systems offer a namespace (hierarchical view of the filesystem contents)
and mounting (adding the filesystem in the device to the namespace
tree).

Some examples of transparent mounts include the following:

mount /dev/sda1 /mnt
mount -union /dev/sda2 /mnt

So, /mnt becomes the union mount point of sda1 and sda2, sda2 becomes
the topmost writable layer and sda1 is the RO bottom layer of the union.

For directory listings (readdir), directory contents from the different
layers are merged; where a name exists in more than one layer, only the
topmost layer's entry is shown. Same-named directories are merged in the
same way. For file/directory lookups, the lookup starts with the topmost
directory and proceeds downwards. For a file, it stops and returns as
soon as the file is found. For a directory, it also descends into all
lower layers to create subdirectory-level unions.
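
As a rough illustration of the readdir and lookup rules above, here is a
toy model in Python. It is only a sketch, not the kernel code: layers
are plain dicts (a string is a file, a nested dict is a directory), and
the function names and the rule that a file sitting below a same-named
directory stays hidden are assumptions made for the example.

```python
def union_readdir(layers):
    """Merge listings of all layers; layers[0] is the topmost layer."""
    seen = {}
    for layer in layers:                      # walk from top to bottom
        for name, entry in layer.items():
            seen.setdefault(name, entry)      # first (topmost) hit wins
    return sorted(seen)


def union_lookup(layers, name):
    """Return a file's contents, or the list of same-named directories
    found in every layer (a subdirectory-level union)."""
    dirs = []
    for layer in layers:
        if name not in layer:
            continue
        entry = layer[name]
        if isinstance(entry, dict):
            dirs.append(entry)                # keep descending for directories
        elif dirs:
            break                             # assumption: file below a directory is hidden
        else:
            return entry                      # a file: stop at the first match
    if dirs:
        return dirs
    raise FileNotFoundError(name)


top = {"etc": {"hosts": "h"}, "a.txt": "top copy"}
bottom = {"etc": {"passwd": "p"}, "a.txt": "bottom copy", "b.txt": "b"}

print(union_readdir([top, bottom]))                       # merged names
print(union_lookup([top, bottom], "a.txt"))               # topmost file wins
print(union_readdir(union_lookup([top, bottom], "etc")))  # merged subdirectory
```

Looking up "etc" returns both directories, which can then be listed as a
union of their own.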

All layers but the topmost are read-only (RO) and immutable. A write to
a lower-layer file results in the file being copied up to the topmost
layer and the write being performed on the copy. Whiteouts are
placeholders for files that logically no longer exist: deleting a
file/directory that exists only in a lower layer creates a whiteout for
it in the topmost directory. Looking up a whiteout returns -ENOENT.
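
Copy-up and whiteouts can be sketched in the same toy style. Again this
is an illustration under assumptions, not the actual implementation: the
Union class, the WHITEOUT marker and the append/unlink method names are
made up, and creating a whiteout when the file exists in both the top
and a lower layer is my reading of what is needed rather than something
stated in the talk.

```python
import errno

WHITEOUT = object()   # hypothetical marker standing in for a whiteout entry


class Union:
    def __init__(self, top, lower):
        self.top = top        # writable dict: name -> contents
        self.lower = lower    # list of read-only dicts, highest first

    def lookup(self, name):
        if name in self.top:
            if self.top[name] is WHITEOUT:
                # A whiteout hides whatever the lower layers contain.
                raise OSError(errno.ENOENT, "whited out", name)
            return self.top[name]
        for layer in self.lower:
            if name in layer:
                return layer[name]
        raise OSError(errno.ENOENT, "no such file", name)

    def append(self, name, data):
        current = self.lookup(name)        # raises ENOENT for whiteouts
        self.top[name] = current + data    # copy up, then write to the copy

    def unlink(self, name):
        self.lookup(name)                  # raises ENOENT if already gone
        if any(name in layer for layer in self.lower):
            self.top[name] = WHITEOUT      # lower copy must stay hidden
        else:
            del self.top[name]             # top-only file: plain delete


u = Union(top={}, lower=[{"motd": "hello"}])
u.append("motd", " world")
print(u.lookup("motd"), "|", u.lower[0]["motd"])   # lower layer is untouched
u.unlink("motd")
```

After the unlink, the lower layer still holds its original copy, but a
lookup through the union fails with ENOENT.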

For file renames, for files/directories present only in the topmost
layer, traditional rename is used. The rename of a directory which is a
part of a union or which is present only in the lower layer is deferred
to userspace by returning -EXDEV. The renaming of a regular file present
only in the lower layer is done by copying it up to the topmost layer.
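
The three rename cases can be written down as a small decision function.
This is a sketch under assumptions: layers are dicts as before (nested
dict = directory), union_rename and WHITEOUT are hypothetical names, and
leaving a whiteout behind for the old name after a copy-up is my guess
at how the old name gets hidden.

```python
import errno

WHITEOUT = object()   # hypothetical stand-in for a whiteout entry


def union_rename(top, lower, old, new):
    """top: writable dict; lower: list of read-only dicts, highest first."""
    in_top = old in top and top[old] is not WHITEOUT
    below = [layer[old] for layer in lower if old in layer]
    if not in_top and (not below or top.get(old) is WHITEOUT):
        raise OSError(errno.ENOENT, "no such file or directory", old)

    if in_top and not below:
        top[new] = top.pop(old)            # top-only: traditional rename
        return

    entry = top[old] if in_top else below[0]
    if isinstance(entry, dict):
        # Directory in a union or only in a lower layer: defer to
        # userspace, which can fall back to copy and delete.
        raise OSError(errno.EXDEV, "directory is part of a union", old)

    top[new] = entry                       # regular file: copy up under new name
    top[old] = WHITEOUT                    # assumption: hide the old name


top = {"notes": "n", "src": {}}
lower = [{"readme": "r", "src": {}, "docs": {}}]
union_rename(top, lower, "notes", "notes.old")   # plain rename
union_rename(top, lower, "readme", "README")     # copy-up
```

Renaming "src" or "docs" in this example raises OSError with EXDEV,
which is the signal for a tool like mv to do the work itself.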

There was a vigorous Q&A session, which I could not copy quickly enough.
Needless to say, there was much interest from audience members.

Personal thoughts: A fairly lucid technical talk.


--- MySQL and the architecture of participation

Colin Charles spoke on MySQL and getting involved. Two years ago, MySQL
had no contribution mechanism and working at MySQL was, for all intents
and purposes, like working at a startup. MySQL has been open source
since early 2000, GPLv2 even. However, development has been relatively
closed, as outside developers who contribute to MySQL are usually hired
immediately. In addition, code reviews are performed in secret and there
are legal hurdles to getting external contributors involved.

MySQL sought change, at some point, and opened up to getting the
community involved, including having its bugs database, mailing list,
forums available publicly. The developer zone (devzone) was the most
immediately useful of the sites, with downloads, necessary
documentation, articles and the like. Apparently, MySQL devzone still
has some marketing in it, so it doesn’t serve developers as well as it
should.

MySQL Forge is a SourceForge/Freshmeat equivalent tailored for projects
that use MySQL. It allows for sharing of SQL snippets, stored
procedures, UDF and also provides a wiki. Much of the MySQL internals
documentation has been moved to the wiki (this includes localized
documentation too).

Next up was the Quality Contribution Program (QCP), a MySQL effort to
improve the MySQL product base through active community participation.
Participants get acknowledgement and possibly rewards. Participation is
counted as activity in bug hunting/reports, test cases and patches in
the last 12 months.

Particularly interesting was “Worklogs”, effectively development and
roadmap tasks for MySQL. They basically describe features that MySQL
developers are working on, with the ability for users to provide
feedback on features being developed. Very cool!

For development and versioning purposes, MySQL still uses the
proprietary BitKeeper. However, all of MySQL's trees are public
(http://mysql.bkbits.net) and should be up to date. Colin gave an
overview of checking out MySQL sources and compiling the software. He
also mentioned the use of MySQL sandbox, which is a testing playground
for MySQL releases up to 6.0.

Colin went through some test cases and discussed MySQL storage engines.
Of particular interest was the open source MySQL proxy which allows for
monitoring and analysis of MySQL queries. Apparently, plugins are
written in the Lua programming language (version 5.1).

Finally, he described ways in which external developers can contribute
to MySQL.

Personal thoughts: Honest talk on where MySQL can improve in terms of
community contributions.


--- PostgreSQL 8.3: A story of hundreds of patches

Josh Berkus with his copresenter Pavan Deolasee spoke on the upcoming
PostgreSQL 8.3 release. He started off by showing the PostgreSQL logo
and asked whether a modified Lord Ganesha logo would suit an Indic
PostgreSQL team (given the silence that greeted his suggestion, I am
thinking no).

Josh started off quickly on PostgreSQL 8.3 by noting that it is in beta
and is to be released on the 4th of January 2008. The reason for it
being in beta for quite a while has simply been the many new patches the
PostgreSQL team has received (280 patches and features). In that sense,
Josh noted that (unlike the other open source database) PostgreSQL is a
community project. Ouch! :)

One new feature has been SQL/XML, which has actually been in development
for some time (since 2002). Some time ago, Peter Eisentraut wrote a
prototype XML export feature (to export a table to XML). Following up on
that, Pavel Stehule wrote an SQL/XML syntax demo (the first standard
syntax example, which was dependent on pl/Perl). In 2005, Nikolay
Samokhvalov wrote updatable XML views in the RDBMS.

In 2005, Google funded 700 students to work on open source projects.
PostgreSQL got a whole 1% of that loot, with 7 students sponsored to
work on open source (Nikolay was one of the 7). To build proper support
for SQL/XML, PostgreSQL went to look for standards in this area and
found an unpublished ANSI SQL standard dated 2006. So the development of
SQL/XML was guided to some extent by this standard and there was lots of
back and forth discussion between the standards developers and
PostgreSQL. This shows the willingness of PostgreSQL to do it right and
implement a proper standard instead of inventing a proprietary,
non-interoperable one. There were code modifications to properly support
SQL/XML, with many patches and subsequent revisions. Most importantly,
before SQL/XML could be properly released, there was a need for proper
documentation. Kudos on this policy, I must say, as PostgreSQL never
officially releases anything without proper documentation.

Josh then gave an example of using SQL/XML. He dumped a whole lot of
restaurant reviews in an XML format into PostgreSQL and used the inbuilt
PostgreSQL XML functions to mine the data. There is a way to create XML
data out of a table by using the xmlforest() function. In fact, entire
tables/queries can be exported to XML via the table_to_xml() function.
XPath can also be used to mine XML data.
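
To give a feel for the shape of the data, here is a small sketch that
uses only Python's standard library, not PostgreSQL. The xmlforest()
helper below merely mimics the idea of the SQL function (one element per
column, named after the column), and the reviews, column names and
ratings are made up; Josh's actual demo data was not captured. In 8.3
the real thing would be along the lines of
SELECT xmlforest(name, rating) FROM reviews; for building XML and
xpath('/reviews/review/name/text()', doc) for querying it, with
hypothetical table and column names.

```python
import xml.etree.ElementTree as ET


def xmlforest(row):
    """Mimic the idea of xmlforest(): one element per column."""
    parts = []
    for column, value in row.items():
        element = ET.Element(column)
        element.text = str(value)
        parts.append(ET.tostring(element, encoding="unicode"))
    return "".join(parts)


# Made-up rows standing in for a reviews table.
rows = [
    {"name": "Cafe A", "rating": 4},
    {"name": "Cafe B", "rating": 5},
]

# Wrap each row's forest in a <review> element under one root.
doc = "<reviews>" + "".join(
    "<review>" + xmlforest(row) + "</review>" for row in rows
) + "</reviews>"

# ElementTree supports a subset of XPath, enough for a simple filter.
root = ET.fromstring(doc)
top_rated = [r.findtext("name") for r in root.findall("./review[rating='5']")]

print(xmlforest(rows[0]))
print(top_rated)
```

The same kind of path expression, run inside the database, is what makes
the XML column minable without pulling the documents out first.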

Did I say this is fscking cool? :) Folk in the audience seemed to think
so as well.

Next up was HOT (heap only tuples). Josh claimed PostgreSQL to be the
fastest Open Source Database (OSDB) compared to MySQL, and certainly
more scalable. However, he noted that because of the MVCC model,
cleaning up older versions is a big performance hit (read: vacuuming).
At this point Josh went into some significant detail regarding the
nature of MVCC and why vacuuming is a big performance hit. The gist was
that vacuuming can be solved by HOT which Pavan helped develop along
with Simon Riggs, Heikki Linnakangas, Tom Lane and various other folk.
Essentially, HOT provides the ability to do microvacuums which pretty
much solve the performance hit problem (and HOT will be in 8.3!).

Josh also mentioned that an Indian team (CDE from IIT) came up with
SkyLine which is an extension to the SQL syntax. However, as it was not
part of the SQL standard, it was put into PgFoundry.

For future development, Josh noted that PostgreSQL is a mailing list
driven project. However, he admitted that release cycles are slightly
long and in 2008, they are looking at doing two-month-long cycles, which
would allow for feedback a lot sooner.

There were some questions from the audience. On specific performance
tuning on multiprocessors, Josh suggested increasing shared buffers.
Another member of the audience wanted to know whether the SQL/XML
interface would allow for JSON dumps. The answer was in the negative,
although it was noted that somebody was working on pl/JavaScript. On
running PostgreSQL on handheld devices, there were no plans unless
handheld devices get really powerful (a due nod to the excellent SQLite
was made at this point).

What about PostgreSQL in low memory environments? Apparently that’s
feasible, with PostgreSQL working in under 20 MB of RAM (I’ve personally
tested it on < 64MB environments - PostgreSQL works wunnerfully!).
Somebody wanted to know about Sun’s interest in PostgreSQL when Sun has
JavaDB. Well, Josh replied that JavaDB is for embedded and PostgreSQL is
for large systems. On Microsoft’s implementation of SQL/XML, Josh agreed
that Microsoft sucks balls (I’m *ahem* paraphrasing) as their
implementation is “completely non standard” (woah, who could have seen
this coming, eh?).

On another question, Josh replied that PostgreSQL 8.3 will be able to
execute any function in the XPath 1.0 standard. A low-hanging-fruit
question came in on how to troubleshoot slow queries in PostgreSQL; well
(I would say RTFM but Josh was polite-r) use “explain analyze” and use
system level tools to determine IO, memory or CPU utilization. As to why
so few hosting companies provide PostgreSQL hosting, some common reasons
were that ISPs think their customers don’t need it and cPanel only
offers MySQL (boo!).

There were several other questions but the talk soon ended and we went
off to the PostgreSQL BOF.

Personal thoughts: A most excellent talk. My faith in PostgreSQL was
always strong but now it’s root-firm!


--- BOF Session: PostgreSQL

The Birds-of-a-Feather (BOF) session was fun. Josh ran the PostgreSQL
BOF and there were approximately 20 people present. He started off by
showing some benchmarks showing how much PostgreSQL owns MySQL in the
speed area. Josh was honest though in saying that Oracle still kicks ass
and PostgreSQL has some distance to go in catching up.

Josh discussed a tool he was developing to help generate a useful
postgresql.conf tailored to the machine’s architecture, as the default
postgresql.conf is extremely conservative in its values. He then took us
through some common postgresql.conf settings.
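
To make the idea concrete, here is a minimal sketch of what such a
generator might look like. This is not Josh's tool, which I did not see
the code for; the function names are made up, and the ratios (a quarter
of RAM for shared_buffers, three quarters for effective_cache_size, a
slice per connection for work_mem) are only commonly quoted rules of
thumb used here as assumptions, not recommendations from the session.

```python
def suggest_conf(ram_mb, max_connections=100):
    """Derive a few postgresql.conf values from total RAM in MB."""
    shared_buffers = ram_mb // 4                 # assumption: ~25% of RAM
    effective_cache_size = ram_mb * 3 // 4       # assumption: ~75% of RAM
    work_mem = max(1, (ram_mb // 4) // max_connections)  # per-sort budget
    return {
        "max_connections": str(max_connections),
        "shared_buffers": f"{shared_buffers}MB",
        "effective_cache_size": f"{effective_cache_size}MB",
        "work_mem": f"{work_mem}MB",
    }


def render(settings):
    """Format the settings as postgresql.conf lines."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


# Example: a machine with 4 GB of RAM.
print(render(suggest_conf(4096)))
```

Any real values should be checked against the PostgreSQL documentation
and the actual workload before being used.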

Personal thoughts: Very very informative discussion.


--- Lightning Talks

Next up were the lightning talks. Danese Cooper was organizing it. The
lightning talks were great fun. And thanks to Aizatto’s tai-chi, I ended
up becoming Malaysia’s representative at the talk. With Rusty Russell,
Rasmus Lerdorf and like 100 other people in the room, it was a wee bit
scary.

I spoke on CouchDB and Asterisk and how saving AMI events in CouchDB is
a great fit of technology. There were some other excellent lightning
talks (a math teacher speaking about teaching math in this modern day
and age, Rusty speaking on the ANTI-THREAD library, Rasmus on his trip,
some chap on great places to eat in Bangalore and a whole lot more!).

Personal thoughts: All in all, great fun!

-- 
  May your signals all trap                     Ditesh Kumar
May your references be bounded                ditesh at gathani.org
      All memory aligned                http://ditesh.gathani.org/blog
    Floats to ints rounded              http://www.openmalaysiablog.com



