Thoughts on Oracle vs. Google »
Charles Nutter, an open-source Java developer and former Sun employee, looks at the history, patents, and likely outcomes of Oracle’s suit against Google’s Android.
Rich programmer food »
Steve Yegge, on why programmers should write compilers:
Whenever I gave even a moment’s thought to whether I needed to learn compilers, I’d think: I would need to know how compilers work in one of two scenarios. The first scenario is that I go work at Microsoft and somehow wind up in the Visual C++ group. Then I’d need to know how compilers work. The second scenario is that the urge suddenly comes upon me to grow a long beard and stop showering and make a pilgrimage to MIT where I beg Richard Stallman to let me live in a cot in some hallway and work on GCC with him like some sort of Jesuit vagabond.
The Perl 6 project is ten years old »
Carl Masak looks back over the past ten years of Perl 6’s development, from its catalyst and initial design, to budding implementations.
Nuke’em ‘Till They Glow »
Steve Blank talks about his job as an electrical engineer in a nuclear reactor in the 50s.
Steve’s mention of seeing Cerenkov radiation for the first time reminded me of a school excursion to the reactor at ANSTO, and the strange thought I had there, at once both attractive and repulsive, of diving into the fuel rod storage pool. Our guide at the time figured I’d be OK, as long as I didn’t go more than 3 meters deep, or stay in too long. In the end, I erred on the side of staying dry.
jsoup 0.3.1 released »
I’ve just released version 0.3.1 of jsoup, the Java library for working with real-world HTML.
This version adds bulk HTML methods to the Elements collection, supports easy form validation of HTML user input, improves bulk attribute matching, and includes fixes for some minor bugs.
A hearty thanks to everyone who has tried jsoup and written in to me or to the mailing list with their experiences. Your input is directly shaping jsoup for the better.
Federal Court rules Sensis has no copyright on directories
The Federal Court of Australia has ruled that Sensis holds no copyright to its White Pages and Yellow Pages directories (or, at least, the ones Sensis tendered in evidence to prove that it does).
Catchwords: INTELLECTUAL PROPERTY – whether copyright subsists in White Pages and Yellow Pages directories as original literary works – central concepts under the Copyright Act 1968 (Cth) – centrality of authorship – whether the contributions to the directories involved the necessary independent intellectual effort or sufficient effort of a literary nature
340: None of the Works were original. None of the people said to be authors of the Works exercised “independent intellectual effort” or “sufficient effort of a literary nature” in creating the Works. Further, if necessary, the creation of the Works did not involve some “creative spark” or the exercise of the requisite “skill and judgment”.
347: For those reasons, I do not consider that copyright subsists in any of the WPDs listed in Annexure A or any of the YPDs listed in Annexure B. I will direct the parties to bring in a proposed minute of orders to give effect to these reasons for decision by 4:00pm on 12 February 2010.
This finding raises some large questions about how companies can protect works created by a diverse workforce whose members this interpretation of copyright law does not consider to be “authors”: the programmers, business analysts, managers, customer representatives, and the company as a whole, who collectively create large works.
The answer may be to sit tight and wait for parliament to introduce legislation similar to the EU’s 1996 Legal Protection of Databases Directive:
30: As the High Court observes, there is no counterpart in Australian law. It is not open to me to ignore the express words of the Copyright Act to expand protection consistent with that set out in the Directive as summarised by the High Court. That is a matter for Parliament and, in my view, a matter which they should address without delay.
A rant about PHP compilers in general and HipHop in particular »
I’ve heard the argument “you don’t need a compiler, since PHP is rarely the bottleneck” for many years. I think its complete bollox. But I wrote a compiler for PHP, so I would say that.
Unless your PHP server is sitting there idling (which is probably the case for many PHP servers out there), then you could make use of a PHP compiler. For small timers, all components of your application are going to be sitting on the same box, contending for the same resources. Even if you assume the DB is the bottleneck, the resources the interpreter consumes could be more profitably spent on the DB.
New version of jsoup released »
I’ve just released version 0.2.2 of jsoup. This release adds some new class name and HTML manipulation methods, improved document normalisation, and nicer HTML pretty-printing.
jsoup is now also available on the Maven central repository, so getting started is easier. See the details on the download page.
jsoup HTML parser launches

Today, I am announcing the public beta launch of jsoup, an open source Java HTML parser that I have been working on recently.
jsoup is a Java library for working with real-world HTML:
- parse HTML from a URL, file, or string
- find and extract data, using DOM traversal or CSS selectors
- manipulate the HTML elements, attributes, and text
- clean user-submitted content against a safe white-list
jsoup is designed to deal with all varieties of HTML found in the wild: from pristine and validating to invalid tag-soup, jsoup will create a sensible parse tree.
jsoup is an open source project distributed under the liberal MIT license. Source code is available at GitHub.
As of this initial launch, jsoup is immediately useful, and it is in use in several internal projects. But of course it can be made more useful, so please send me your suggestions and thoughts, either to the project’s mailing list or to me directly.
If you would like to contribute code, that would also be welcome.
For more information, and to get started using jsoup, visit the project’s website.
API design matters »
Michi Henning writes about the cost of bad APIs, and how to design good interfaces:
A great way to get usable APIs is to let the customer (namely, the caller) write the function signature, and to give that signature to a programmer to implement. This step alone eliminates at least half of poor APIs: too often, the implementers of APIs never use their own creations, with disastrous consequences for usability. Moreover, an API is not about programming, data structures, or algorithms—an API is a user interface, just as much as a GUI. The user at the using end of the API is a programmer—that is, a human being. Even though we tend to think of APIs as machine interfaces, they are not: they are human-machine interfaces.
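One way to picture the caller-first approach described in that quote is to write the call site before the implementation, and only then fill in the body. The sketch below is purely illustrative: `WordCounter`, `countWords`, and the sample sentence are hypothetical names invented here, not taken from Henning's article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCounter {

    // The signature is the one a caller would ask for: text in, counts out.
    // No setup objects, no output parameters to pre-allocate, and an empty
    // map (never null) when there is nothing to count.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        if (text == null || text.trim().isEmpty()) {
            return counts;
        }
        // Split on runs of whitespace and tally each lower-cased word.
        for (String word : text.trim().toLowerCase().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // This line was written first, from the caller's point of view;
        // the method above exists only to make it work.
        Map<String, Integer> counts = countWords("to be or not to be");
        System.out.println(counts);
    }
}
```

The design choice worth noticing is that every decision about the signature (a plain `String` argument, a returned `Map`, no null results) was made by asking what would be least surprising at the call site, not what would be easiest to implement.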