Tuesday, January 27, 2009

MetWare presentation at Metabolomics Workshop

I gave this 10 minute presentation on MetWare this afternoon at the International Metabolomics Workshop:
Metware

It features PRI/Wageningen slide styles, as this work was done during my previous post-doc in Wageningen.

Friday, January 23, 2009

Statistics on the Development Community

Git is nice. Nicer to some than to others, that is true. GitReady just taught me how to count commits per author in a few seconds: git shortlog -s -n. For the whole Bioclipse and CDK commit history. Seconds. Here they are.

CDK
  6337  egonw
  1446  rajarshi
   616  shk3
   540  steinbeck
   421  miguelrojasch
   139  chhoppe
   115  eoc21
    98  kaihartmann
    97  mfe4
    74  labarta
    69  tohel
    43  f_marighetti
    40  ospjuth
    39  nielsout
    34  sushil_ronghe
    29  archvile18
    29  maet
    26  djiao
    24  michaelthoward
    22  dirk49
    17  Egon Willighagen
    16  martitm
    15  jhao
    14  stomkinson
    12  benedikta
    12  dleidert
    12  telto
    11  thomaskuhn
    10  sea36
    10  yz237
     7  zzzgggrrr
     6  speleo3
     5  jharter
     5  mario_baseda
     4  akrassavine
     4  gilleain
     4  mekovich
     4  ulif
     4  yeldar
     3  edgarl
     3  jonalv
     3  petermr
     2  drzz
     2  jakt
     2  sithmein
     2  vedina
     1  Kaihartmann
Bioclipse
   939  ospjuth
   466  jonalv
   438  egonw
   293  shk3
   266  goglepox
   263  carl_masak
    93  edrin_t
    67  rklancer
    44  biocoder
    23  gilleain
    23  miguelrojasch
    21  Annzi
     9  Egon Willighagen
     1  grantsparks
There is indeed a bit of contributor duplication, but it is still pretty neat. I did not have a full Jmol git repository around, so those statistics will have to follow.
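The duplication can be cleaned up after the fact. Below is a minimal sketch in Python that parses `git shortlog -s -n` output and merges counts for names that belong to the same person. The alias table is an assumption made for illustration (that "Egon Willighagen" and "egonw", and "Kaihartmann" and "kaihartmann", are the same contributors); the function names are made up for this example.

```python
import re
from collections import Counter


def parse_shortlog(text):
    """Parse `git shortlog -s -n` output into (count, name) pairs."""
    pairs = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            pairs.append((int(match.group(1)), match.group(2)))
    return pairs


def merge_aliases(pairs, aliases):
    """Sum commit counts per contributor, mapping aliases to one name."""
    totals = Counter()
    for count, name in pairs:
        totals[aliases.get(name, name)] += count
    # most_common() returns (name, total) pairs sorted by descending total
    return totals.most_common()


# Assumed alias table, for illustration only.
ALIASES = {"Egon Willighagen": "egonw", "Kaihartmann": "kaihartmann"}

sample = """
   939  ospjuth
   438  egonw
     9  Egon Willighagen
"""

for name, count in merge_aliases(parse_shortlog(sample), ALIASES):
    print(f"{count:6d}  {name}")
```

Git itself can do this merging too: a .mailmap file in the repository is honoured by git shortlog, which is probably the nicer long-term fix.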

Wednesday, January 21, 2009

Details behind the "Calling XMPP cloud services from Taverna2"

On Monday I showed two screenshots of our new XMPP-based web/cloud services in action inside Taverna.

I promised details, but realize I have actually already posted a lot of them in October:
Johannes' ideas led to the IO-DATA proposal (XEP-0244), which is currently marked experimental and being discussed on the ws-xmpp mailing list. He gathered a few people around him to get it going, resulting in working stuff! Yeah!
Joerg asked: "Could you post more results, what is it, why do we need it, e.g. why are you mentioning SOAP and cloud? Do not know enough to see the bonus right now."

What is it? IO-DATA is a protocol on top of the XMPP protocol to allow machine-to-machine communication. Actually, much like SOAP, RPC, and other platforms. How IO-DATA differs lies to some extent in the transport layer: instead of using HTTP, it uses the XMPP transport protocol, which is also used by Jabber chat clients. It basically allows clients like Taverna to chat with services running elsewhere.

Why do we need it? Most services run over HTTP, making them web services. This is convenient, because there is much infrastructure around, like web browsers. REST services also take advantage of this. However, for heavy computing this sometimes leads to problems. For example, routers are known to have timeouts on HTTP connections. To solve this, SOAP services often introduce a polling mechanism. IO-DATA takes a different approach. Instead of having to ask all the time how a calculation is doing, you can just wait for the service to send you a message when it is done. Instead of working around the lack of asynchronous aspects, IO-DATA introduces these in the protocol.
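The difference between the two styles can be illustrated without any network at all. The following is a rough sketch in Python, not IO-DATA or XMPP code: `FakeService` is a hypothetical stand-in for a remote service doing a slow calculation, and the two client functions show polling versus waiting for a message.

```python
import threading
import time


class FakeService:
    """Hypothetical stand-in for a remote service running a slow calculation."""

    def __init__(self, duration):
        self._duration = duration
        self._result = None
        self._lock = threading.Lock()

    def submit(self, on_done=None):
        """Start the calculation; optionally notify on_done(result) when finished."""
        def work():
            time.sleep(self._duration)
            with self._lock:
                self._result = "finished"
            if on_done is not None:
                on_done("finished")
        threading.Thread(target=work, daemon=True).start()

    def status(self):
        """Return the result, or None while the calculation is still running."""
        with self._lock:
            return self._result


def poll_for_result(service, interval):
    """Polling style: keep asking until the result is there."""
    service.submit()
    requests = 0
    while True:
        requests += 1
        result = service.status()
        if result is not None:
            return result, requests
        time.sleep(interval)


def wait_for_message(service):
    """Push style: submit once, then block until the service notifies us."""
    inbox = []
    arrived = threading.Event()

    def on_done(result):
        inbox.append(result)
        arrived.set()

    service.submit(on_done)
    arrived.wait(timeout=5)
    return inbox[0], 1


print(poll_for_result(FakeService(0.2), interval=0.05))
print(wait_for_message(FakeService(0.2)))
```

The polling client needs several status requests for one answer, and the number grows with the length of the calculation. The push client sends one request and then simply waits.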

Other interesting features include that IO-DATA integrates the interface formats for services into the service itself, whereas SOAP needs WSDL for this, and that it features service discovery via DISCO. Service discovery is done with SOAP too, for example with UDDI and BioMoby. BioMoby also adds strong data typing for input and output of services.

IO-DATA addresses the data typing by letting you ask the service what XML Schema it uses for input and output. While XML Schema has alternatives, which may be preferred in some situations, it does allow strong data typing and supports a lot of formats in life sciences (which I'll summarise soon).

Moreover, if there just happens not to be a suitable schema around, you can just define one yourself, which can be as simple as a single element wrapper around some custom text-based format. You worry about supporting many formats? Well, no need. Johannes' xws4j library, which I used for the Taverna plugin too, allows compiling Java binding code. Bioclipse's script environment allows you to do this on the fly: you find a service, ask for the schema, compile bindings for input and output, set up the input with the input binding, send it off to the service, and use the output binding for convenient access to the computation results. Without having to reboot Bioclipse. Isn't that cool? Can your software do that? (See this example Gist: the io factory creates the binding).
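To give an idea of how little a "single element wrapper" asks for, here is a small sketch in Python using only the standard library. The element name `payload` is made up for this example and is not taken from IO-DATA or xws4j; the point is only that any text-based format can travel as the content of one XML element.

```python
import xml.etree.ElementTree as ET


def wrap(text, tag="payload"):
    """Wrap a text-based format in a single XML element (tag name is hypothetical)."""
    element = ET.Element(tag)
    element.text = text
    # Special characters such as < and & are escaped on serialization
    return ET.tostring(element, encoding="unicode")


def unwrap(xml_string, tag="payload"):
    """Recover the original text from the single-element wrapper."""
    element = ET.fromstring(xml_string)
    if element.tag != tag:
        raise ValueError(f"expected <{tag}>, got <{element.tag}>")
    return element.text or ""


message = wrap("CCC")
print(message)
print(unwrap(message))
```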

Why do I mention SOAP and the cloud? It should be clear from the above why I mention SOAP: IO-DATA offers the same functionality, but more conveniently, we think. I mention cloud here to refer to cloud computing, which is doing computation in the cloud, a synonym for the internet (see Cloud Computing @ Wikipedia). Because IO-DATA does not use HTTP, we do not feel we can call it a web service. Instead, cloud computing is a more general term, not tied to any particular architecture. IO-DATA is just one possible architecture, one we think is promising for life science applications.

Monday, January 19, 2009

Calling XMPP cloud services from Taverna2

SMILES (CCC) in, mass out. Yes, we can now call XMPP/IO-DATA cloud services with Taverna2 :)

Details will follow, but here's the source code.

RSC now allows Jmol in main text of publication... well, almost

Rich Kidd wrote in the ChemistryWorldBlog that Henry Rzepa has published two papers in RSC journals where Jmol is part of the main paper, after having used Jmol in extra material in ACS journals before. The key here is that Jmol is part of the official text... when you open the paper in a browser, you immediately get to see the Jmol live, 3D graphics! Well, so it is said in the blog.

However, when I checked the HTML of the first of the two papers (A computational investigation of the structure of polythiocyanogen, doi:10.1039/b810147g), the main HTML still links to a supplementary page. Progress, but not perfect either:

Friday, January 16, 2009

Bioclipse and Gist integration

As you might have read, Bioclipse has scripting support (see for example, Scripting JChemPaint), and we have been collecting scripts on Gist and indexing them on Delicious with the tags bioclipse and gist. This provides a nice overview of what you can do with the current SVN version of Bioclipse2. And, hopefully, when released, it will allow users to quickly learn about Bioclipse features, share scripts, etc. Think of it as MyExperiment.org for Bioclipse.

Now, what was missing until today was easy access to gists in Bioclipse itself. No gist.load(33421) yet. There still is not, but earlier today I uploaded a Wizard for it. (The manager will follow later.) Right-click on an open Project, select New -> Other, and pick Download Gist:

and click Next:

Then, just type the number of the Gist you want to open in Bioclipse, for example 18315 (see Bioclipse2 Scripting #1: from SMILES to a UFF optimized structure in Jmol), and click another Next to select a file name and location:

The current code does require you to know the Gist number, so you'll need a web browser to look it up, but we do have search facilities in mind. Also, while the code attempts to, the resulting Gist is not automatically opened in an editor (a bug). Another idea is to just install the egit plugin in Bioclipse :)

Thursday, January 15, 2009

Editing and Validation of PubChem XML documents

With the general framework set up for editing and validation of CML documents, it was fairly easy to support the PubChem XML file format schema too.

With the upcoming Bioclipse2 beta (scheduled next Friday), all you need to install on top of the Bioclipse2 core is the new XML feature.