eridanus dot net

Python, bioinformatics and ‘open blogging’

So a few days ago I was checking my Google Alerts, which arrive daily listing blogs, news releases etc. that are in some way related to bioinformatics. Quite often these include posts from friends who are ‘active’ bioinformatics bloggers – Ally, Frank and Mike in particular. There are a few ‘usual suspects’ in bioinformatics blogging, and one that comes up regularly is a blog called Beginning Python for Bioinformatics.

This site takes Beginning Perl for Bioinformatics as its basis and attempts to follow its flow, but in Python, a language which, along with Java, has gained great traction in the bioinformatics community in recent years. For me the site has been interesting because I don’t know Python and have zero interest in learning it, but it’s always nice to see how tasks that are familiar to you in one language (in my case Perl) are implemented in another.

Recently the site has drifted away from the book, and I felt compelled to comment on this article here. This caused, I think, some annoyance to the author, who felt the need to rebut me not only in the comments but in 2 subsequent posts. Frustratingly, when I replied to one of those posts (this one, which is aimed directly at me), he did not publish my reply.

One of the things that marks the bioinformatics blogging community is a push towards ‘open everything’ – I think this is noble, occasionally naive, but something to strive for: open data, open exchange standards and open publication. Against that background, having something as trivial as a blog comment ‘censored’ is a real shame.

I don’t really like to mix work into my blog; it’s meant to be a place for my meanderings, not focused on what I do. But I feel that if I don’t publish this here, then the effort I put into writing the unpublished reply is simply wasted.

My initial point to the author was that writing code to split FASTA files was redundant – there are utilities to do this for you, they will do it faster, more efficiently and more flexibly. This is my reply to the blog author in full:

I admit that some of the things you have addressed in the past, whilst ‘reinventing’ the wheel, offer important insights into the general problems of programming – this particular example was not one of them. From following the course, which I assume was outlined by ‘Beginning Perl for Bioinformatics’ (which I do not think is a good reference for beginning Perl at all), a user should already be able to perform the following operations: open a file, iterate over and parse its contents, open a file for writing, and apply some rules for formatting the output. Thus you have already given them the tools to complete the operation should they wish to do so in a Python context.

The problem is that you are trying to educate potentially early career or transitional bioinformaticians here, and showing them how to complete tasks in X lines of Python when there is a single command line application to do the trick is, I think, disingenuous. We wouldn’t dream, for instance, of educating our MRes students in this way.

I would also take issue with the idea that diversity is key. You can be as diverse as you like, but no-one is seriously going to suggest that you write your next script to parse something out of a GenBank file in assembler. One of the hallmarks of any computer professional in any field is the ability to use the right tool for the job. You should be technology agnostic and be prepared to string together applications or scripts from anywhere to reach your goal – take a look at something like Taverna, for instance.

I notice in the other post you mention that a high proportion of this site’s visitors use Windows, and that command line *nix comments are therefore of no interest. Although I know a number of bioinformaticians who use Windows (generally to integrate with an institution’s existing Microsoft-heavy infrastructure), they all have logins to dozens of *nix machines, run *nix in a VM, or at the very least have a Cygwin installation to work with. The OS reported in your page impressions is not necessarily a good indicator of what technology your readers have available.

I hope you don’t mind me addressing these points here rather than on my blog, as my blog is personal (hence the lack of bioinformatics content there). I have actually been following these tutorials for a while; as someone who came into bioinformatics some time ago, and whose first forays into programming were in Perl, it is nice for me to see tasks that I find intuitive in one language written in another. Consequently, the fact that I felt the need to comment merely suggests that I feel strongly on this issue, rather than that I have any negative feelings towards what you are trying to achieve.

I personally don’t see what was so offensive about this to warrant its non-publication.

I would like to expand on one of the comments above as well. I don’t think that ‘X for bioinformatics’ approaches are necessarily a good way to teach people programming. Beginning Perl for Bioinformatics was an atrocious book in my opinion, especially when you hold it up against the excellent Learning Perl (its sequel redeemed it a little, but no-one is going to put Perl learning in context better than Randal L. Schwartz). What you end up doing with ‘Beginning X for Y’ is providing cookbook-style recipes for people who will get into the habit of ‘cut and paste’ programming, rather than thinking about what they need to achieve, how they are going to achieve it, and enjoying the challenge of learning something new in order to do so.

Whilst bioinformatics is deemed ‘specialist’, the stuff you end up doing in Perl or Python is generally still just processing, munging and conversion operations. The basics of programming are probably best taught independently of the context in which they will be used, and the field-specific stuff is always best handled by the existing Bio* projects (BioPerl, BioJava, Biopython).

That’s my 2c on that anyway ;) Comments welcome and will not be ignored!

6 comments
  1. [...] Daniel Swan accused me of not publishing his comment or reply to his comments. I have never seen his reply, either in the comments or as an email to me, [...]

  2. Daniel Swan says: 2007/10/18 14:24

    I have no hesitation in leaving up this comment linking to one of Paulo’s increasingly incensed posts about what I have written. I would like to point out for the record, though, that I have a PhD (Mr Daniel Swan indeed ;)) and that maybe he is taking things just a little bit personally. I thought I was being helpful and stimulating discussion on his blog, but apparently not!

  3. pnuin says: 2007/10/18 15:42

    Dr Swan,

    Indeed you stimulated discussion on my blog, and I thank you for that. On the other hand, I won’t thank you for falsely accusing me of not publishing your insightful comments.

    I am not taking anything to a personal level, because 1) I don’t know you, 2) this is the internet, where everyone is free to have his/her opinion, and 3) I am not always right, nor always wrong.

    I suggest you follow your comments to the letter

    “showing them how to complete tasks in X lines of Python when there is a single command line application to do the trick I think is disingenuous. We wouldn’t dream, for instance, of educating our MRes students in this way. I also would take issue that diversity is a key.”

    and post a single-line solution for everything I have posted and will post. We need more smart people who can see further down the road than a normal guy like me. You are more than welcome to use the comment space on my blog, or if you prefer I can set up a mirror for you so you can post your solutions. This will result in a lot of discussion and increase the diversity. Also, you can properly show us how to educate the MRes students out there.

    I understand if you don’t accept my offer, but I would really appreciate it if you did. This is a simple task and might not take more than 20 minutes for a PhD like you. I can set up a domain and WordPress installation for you in no time.

  4. Daniel Swan says: 2007/10/18 18:04

    Hi Paulo,

    OK, well, I have been on the internet long enough to understand that things ‘disappear into the ether’. I saw that comment submitted from my end, and I am sorry it didn’t appear at yours. Technology is fallible.

    I did think that maybe I had put your back up a little bit, but apparently not, and that is fine. I waited several days for the comment I submitted to show up and saw other comments approved, so I drew my own conclusion; if that was erroneous, then I am sorry.

    Thanks for the offer of contributing to your blog (which I will continue to read, as I have for a long time!), but I simply do not have the time or the inclination to enter the bioinformatics blogging realm in any capacity, or I would already have done so. I run a bioinformatics ‘core facility’ at my institution, and my spare time and work time already overlap too much! My teaching, though occasional, I generally deliver in lectures or as a supervisor to undergraduate or postgraduate students. As you have taken on the role of an educator on your blog, I just wanted to make your readership aware of the fact that TIMTOWTDI (there is more than one way to do it), which you did in subsequent posts. If I feel I have future contributions to make to your blog, which I would reiterate I do think is a valuable resource, then I will.

    regards,

    Dan

  5. knirirr says: 2007/10/19 10:25

    You missed out BioRuby (http://bioruby.org/). You would, of course, expect me to remind you of this. ;-)
    I’m not using it at the moment but that is because I am doing atmospheric physics rather than bioinformatics.

  6. Daniel Swan says: 2007/10/19 11:53

    Milo,

    I also missed out the excellent BioConductor, which I am using at this very moment, although I’m currently lacking a decent annotation for the Affymetrix Human Gene 1.0 ST array, which is slowly cheesing me off.
