Just yesterday I decided to start seriously developing in Julia. High-level languages are a bottleneck for computational biology. We need to be able to write things fast, and have them run fast. So far no language really does this. But Julia looks like the one.
I'm going to put together a BioJulia team if anyone is interested in playing.
I am starting a Ph.D. in evolutionary biology in May. Julia looks like a pragmatic solution to a lot of the woes in computation these days. If you need people to do things, I would be happy to assist. I have extensive experience in C and Python. I hack in 20 other languages too, but have not yet completed any serious projects in Julia.
Python is very popular. I need to explore pandas / NumPy more, but I was under the impression that they are closely tied to the underlying C arrays in order to provide high performance.
In my opinion the problem with computational biology is that most biologists are not keen to improve beyond a basic level of programming.
That is a problem for a large number of biologists who are working with bioinformatics, not for computational biologists, who tend to be computer scientists working in biology. I'll admit there are a good number of crap computational biologists as well, but that's not a reason to stop the rest of us from having good tools. In fact we should be trying to propagate tools that help them do what they are trying to do with minimal friction. Like Julia.
NumPy and pandas are good. But going by the Julia benchmarks, NumPy is comparatively slow. Also, most things that slow down computational bio have to do with much broader aspects of the language than the linear algebra libs. Most successful standalone sequence analysis software is written in C or C++ for this reason.
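To make the "broader than linear algebra" point concrete, here is a minimal sketch of the kind of work I mean, assuming k-mer counting as a typical sequence-analysis task. The function name `count_kmers` and the toy sequence are made up for illustration. The point is only that the hot loop is string slicing and hash-table updates running in the interpreter, which an array library does not speed up by itself.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq with a plain Python loop."""
    counts = Counter()
    # One string slice and one dictionary update per position:
    # all of this runs in the interpreter, not in a C array kernel.
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    # Toy input, purely illustrative.
    print(count_kmers("ACGTACGTAC", 3))
```

On a real genome that loop runs hundreds of millions of times, which is the usual reason this sort of code ends up rewritten in C or C++.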
I suppose I need to explain my position on the involvement of Julia in bioinformatics a little bit more than just a short "I want to play" statement. What follows is a bit of a rant that attempts to explain why things are bad and that we need to work to make them better.
I have been in the field of computational biology for (practically) 3 years. In this time, I have seen my fair share of bad tools and silly approaches to very basic problems. A lot of the computer science folks may not realize it, but computational biology has a lot of trouble of the basic software engineering sort. There is a lot of old, unmaintained code, messy projects implemented in multiple languages, and (of course) bugs. It does not appear that anybody checks or maintains their code after publication - the projects often die once they have appeared in a journal.
There are a number of reasons why the situation is the way it is. One of the sadly obvious ones is that academics do not have the time or desire to maintain their code. Some of the projects would require full-time coders to be maintained - and that is indeed the case for some of the bigger and more popular tools. As a result, some projects never take off or live up to their potential - simply for lack of time. The wasted effort means that a lot of work is being redone, and science in general stagnates because of that. There is no easy solution to this problem (other than centralizing the efforts somehow - but that is a question of community, not tools).
The issues that can be made better are the following (and I will start with the most obvious ones first):
1) We need a language that is both easy to write in and fast enough for production. Too often projects end up written in multiple languages. I have myself taken part in a few of those. The high-level code is usually written in Python or Perl (I shiver at the thought), while the heavier numerical things are done in C or C++. This creates a rather large divide in terms of who does what - quite often folks only know a single high-level language - so the numerical implementation stays opaque, with only a single person knowing how it works. This means that projects of that sort quickly become unmaintainable. There is also a lot of glue code written - and God help you if you need to understand how the Perl guts work. If there were a single sane implementation for both the high-level and the numerical stuff, it would solve a lot of those problems.
2) We need a fast language. Building on the previous argument - the reason for splitting a project across languages is quite often performance. If the high-level language is itself fast enough, we can use a single language for both layers - and we get the "Node.js" effect (I'm not sure who to credit for this phrase) - where the front-end and back-end stuff comes together. It also means that you are not penalized for using complicated data structures in your numeric code - so one level of separation falls away automatically.
3) We need a language with a large number of capabilities. The Julia community is aiming to replicate a lot of the functionality of R. That means that it is already possible to use Julia for, say, an undergraduate statistics course. There is absolutely no reason (other than the historical one, of course) why R is the language statisticians use. It was written by statisticians for statisticians - and has a lot of nice features. However, that also means that a lot of efficiency considerations were missed. R does a whole lot of data copying - which is ridiculous for large data sets. For the upcoming crop of statisticians it will barely make a difference which language is used - I would go as far as to say that a lot of undergrads will not even notice the difference, but those who do will thank us later.
4) We need a functional language. Or rather, we need a multi-paradigm language that has a strong functional basis. The advantages of the functional approach are too many to name here - and I am afraid this is already becoming incomprehensible. A lot of formal math and stats translates really easily into a functional mindset - and that is a great boon if you are trying to implement an algorithm out of a math paper. Also, Julia does not restrict you to thinking in a particular way - it is very adaptable to your thinking patterns.
These are just a few of the reasons that I would give to support the case for a new language in the scientific community.
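As a small illustration of the math-to-functional translation in point 4, here is a sketch that assumes Shannon entropy, H = -sum(p_i * log2(p_i)), as the formula being implemented. The function name `shannon_entropy` is my own choice, and it is written in Python only to keep it readable for most people here. The code is a near word-for-word transcription of the formula: map counts to frequencies, map frequencies to terms, reduce with a sum.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq: str) -> float:
    """Shannon entropy in bits of the symbol distribution of seq."""
    n = len(seq)
    # p_i: relative frequency of each distinct symbol.
    freqs = (count / n for count in Counter(seq).values())
    # H = -sum(p_i * log2(p_i)), written as a single reduction.
    return -sum(p * log2(p) for p in freqs)


if __name__ == "__main__":
    # Four equally frequent symbols, so the entropy is 2 bits.
    print(shannon_entropy("ACGT"))
```

No mutable state and no index bookkeeping, so there is very little room for the implementation to drift away from what the paper says.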