it would be nice if everyone was expert at everything, but they cant be. it would be nice if they hired experts but money doesn’t grow on trees. we often insist on a degree of excellence we refuse to pay for
It's not about being an expert at everything or hiring more people. These aren't particularly hard problems, it's not difficult to find biologists who are incredibly adept at using python, R or C. It's about thinking about how science gets funded and how it gets implemented. I've written here before about the difference between "grant work" and "grunt work", and how too computer touching tends to get looked down upon at a certain level.
If you're deciding who gets a large-scale computational biology grant, and you're choosing between a senior researcher with 5000 publications with a broad scope, and a more junior researcher with 500 publications and a more compuationally focused scope, most committees choose the senior researcher. However, the senior researcher might not know anything about computers, or they may have been trained in the 70's or 80's where the problems of computing were fundamentally different.
So you get someone leading a multi-million dollar project who fundamentally knows nothing about the methods of that project. They don't know how to scope things, how to get past roadblocks, who to hire, etc.
What's your source on it not being difficult to find biologists who are adept at using python, R, or C? Most biologists operating in private industry or academia have many years of training in their fields and many have learned their computational tools as they've gone on, meaning they've never received proper training. It seems dubious to claim that there's this neverending source of well trained biologists who are also adept at programming.
I would say the number of biologists who actually understand programming is extremely small. I've been programming for fun for ~15 years, and I'm about to finish a PhD in chemical biology (e.g. I started programming in C far before I started learning biology).
You might occasionally run into someone who is passable - at best - with R or Python. But most of the code they might write is going to be extremely linear, and I doubt they understand software architecture or control flow at all.
I don't know any biologists who program for fun like me (currently writing a compiler in Rust).
To be fair, linear code is often totally sufficient for most types of data analysis. Biologists don't really need to understand design patterns or polymorphism, they just need to not make computational mistakes when transforming the data.
Absolutely. My point was more than you can't expect comp. biologists to actually be "good" programmers when compared to SWE or even web devs.
Most of the code I write to do biological data analysis is fairly linear. However, I also generally use a static type system and modularity to help ensure correctness.
I've perused a lot of code written by scientists, and they could certainly learn to use functions, descriptively name variables, use type systems and just aspire to write better code. I just saw a paper published in Science had to issue a revision because they found a bug in their analysis code after publication that changed most of their downstream analysis.
It does get rather problematic when you have large quantities of...stuff. You can't run linear stuff in parallel so now you're bound to whatever single CPU core you have lying around.
I'd say that getting some basic data science computing skills should be more important than the silly SPSS courses they hand out. Once you have at least baseline Jupyter (or Databricks) skills you suddenly have the possibility to do actual high performance work instead of grinding for gruntwork. But at that point the question becomes: do the people involved even want that.
I write 'one off' programs all the time. Most of what I write I throw away, and I program for a living. Those are usually fairly linear. Which is fine. If I am writing something that will be re-used in 6 different ways and on a 5 person team. That is when you get out the programming methodologies. It is usually fairly obvious to me when you need to do it. For someone who does not do it all the time. They may not know 'hey stop you have a code mess'.
It one of the reasons why people end up with spreadsheets. Most of their data is giant tables of data. Excel does very well at that. It has a built in programming language that is not great but not totally terrible either. Sometimes all you need is a graph of a particular type. Paste the data in, highlight what you want, use the built in graph tools. No real coding needed. It is also a tool that is easy to mismanage if you do not know the quirks of its math.
It doesn't take being an expert at Excel to understand how Excel autoformats. It takes a few days of actually working with data or an introductory class that's today taught in American primary schools.