No. Well, there you go, that was easy! See you next week.
Okay, in all seriousness, coding isn’t for everyone, but apart from breathing, is anything really for everyone? So this week’s article isn’t about why everyone should learn to code. Instead, it covers some of the benefits and difficulties of learning to code, so that you can make your own decision. Sound good? Then read on!
Why should you bother learning to code?
Coding can offer many advantages for bioscientists, such as:
Enhancing your research productivity and insights. Coding can help you automate repetitive tasks, such as data cleaning, formatting, and processing. It can also enable you to explore new questions and hypotheses by allowing you to manipulate and analyse large and complex datasets, such as those generated by RNA-sequencing or proteomics. A great example of this is the Visualisation Tool made by Vlad Ungurenu, which lets you visualise and explore transcriptomic datasets (see below for an expanded explanation).
Improving your data processing and reproducibility. Coding can help you document your data analysis steps, and it makes it easier for other researchers to replicate and verify your findings when you share your code with them. This is the main advantage of processing data programmatically (e.g. in RStudio) rather than with something like Excel: if you write a script to process the data, then by re-using that script you can ensure that your data is processed in exactly the same way each time. This is also true for your future self! If you come back later to analyse new data in the same way, it is much easier if you have an annotated script that you can work through step by step.
Enhancing the tools available to you. Many bioinformatic processes (such as “cleaning” and analysing an RNA-sequencing dataset) are done programmatically, with virtually no other options available. If you don’t understand the coding necessary to run these analyses (e.g. by using the DESeq2 package in RStudio) then you will need someone else to run the analysis for you. Even if you do have access to a helpful bioinformatician, there are some decisions that need to be made during the analysis (e.g. how to filter data, what cutoff to use for significant genes), and if you can’t run the analysis yourself then you’re dependent on someone else’s decisions.
Expanding your career opportunities. Coding can help you demonstrate your technical skills and competencies to potential employers and collaborators. Coding itself is a highly transferable skill that can be applied to many areas outside of bioscience, such as finance and web design. The expanding field of data science is another option.
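To make the reproducibility and decision-making points above a little more concrete, here is a minimal Python sketch of a re-usable filtering step. It assumes a small results table with columns named like those DESeq2 reports (log2FoldChange, padj); the gene names, values, the function name filter_significant and the cutoff values are all hypothetical, chosen purely for illustration rather than taken from any real analysis.

```python
import csv
import io

# Hypothetical toy data, loosely modelled on a DESeq2 results table.
# Gene names and numbers are made up for illustration only.
EXAMPLE_RESULTS = """gene,log2FoldChange,padj
geneA,2.5,0.001
geneB,-1.8,0.02
geneC,0.4,0.0001
geneD,3.1,0.2
geneE,-0.2,0.9
geneF,1.5,NA
"""

# The analysis decisions live in one visible place, so they are applied
# identically every time the script runs. Both values are assumptions.
PADJ_CUTOFF = 0.05  # adjusted p-value threshold
LFC_CUTOFF = 1.0    # minimum absolute log2 fold change


def filter_significant(csv_text, padj_cutoff=PADJ_CUTOFF, lfc_cutoff=LFC_CUTOFF):
    """Return the genes passing both cutoffs, in their original order."""
    significant = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # DESeq2 can report NA for padj (e.g. genes removed by filtering);
        # this sketch simply skips those rows.
        if row["padj"] == "NA":
            continue
        padj = float(row["padj"])
        lfc = float(row["log2FoldChange"])
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff:
            significant.append(row["gene"])
    return significant


if __name__ == "__main__":
    print(filter_significant(EXAMPLE_RESULTS))  # prints ['geneA', 'geneB']
```

Re-running the same function on new data applies the same cutoffs every time, and trying a stricter threshold (for example filter_significant(EXAMPLE_RESULTS, lfc_cutoff=2.0)) is a single, visible change rather than a new round of manual spreadsheet steps.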
What are some of the difficulties of learning to code?
However, coding can also pose some difficulties, such as:
Learning the basics of programming. When you’re first starting out, coding can feel very unintuitive and confusing. It’s often difficult to see how the basics you learn connect to the task you actually want to complete. It also involves learning the conventions of a programming language, as well as the tools and environments used to run and debug your code. Programming involves a lot of problem-solving, but when you’re starting out, this can feel like trying to solve a puzzle without knowing all of the pieces. Many times I’ve been googling a coding problem, only to read an answer and think “I didn’t know I could write a piece of code to do that”.
Finding the right resources and support. Coding can be overwhelming and frustrating at times, especially when trying to fix bugs in your code. Often you get an extremely long error message full of jargon that seems to tell you nothing about what the actual problem is! It can be hard to find reliable and relevant sources of information and guidance, such as tutorials, books, online courses, or mentors. I’ll link to a few useful resources below! As a rule of thumb, though, a good place to look when trying to solve errors is Stack Overflow, a forum for developers.
It can take a long time to learn enough for it to be useful. Coding is one of those things where it can be surprisingly difficult to write code for a task that feels relatively simple. Even learning how to make a series of graphs can be tough when you have no coding background. When deciding whether to learn to do something programmatically or keep doing it manually (e.g. making a volcano plot), I weigh up how much time I would need to invest in learning against how long it takes me to do the task by hand. In other words: if I invest the time now in learning to do it programmatically, how much time will I save myself going forwards?
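The time trade-off described above can be written as simple arithmetic: if learning takes L minutes, doing the task by hand takes m minutes each time, and re-running a script takes s minutes each time, learning pays for itself after roughly L / (m − s) repetitions. Here is a minimal Python sketch of that calculation; the function name break_even_runs and all of the numbers are hypothetical, and it only counts time (not the reproducibility benefits discussed earlier).

```python
def break_even_runs(learning_minutes, manual_minutes_per_run, scripted_minutes_per_run):
    """Smallest number of repetitions after which learning to script a task
    has saved at least as much time as the learning cost.

    Works in whole minutes so the arithmetic stays exact. Returns None if
    the scripted version is not faster per run (it never pays off on time alone).
    """
    saving_per_run = manual_minutes_per_run - scripted_minutes_per_run
    if saving_per_run <= 0:
        return None
    # Integer ceiling division: round up to the next whole repetition.
    return -(-learning_minutes // saving_per_run)


# Hypothetical numbers: 600 min (10 h) to learn to make a volcano plot in code,
# 90 min to make one by hand, 15 min to re-run the script on new data.
print(break_even_runs(600, 90, 15))  # prints 8
```

With these made-up numbers, learning would pay off from the eighth plot onwards; if you only ever expect to make one or two, doing it manually may be the better choice for now.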
Most useful coding languages for bioscientists
There is no definitive answer to the question of which coding language is best for bioscientists. The choice depends on various factors, such as your research goals, preferences, background knowledge, and the resources available to you (e.g. whether you can install programs on your computer). However, some of the most popular and widely used coding languages in the biosciences are:
Python: Python is a general-purpose programming language with a relatively gentle learning curve. Because Python is so ubiquitous, there are lots of great resources to teach you how to use it, and your knowledge of Python will transfer widely to other applications and programming languages. Furthermore, many libraries useful for the biosciences are available for Python, such as NumPy, pandas and gseapy.
R: R is a programming language that is powerful for statistical computing and data visualisation. It’s widely used in bioinformatics and genomics, and has many packages for biological data analysis and visualisation, such as ggplot2 and DESeq2, along with the large collection of bioinformatics packages distributed through Bioconductor. R and RStudio (a popular Integrated Development Environment, or IDE, for R) are generally considered approachable for people without programming experience. In short, R is the programming language, and RStudio is the interface you use to actually write and run it.
JavaScript: JavaScript is a programming language that is useful for web development and interactive data presentation. JavaScript is another fairly ubiquitous language, so the skills are widely transferable beyond just bioscience.
Resources to learn to code
Kaggle
Kaggle doesn’t have any video tutorials; instead, it teaches you through a variety of interactive tutorials followed by practice exercises. Personally, I think coding is one of those things that you need to actively practise to get better at it and improve your understanding. Kaggle is great at talking you through the solutions to its exercises, and you aren’t limited to one “correct” solution: anything that works!
The website is built around teaching you core Python and programming skills, including modules on pandas (the Python library, not the animal!), SQL, and data visualisation.
DataCamp
I found DataCamp to be a great accompaniment to Kaggle. While Kaggle is completely free, DataCamp has some paid-for features (e.g. certificates to show you’ve completed given courses), but it still has many great free courses, such as Introduction to Python. It also has an introductory data science course that explains the role(s) of a data scientist and what you can use coding for, without going into coding detail.
The key difference is that DataCamp has videos to accompany its exercises, which can sometimes make the explanations a bit easier to follow. The site also has various “tracks”, which combine related courses that complement each other.
Biostats Squid
Biostats Squid is a site (and YouTube channel) geared specifically towards bioinformatics, i.e. an application of the coding that you learn. It has tutorials explaining both the theory behind various analyses (such as Gene Set Enrichment Analysis) and how to actually perform them. In my experience, the tutorials are clear and easy to follow, and they have been really useful to me personally!
That’s all for today! If you have any suggestions for useful websites and resources for learning to code, then put them in the comments below! Otherwise, have a great weekend and thanks for reading.