Starting starting


It’s announcement time! And no, the announcement is not that I accidentally wrote the same word twice. It’s that I’ll be kicking off a few series of posts called “Starting…”, wherein I take a beginner from starting out to hopefully quite advanced in a single topic. The idea behind this is mostly that I can hopefully help people who are just getting into these areas, but also that I’ll get used to teaching and talking about them. A couple of them are also areas I’ve wanted to get into for a long time, and I’m taking this as my chance to do so - so we’ll be learning together on those ones!

# The outline

So far I’m thinking about a few different categories for this. Firstly there are specific libraries that I have a lot of knowledge of, hard-won through many projects, mistakes I wish I hadn’t made and hours going through GitHub issues (sidebar - I think very early on I found a lot of my answers on Stack Overflow, and over time my needs have gotten progressively more niche and I’ve migrated towards specific user groups and GitHub issues to try and find the details of what could be going on. I think that’s progress? I’ve only noticed it writing this now, so I’m going to claim it as genuine growth).

For these, and here I’m thinking about the classic Python libraries like matplotlib, seaborn, scikit-learn, pandas, numpy etc., it will probably happen in bits and pieces as I think of specific things I want to cover, but I’ll really try to build it from the ground up. Then there are some specific skills I don’t have that are really quite important. A few big ones that appear in a lot of job requirements in industry, but that no-one really cares about in academia, include SQL, Tableau and Power BI. I know that sounds weird, but yes, I rarely come across SQL in an academic setting. It’s even weirder that I haven’t explicitly spent any time practising it, because I’ve effectively learned the basics, like joins, while picking up the background for other database-like work in R and Python.

After that there’s the more conceptual stuff. This will probably be more about machine learning, statistics and different methods and training techniques, going through some of the theory and papers. Again, most of this will be stuff I know well and want to communicate, and some will be things I want to learn myself. The first series I’m planning is a good example of this - distributions. There are quite a few I use all the time, either through models or for sampling in simulations or hyperparameter search. But there are also a few famous ones I’ve never really come across, and I’d like to know more about them.

Finally there’s the practical side of things, where there’s a technique that can be applied and I’m not so tied to a specific library. This might be stuff like tuning hyperparameters, for example, as it’s fundamentally a practical exercise and can be done in any language.

# Other posts

The key difference between the Starting posts and others is that there will be a level of organisation and ordering to them that isn’t present in other posts. So I’ll still be doing lots of posts on little things that catch my attention, and I might even go back later and file them in with a Starting series. But most of the time I will just be putting out standalone posts alongside the series to mix things up.

# Ordering

The first post of each Starting series will be called “Starting …”, but the subsequent posts won’t have a specific name (unless maybe I should put something like “Starting X #2” in brackets at the end of the title? Hmmm, now I’m not sure). What I know I will do is go back and add ordered links to them, so the original Starting post is the contents page for that series that I can refer back to when needed.

Anyway, enough talking - time to start!