Thanks for visiting! My name is Chanin Nantasenamat, Ph.D. By day I'm an Associate Professor of Bioinformatics, and in my free time I'm a Content Creator running the Data Professor YouTube channel.
Below is a listing of all the articles I have written, conveniently categorized for your selection. Please drop a comment to suggest future topics!
I get asked quite often on my YouTube channel (Data Professor) the following questions about how to break into data science:
So I thought it would be a great idea to write an article about it, and here it is. Note that the 10 things I wish I knew about learning data science are based on my personal journey as a self-taught data scientist. …
Scikit-learn is one of many scikits (short for SciPy Toolkits), and it specializes in machine learning. A scikit is a package that is too specialized to be included in SciPy and is thus packaged as one of many scikits. Another popular scikit is scikit-image (a collection of algorithms for image processing).
Scikit-learn is one of the pillars of machine learning in Python: it lets you build machine learning models and provides utility functions for data preparation, post-model analysis and evaluation.
In this article, we will be exploring the essential bare minimal knowledge…
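To make the three roles mentioned above concrete (data preparation, model building and evaluation), here is a minimal sketch. The choice of the Iris dataset, `StandardScaler`, `LogisticRegression` and an 80/20 split are my own assumptions for illustration and are not taken from the article itself.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small tabular dataset: X holds the features, y the class labels.
X, y = load_iris(return_X_y=True)

# Data preparation: hold out 20% of the rows for testing (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Model building: scale the features, then fit a simple classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluation: compare predictions on the held-out rows with the true labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling parameters learned from the training rows only, which is one common way to avoid leaking information from the test set.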
In this post, I've combined what I did on Days 27–33 into a single post, as I've been away from posting content on Medium for a week. I hope you like the concise nature of this post. I've also added links to resources or YouTube videos that I watched during this period, just in case you're interested in checking them out.
multiprocessing library in Python for handling large calculations such as the molecular fingerprint problem mentioned earlier.
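As a rough sketch of how `multiprocessing` can spread this kind of calculation over several CPU cores, the example below maps a similarity search over a pool of worker processes. The toy fingerprints (sets of on-bit indices), the `LIBRARY` list and the helper names `best_match` and `run_parallel` are hypothetical and are not the code from the post.

```python
from multiprocessing import Pool

# Hypothetical toy library: each fingerprint is a set of "on" bit indices.
LIBRARY = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]


def tanimoto(a, b):
    """Shared on-bits divided by total distinct on-bits (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def best_match(query):
    """Return (index, score) of the library fingerprint most similar to query."""
    scores = [tanimoto(query, fp) for fp in LIBRARY]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def run_parallel(queries, processes=2):
    """Distribute the queries over worker processes; results keep input order."""
    with Pool(processes=processes) as pool:
        return pool.map(best_match, queries, chunksize=1)


if __name__ == "__main__":
    queries = [frozenset({1, 2, 3}), frozenset({7, 8, 9})]
    for query, (idx, score) in zip(queries, run_parallel(queries)):
        print(sorted(query), "-> library index", idx, "score", round(score, 3))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the script. For a real library, a larger `chunksize` would usually reduce the overhead of passing work between processes.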
On Day 26 of the 66 Days of Data, I continued with coding an implementation in Python for calculating molecular fingerprints for a big chemical library.
This corresponded to 30,000 compounds × 2,000,000 compounds = 60,000,000,000 compound pairs. The former and latter sets represent the two compound libraries that I will use for this coding project.
The concept is actually simple.
For each of the 60,000,000,000 compound pairs, compute the Tanimoto coefficient, a relative measure of the molecular similarity of 2 molecules, where a value of 1 indicates that the 2 query compounds are the same…
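Here is a minimal sketch of the Tanimoto coefficient for binary fingerprints, assuming each fingerprint is represented as a set of its "on" bit indices (a representation I picked for illustration; the post does not say which one was used). Strictly speaking, a value of 1 here means the two fingerprints are identical. The last lines simply re-check the pair count quoted above.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices.

    Defined as |A & B| / |A | B|: shared on-bits over all distinct on-bits.
    """
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention assumed here when both fingerprints are empty
    return len(fp_a & fp_b) / union


# Toy fingerprints (hypothetical, not real molecules).
identical = tanimoto({1, 2, 3}, {1, 2, 3})
partial = tanimoto({1, 2, 3}, {2, 3, 4})
disjoint = tanimoto({1, 2}, {3, 4})
print(identical, partial, disjoint)

# Size of the all-against-all comparison between the two libraries.
n_pairs = 30_000 * 2_000_000
print(f"{n_pairs:,} compound pairs")
```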
On Day 25 of the 66 Days of Data, I pondered some more about scikit-learn for an upcoming blog post that I'm working on.
Here’s a preview of the first illustration I’ve made on the data representation of tabular datasets used for building models in
On Day 24, I continued working on a full blog post about scikit-learn for data science. Aside from statistics, probably 80% or more of any data problem that you can think of can be handled by machine learning.
Motivated to distill the fundamentals of the scikit-learn library in a way that is as beginner-friendly as possible, I set out to write a full blog post about it. As with other
How to Master …. …
On Day 23 of the 66 Days of Data, I started the day off by listening to 12 student presentations on their Mini-Project data analysis of various Kaggle healthcare datasets, and ended the day in the live chat of the Premiere video podcast with Nate at StrataScratch.
This was indeed an exciting day: students presented the fruits of their hard work, having coded a data analytics workflow for a Kaggle healthcare dataset. This is an amazing feat considering that they had no prior knowledge of Python about 3 weeks ago.
Students are given 10-15…
On Day 22 of the 66 Days of Data, I spent time doing a Q&A session for the course I'm teaching as well as coding in Python to analyze a large chemical dataset.
As the course comes to a close, the day was spent giving students the opportunity to ask anything about the course. Most questions pertained to the Mini-Project assignment. In addition to answering questions, I also provided a high-level overview summarizing the big concepts of the course as well as a high-level account of the data analytics workflow that students can…
On Day 21 of the 66 Days of Data, I started off the day by teaching the hands-on tutorial on scikit-learn to an undergraduate class of Medical Technology students via Zoom. This introductory Python for Health Data Science course is compressed to only 3 weeks from the typical 16-week semester and, as also mentioned in a prior blog post, it is amazing how students are able to
This also marked almost the last day of class in the sense that there will be no more lectures. …