Using public data, as well as slightly unconventional (but widely quoted) data points such as a school's affiliation and popularity, can clustering offer an alternative way to compare schools when selecting your child's primary school?

Continue reading - 15 min read

I explore and clean coffee quality data, then use the tidymodels framework to demonstrate feature engineering, fitting three models (LASSO, random forest and XGBoost), hyperparameter tuning and analysis of out-of-sample performance.

Continue reading - 16 min read

You might have heard that Springer Nature, an American-German academic publishing company, is giving free access to more than 500 key textbooks across its eBook subject collections. It is doing this to support lecturers, teachers and students by granting remote access to essential educational resources during the Covid-19 lockdown. A repository of the books can be found here. To my delight, I came across the springerQuarantineBooksR package by Renan Xavier Cortes in a blog post.

Continue reading

Desmond Choy

Data Science | Machine Learning | NLP

Data Scientist
