Posted in 2022
Options for Date Encoding
- 12 October 2022
Some data, such as strings, must be encoded to be used in machine learning models. Here we explore the different options for encoding date fields.
Python Installation and Package Management with conda and pip
- 23 July 2022
This article covers installing Python and managing packages. It is subjective and reflects my own opinion and experience. It is organized as a series of recommendations.
Anomalies in the MLSUM Dataset
- 23 February 2022
While evaluating the ml6team/mt5-small-german-finetune-mlsum summarization model, my colleague Michal Harakal and I noticed that in many cases the model simply reproduces the first sentence of the input text instead of generating an independent summary of the whole text.
Clean German Wikipedia Text Corpus released
- 22 February 2022
Today I published a new Wikipedia-based German text corpus. It is intended for NLP machine learning tasks.
LightGBM with Optuna: Demo released
- 20 February 2022
This week I published a project that shows how to combine LightGBM and Optuna efficiently to train good models. It is meant to serve as a reusable template for new projects.