Posted in 2024
09 August 2024 - The selection of topic-specific texts from Wikipedia
02 July 2024 - Pandas Data Format and Compression
11 April 2024 - The importance of chat templates
Posted in 2023
18 November 2023 - Pandas apply
Posted in 2022
12 October 2022 - Options for Date Encoding
23 February 2022 - Anomalies in the MLSUM Dataset
22 February 2022 - Clean German Wikipedia Text Corpus released
20 February 2022 - LightGBM with Optuna: Demo released
Posted in 2021
10 April 2021 - German colossal, cleaned Common Crawl corpus (GC4) released
Posted in 2020
01 December 2020 - Training and Evaluation of our German Electra Language Model Talk