Philip May

  • Machine Learning
  • Python
  • IT
  • Linux
  • Blog
  • About Me

Recent Posts

  • 09 August - The selection of topic-specific texts from Wikipedia
  • 02 July - Pandas Data Format and Compression
  • 11 April - The importance of chat templates
  • 18 November - Pandas apply
  • 12 October - Options for Date Encoding

Archives

  • 2024 (3)
  • 2023 (1)
  • 2022 (5)
  • 2021 (1)
  • 2020 (1)

Posted in 2021

German colossal, cleaned Common Crawl corpus (GC4) released

  • 10 April 2021
  • Philip May

Within the German NLP Group, Philipp Reißel (ambeRoad) and I published the largest German text corpus: the German colossal, cleaned Common Crawl corpus (GC4).


© Copyright 2020-2024 Philip May.