LLM

Base Knowledge

  • HF: The Alignment Handbook

  • HF: The Large Language Model Training Handbook

Specific Techniques

  • Direct Preference Optimization (DPO), see the loss sketch after this list

    • Paper: https://arxiv.org/abs/2305.18290

    • https://plainenglish.io/community/direct-preference-optimization-dpo-a-simplified-approach-to-fine-tuning-large-language-models

    • https://huggingface.co/blog/dpo-trl

  • Mixture of Experts (MoE), see the routing sketch after this list

    • HF Blog: Mixture of Experts Explained
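
A minimal sketch of the DPO loss from the paper linked above, assuming per-sequence log-probabilities of the chosen and rejected completions have already been computed under both the trained policy and a frozen reference model. The function name, tensor shapes, and default beta are illustrative, not taken from the linked sources or from TRL.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Sketch of the DPO objective; inputs are summed log-probs per sequence."""
    # Log-ratio of policy vs. frozen reference for each completion.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Implicit reward margin scaled by beta; the loss pushes the chosen
    # completion's log-ratio above the rejected one's.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Toy usage: a batch of 4 preference pairs with random log-probabilities.
print(dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4)))
```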
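
For Mixture of Experts, the sketch below shows token-level top-k routing in the spirit of the HF blog post above: a small router scores the experts per token, the top-k expert FFNs are applied, and their outputs are combined with renormalized router weights. Layer sizes, the SiLU activation, and the per-expert loop are assumptions for readability, not Mixtral's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, hidden_dim=64, ffn_dim=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router that produces one score per expert for every token.
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        # One small feed-forward network per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, ffn_dim),
                nn.SiLU(),
                nn.Linear(ffn_dim, hidden_dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_dim)
        scores = F.softmax(self.router(x), dim=-1)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize top-k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for expert_id, expert in enumerate(self.experts):
                mask = indices[:, slot] == expert_id
                if mask.any():
                    # Weighted contribution of this expert for the tokens routed to it.
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(5, 64)
print(TopKMoELayer()(tokens).shape)  # torch.Size([5, 64])
```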

Specific Models

  • Mixtral, see the generation sketch after this list

    • https://mistral.ai/news/mixtral-of-experts/

    • HF Blog: https://huggingface.co/blog/mixtral

  • Argilla Notux

    • a DPO fine-tune of Mixtral-8x7B-Instruct on the preference dataset below (dataset loading sketch after this list)

    • HF Model: https://huggingface.co/argilla/notux-8x7b-v1

    • Dataset: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned

    • Code: argilla-io/notus
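
A minimal generation sketch for Mixtral, assuming the transformers library is installed and enough GPU memory (or a quantized setup) is available. The model ID is the instruct checkpoint from the HF Hub; the prompt and generation settings are arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory
    device_map="auto",          # spread the layers across available GPUs
)

messages = [{"role": "user", "content": "Explain mixture of experts in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```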
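
The preference dataset used for Notux can be inspected with the datasets library. A small sketch, assuming the dataset has a "train" split and the usual chosen/rejected preference-pair layout.

```python
from datasets import load_dataset

# Assumption: a "train" split exists; check the dataset card if loading fails.
ds = load_dataset("argilla/ultrafeedback-binarized-preferences-cleaned", split="train")
print(ds.column_names)  # inspect the available fields
print(ds[0])            # look at one preference pair
```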


© Copyright 2020-2025 Philip May.