A level 7 apprenticeship focused on building and scaling industry-standard end-to-end deep learning systems, from data engineering to model design.

This document outlines the machine learning programme at Founders and Coders. Broadly speaking, our curriculum covers building and scaling end-to-end AI systems, the Transformer architecture and fine-tuning.

Project - Search & Recommendation Engines

We have split our curriculum into five phases. The first three phases focus on a single project.

Recommendation engines are complex systems. They are close cousins of the search engine and sit at the core of every Big Tech company. Given their complexity and many moving parts, they require several areas of expertise, ranging from content understanding (e.g. video, image, object detection, text understanding) to engagement pattern recognition (e.g. coordinated behaviour). During the course, we will build a recommendation engine to production standards.

Recommendation engines need users, items and engagement between them. For this reason, we have created a simulation called SimNet. It consists of 100k users, divided into consumers and producers. Both types will interact with the platform in similar ways, but producers will also create content in the form of text and images.
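To make this concrete, the sketch below shows the kind of records such a simulation implies: users split into consumers and producers, items authored by producers, and interaction events between them. The field names and types are illustrative assumptions, not SimNet's actual schema.

```python
# A minimal sketch of the kind of records SimNet implies (field names are
# illustrative assumptions, not the simulator's actual schema).
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    CONSUMER = "consumer"
    PRODUCER = "producer"


@dataclass
class User:
    user_id: int
    role: Role           # producers also create content; consumers only engage


@dataclass
class Item:
    item_id: int
    producer_id: int     # the producer who created it
    modality: str        # "text" or "image"


@dataclass
class Interaction:
    user_id: int
    item_id: int
    timestamp: float
    event: str           # e.g. "impression", "like", "follow"
```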

SimNet users and producers are dynamic, aiming to mimic real-world behaviour, so the level and type of interaction depend on previous recommendations. Good recommendations bring more users and engagement and, in turn, convert more consumers into producers. Bad recommendations decrease overall engagement, yielding a lower score.
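As a rough illustration of that feedback loop (a toy sketch, not SimNet's actual logic), the step below ties engagement probability to recommendation relevance and occasionally converts an engaged consumer into a producer; `recommend` and `relevance` are assumed placeholder callables.

```python
import random


def simulate_step(users, recommend, relevance):
    """One toy tick of the feedback loop: better recommendations raise
    engagement, and engaged consumers occasionally convert into producers.
    `recommend(user)` and `relevance(user, item)` stand in for the real
    simulator's behaviour (assumed interfaces)."""
    engagement = 0
    for user in users:                                 # e.g. {"id": 1, "role": "consumer"}
        item = recommend(user)
        if random.random() < relevance(user, item):    # relevance score in [0, 1]
            engagement += 1
            if user["role"] == "consumer" and random.random() < 0.01:
                user["role"] = "producer"              # conversion driven by engagement
    return engagement
```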

Fig. 1 - High level view of TinyWorld and teams servers

Phase 1 - MLOps & Data Engineering

The initial phase will focus on one theme: What does it take to build the infrastructure to handle 50 requests per second while keeping latency low?

Such a system generates roughly 4 million data points every 24 hours and requires 1.5GB of daily storage. To tackle this scale, each team will be equipped with a dedicated Hetzner instance with 64GB of RAM, NVIDIA RTX A6000 GPUs and 1TB of SSD storage. It will then be the team's job to configure the machine and set up the entire infrastructure.
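A quick back-of-envelope check ties these figures together: 50 requests per second over a day gives roughly 4.3 million events, and 1.5GB spread across that volume works out to a few hundred bytes per data point.

```python
# Back-of-envelope check of the numbers above.
requests_per_second = 50
seconds_per_day = 60 * 60 * 24                   # 86,400

events_per_day = requests_per_second * seconds_per_day
print(events_per_day)                            # 4,320,000 -> "roughly 4 million data points"

daily_storage_bytes = 1.5 * 1024 ** 3            # 1.5GB per day
print(daily_storage_bytes / events_per_day)      # ~370 bytes per data point on average
```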

Data must be automatically processed to derive key metrics such as user retention, item popularity, user-item engagement, experiment outcomes between control and test groups, and feature extraction. In addition, offline jobs will require pipeline dependency management and scheduling workflows. In other words, teams will need to build ETL workflows and data warehouse solutions from the ground up.
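As a sketch of what one such offline job might look like (file names, the JSON-lines event format and the aggregation itself are assumptions, since each team designs its own pipeline), the snippet below reads a day of raw interaction logs, aggregates item popularity and writes the result for downstream jobs.

```python
# A minimal sketch of one offline ETL step: aggregate raw interaction logs
# into daily item-popularity counts. File names and the JSON-lines format
# are assumptions; the real pipeline, scheduler and warehouse are up to each team.
import json
from collections import Counter
from pathlib import Path


def daily_item_popularity(raw_log: Path) -> Counter:
    """Extract-transform step: count non-impression engagements per item."""
    popularity = Counter()
    with raw_log.open() as f:
        for line in f:
            event = json.loads(line)             # e.g. {"user_id": 1, "item_id": 7, "event": "like"}
            if event.get("event") != "impression":
                popularity[event["item_id"]] += 1
    return popularity


def load(popularity: Counter, out_path: Path) -> None:
    """Load step: write the aggregate where downstream jobs can pick it up."""
    out_path.write_text(json.dumps(popularity))


if __name__ == "__main__":
    counts = daily_item_popularity(Path("events-2024-01-01.jsonl"))
    load(counts, Path("item_popularity-2024-01-01.json"))
```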

During this phase, we will cover topics such as sharding, partitioning, distributed event stores and stream processing, in-memory data structures, job schedulers, ETL pipelines, real-time monitoring, containerisation and more.
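To give one concrete example from that list (a toy sketch with an arbitrary shard count and event shape, not a prescribed design), hash partitioning routes each event to a partition keyed on its user id, so that all of a user's events land on the same node.

```python
# Toy illustration of hash partitioning: route each event to a shard based on
# its user id. Real systems typically use a stable or consistent hash so the
# mapping survives process restarts and resharding.
NUM_SHARDS = 8  # arbitrary example value


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    return hash(user_id) % num_shards


events = [{"user_id": uid, "event": "like"} for uid in range(20)]
partitions = {shard: [] for shard in range(NUM_SHARDS)}
for event in events:
    partitions[shard_for(event["user_id"])].append(event)

print({shard: len(evts) for shard, evts in partitions.items()})
```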

Fig. 2 - Open source tools used in infrastructure

Phase 2 - Deep Learning

Phase 2 will focus on core machine learning and deep learning techniques. Throughout this period, we will cover fundamental mathematical concepts in a practical way through ‘morning challenges’ and visualisation tools built in-house.
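A typical morning challenge might look something like the following (an illustrative example, not an actual exercise from the course): implement gradient descent by hand on a one-dimensional function and watch it converge.

```python
# Implement gradient descent by hand on f(x) = (x - 3)^2 and watch it converge.
def f(x):
    return (x - 3) ** 2


def grad_f(x):
    return 2 * (x - 3)        # derivative of f


x, lr = 0.0, 0.1
for step in range(50):
    x -= lr * grad_f(x)       # gradient descent update

print(x, f(x))                # x is very close to 3, f(x) very close to 0
```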

We will learn about different model architectures, covering MLPs, CNNs, RNNs, VAEs, GANs, diffusion models and Transformers. We will use PyTorch extensively to tackle both predictive and generative tasks. With the use of GPUs, we will train generative pre-trained transformers (GPT) from the ground up, while also fine-tuning larger models such as LLaMA and using Stable Diffusion for image generation.
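As a taste of the building blocks involved, here is a minimal single-head causal self-attention module in PyTorch, the core component of GPT-style models; it is a sketch for illustration, not the implementation used on the course.

```python
# Minimal single-head causal self-attention in PyTorch (illustrative sketch).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    def __init__(self, embed_dim: int, block_size: int):
        super().__init__()
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)
        # lower-triangular mask so each position only attends to the past
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(C)        # (B, T, T) attention scores
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return self.proj(att @ v)


x = torch.randn(2, 16, 64)                        # (batch, sequence, embedding)
out = CausalSelfAttention(64, block_size=16)(x)
print(out.shape)                                  # torch.Size([2, 16, 64])
```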

The goal of this phase is to become familiar with a variety of tasks: text summarisation, zero-shot learning, image classification, object detection, instance segmentation, text and image generation, action recognition, topic extraction and more.
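As a taste of how approachable some of these tasks are with pretrained models, the snippet below runs zero-shot text classification; it assumes the Hugging Face transformers library, which the curriculum does not explicitly prescribe.

```python
# Zero-shot text classification with a pretrained model (assumes the Hugging
# Face `transformers` library; downloads a default model on first run).
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "The new GPU cut our training time in half.",
    candidate_labels=["hardware", "cooking", "politics"],
)
print(result["labels"][0])   # most likely label, e.g. "hardware"
```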