AI Engineering Hub (PDF File) – AI Tutorial Materials


Overview

AI Engineering Hub is an open-source project that offers in-depth tutorials and practical case studies focused on large language models (LLMs), retrieval-augmented generation (RAG), and AI agent development. The project combines theoretical explanations with a rich set of hands-on code examples to help users quickly get started. It has gained significant attention on GitHub with over 12.6K stars. The core tutorials have been compiled into a 500+ page PDF for easier learning. Designed for beginners, practitioners, and researchers alike, AI Engineering Hub encourages community contributions to advance AI development collaboratively.



AI Engineering Hub Course Overview

The AI Engineering Hub curriculum is a comprehensive and practical guide to data science and machine learning, covering topics from beginner to advanced levels. The content is structured into two major parts:


1. Deep Learning

Learning Paradigms

  • Transfer Learning, Fine-tuning, Multi-task Learning (MTL), and Federated Learning

    • Transfer Learning: Freezes most of a pretrained (foundation) model, replaces its top layers, and adapts them to a new, smaller task (a minimal sketch follows this list).

    • Fine-tuning: Directly updates some or all of the model's weights on the new data.

    • MTL: Shared layers handle multiple tasks, with separate task-specific branches to improve generalization and save compute.

    • Federated Learning: Trains models across decentralized devices, aggregating parameters (not data) to preserve user privacy.
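
To make the transfer-learning recipe concrete, here is a minimal PyTorch sketch (assuming a recent torchvision; the pretrained ResNet-18 and the 10-class head are illustrative choices, not the tutorial's exact code):

```python
import torch.nn as nn
from torchvision import models

# Load a pretrained backbone, freeze it, and swap in a new head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False              # freeze the pretrained layers

model.fc = nn.Linear(model.fc.in_features, 10)  # new trainable 10-class head
# Only model.fc's parameters receive gradients during training.
```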

Implementing MTL in PyTorch

  • Combines shared layers with task-specific branches.

  • Uses gradient accumulation and dynamic task weighting (e.g., based on validation accuracy).
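
The repository's full implementation isn't reproduced here, but the shared-trunk/per-task-head pattern it describes might be sketched like this (dimensions and the two example tasks are placeholders):

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared layers feed two task-specific branches."""
    def __init__(self, in_dim=128, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_cls = nn.Linear(hidden, 3)  # e.g., 3-way classification
        self.head_reg = nn.Linear(hidden, 1)  # e.g., scalar regression

    def forward(self, x):
        h = self.shared(x)                    # computed once, used by both heads
        return self.head_cls(h), self.head_reg(h)

model = MultiTaskNet()
logits, value = model(torch.randn(8, 128))
# Total loss is a weighted sum of per-task losses; the weights can be
# adjusted dynamically, e.g., from each task's validation accuracy:
# loss = w_cls * ce(logits, y_cls) + w_reg * mse(value, y_reg)
```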

Active Learning

  • Starts with a small labeled dataset.

  • The model selects low-confidence samples for manual labeling, iteratively improving performance—ideal when labeling is expensive.
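
A minimal uncertainty-sampling loop, sketched with scikit-learn; the synthetic pool and the `y_oracle` array are stand-ins for a real unlabeled pool and human annotators:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(1000, 5))
y_oracle = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)  # stands in for a human labeler

labeled = list(rng.choice(len(X_pool), size=20, replace=False))
for round_ in range(5):                       # five labeling rounds
    clf = LogisticRegression().fit(X_pool[labeled], y_oracle[labeled])
    margin = np.abs(clf.predict_proba(X_pool)[:, 1] - 0.5)  # small = uncertain
    ranked = np.argsort(margin)               # most uncertain first
    new = [i for i in ranked if i not in labeled][:10]
    labeled.extend(new)                       # "send to annotators", then retrain
```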

Runtime and Memory Optimization

  • Momentum: Accelerates convergence using a moving average of gradients.

  • Mixed Precision Training: Combines float16 and float32 to speed up forward/backward passes while maintaining precision.

  • Gradient Checkpointing: Saves memory (~50–60%) by recomputing activations during backpropagation (at ~15–25% extra time cost).

  • Gradient Accumulation: Simulates large batch sizes by accumulating gradients over small batches before updating.
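
A sketch combining two of the techniques above, mixed precision and gradient accumulation, via PyTorch's `torch.cuda.amp` utilities (requires a CUDA device; the model and batch sizes are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 4                               # effective batch = 4x per-step batch

for step in range(100):
    x = torch.randn(32, 512, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    with torch.cuda.amp.autocast():           # float16 forward pass
        loss = nn.functional.cross_entropy(model(x), y) / accum_steps
    scaler.scale(loss).backward()             # gradients accumulate across steps
    if (step + 1) % accum_steps == 0:
        scaler.step(opt)                      # unscale gradients, then update
        scaler.update()
        opt.zero_grad()
```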

4 Multi-GPU Training Strategies

  1. Model Parallelism: Splits model layers across GPUs; requires inter-GPU communication at stage boundaries.

  2. Tensor Parallelism: Splits single-tensor operations like matrix multiplication.

  3. Data Parallelism: Replicates models across GPUs and aggregates gradients.

  4. Pipeline Parallelism: Uses micro-batches and pipelined stages to improve GPU utilization.
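
For strategy 3, the simplest PyTorch entry point is `nn.DataParallel` (a sketch; `DistributedDataParallel` is the recommended choice for serious multi-GPU training):

```python
import torch
import torch.nn as nn

# nn.DataParallel replicates the model on every visible GPU, scatters each
# batch across them, and averages the resulting gradients.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```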


Additional Concepts

  • Label Smoothing: Prevents overconfidence by distributing some probability mass to the other classes (see the sketch after this list).

  • Focal Loss: Addresses class imbalance by reducing the loss for well-classified samples.

  • Dropout: Randomly drops neurons during training and rescales the surviving activations so expected magnitudes stay consistent at inference.

    • Caution in CNNs: Standard dropout breaks spatial structure; it is better suited to fully connected layers.

  • Hidden Layers & Activation Functions: Explain how features are hierarchically extracted and non-linearity introduced.

  • Shuffling Before Training: Ensures batch randomness to improve generalization.
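
To make the first two bullets concrete: PyTorch builds label smoothing into `CrossEntropyLoss`, and focal loss is a few lines on top of the unreduced cross-entropy (γ = 2 is a commonly cited default, not the tutorial's prescription):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ce_smooth = nn.CrossEntropyLoss(label_smoothing=0.1)  # built-in smoothing

def focal_loss(logits, targets, gamma=2.0):
    """Down-weight well-classified samples by (1 - p_t)^gamma."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)                  # model's probability for the true class
    return ((1 - p_t) ** gamma * ce).mean()

logits, targets = torch.randn(4, 5), torch.tensor([0, 2, 1, 4])
print(ce_smooth(logits, targets), focal_loss(logits, targets))
```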

Model Compression

  • Knowledge Distillation: A smaller student model learns from a larger teacher model.

  • Activation Pruning: Dynamically removes less significant neurons.
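
A minimal sketch of a distillation loss in PyTorch, blending hard-label cross-entropy with a temperature-softened KL term (the temperature `T` and mixing weight `alpha` are illustrative defaults):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Student mimics softened teacher outputs while fitting the true labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T                             # rescale so gradients stay comparable
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(8, 10), torch.randn(8, 10)   # student / frozen teacher logits
y = torch.randint(0, 10, (8,))
print(distillation_loss(s, t, y))
```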

Deployment

  • Deploy from Jupyter to production seamlessly.

  • Use A/B testing, shadow deployment, version control, and model registry for production readiness.


LLM-Specific Topics

  • GPU Memory Management: Identifies bottlenecks during training.

  • Fine-Tuning Techniques: Compares full fine-tuning, LoRA (low-rank adaptation), and RAG as strategies for adapting LLMs.

  • 5 Lightweight LLM Fine-Tuning Methods: Including adapters, prefix tuning, etc.
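
As a taste of LoRA, a minimal low-rank adapter around a frozen linear layer (the rank `r` and scaling `alpha` are illustrative; production implementations such as Hugging Face's peft add considerably more machinery):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)       # freeze the pretrained layer
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))          # only A and B receive gradients
```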


2. Classical Machine Learning

ML Fundamentals

  • Time Complexity of 10 Algorithms: Compares training and inference costs (e.g., SVM, Random Forest).

  • 25 Key Math Definitions: Covers probability, linear algebra, etc.

  • Multiclass Probability Calibration: Techniques like Platt Scaling.

  • Model Failures: Common pitfalls—data quality, bad metrics, etc.

  • 16 Algorithm Loss Functions: With derivations and use cases.

  • 10 Popular Loss Functions: Including cross-entropy, Huber loss.

  • Data Splitting Best Practices: Preventing data leakage.

  • 5 Cross-Validation Methods: Includes k-fold, LOOCV, etc. (see the k-fold sketch after this list).

  • Post-CV Steps: Model selection and final evaluation.

  • Double Descent & Bias-Variance Trade-off: Non-linear model complexity impact.
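
For the cross-validation bullet above, a minimal k-fold example with scikit-learn; the iris dataset and random forest are placeholder choices:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print(scores.mean(), scores.std())        # average accuracy and its spread
```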

Statistical Foundations

  • MLE vs. EM: MLE for complete data; EM for hidden variables (e.g., clustering).

  • Confidence vs. Prediction Intervals: Differences and how to calculate.

  • Why OLS is Unbiased: Gauss-Markov theorem explained.

  • Bhattacharyya Distance & Mahalanobis Distance: Bhattacharyya compares two distributions; Mahalanobis measures how far a point lies from a distribution.

  • 11 Normality Tests: Shapiro-Wilk, QQ plots, etc. (a Shapiro-Wilk sketch follows this list).

  • Probability vs. Likelihood: Key differences and MLE connection.

  • 11 Distributions: Use cases for Poisson, Exponential, etc.

  • Misconceptions on PDFs: Density ≠ probability.
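
For the normality-tests bullet, a minimal Shapiro-Wilk check with SciPy (synthetic samples for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_data = rng.normal(size=500)
skewed_data = rng.exponential(size=500)

# Shapiro-Wilk: a small p-value means we reject normality.
print(stats.shapiro(normal_data).pvalue)  # large, consistent with normality
print(stats.shapiro(skewed_data).pvalue)  # near zero, clearly non-normal
```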


Feature Engineering and Selection

  • 11 Variable Types: Nominal, ordinal, continuous—impacts encoding.

  • Encoding Cyclical Features: sin/cos encoding for hours, months, etc. (see the sketch after this list).

  • Discretization: Binning continuous variables for robustness.

  • 7 Categorical Encoding Techniques: One-hot, target, hash encoding, etc.

  • Feature Importance via Shuffling: Measures impact on performance.

  • Probe Feature Method: Adds noise features to identify weak predictors.
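
The cyclical-encoding bullet in a few lines of pandas, using hour of day as the example feature:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"hour": [0, 6, 12, 18, 23]})
# Map the hour onto the unit circle so 23:00 and 00:00 end up adjacent.
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)
print(df)
```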


Regression Analysis

  • MSE Mathematics: Convexity ensures convergence; sensitive to outliers.

  • Sklearn Linear Regression: Closed-form solution, no tuning needed.

  • Poisson vs. Linear Regression: Poisson for count data.

  • Dummy Variable Trap: Omit one category to avoid multicollinearity (see the sketch after this list).

  • GLMs: Generalizes linear regression using link functions.

  • Zero-Inflated Models: Combines logistic and Poisson for excessive zeros.
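
A sketch of sidestepping the dummy variable trap with pandas and scikit-learn (the toy rent data is invented for illustration):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "city": ["NY", "LA", "SF", "NY", "SF"],
    "rent": [3500, 2800, 3300, 3600, 3400],
})
# drop_first=True omits one category, breaking the exact linear dependence
# among the dummy columns that causes multicollinearity.
X = pd.get_dummies(df[["city"]], drop_first=True)
model = LinearRegression().fit(X, df["rent"])
print(X.columns.tolist(), model.coef_)
```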

Tree-Based Methods

  • Compressing Random Forests: Extract key paths to build a single interpretable tree.

  • Why Trees Overfit: Naturally low-bias; control variance via pruning.

  • AdaBoost: Trains weak learners iteratively to focus on hard samples.

  • OOB Validation: Uses unsampled data (~37%) for validation.
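
The OOB idea in code: scikit-learn exposes the out-of-bag estimate directly (the dataset choice is illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
# Each tree trains on a bootstrap sample; the ~37% of rows it never saw
# ("out-of-bag") provide a built-in validation estimate.
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(X, y)
print(rf.oob_score_)
```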


Dimensionality Reduction

  • PCA Variance Explained: A common rule of thumb is to retain about 95% of the variance (see the sketch after this list).

  • t-SNE vs. PCA: t-SNE preserves local structure, PCA preserves global variance.

  • Kernel PCA: Maps data to higher-dimensional space for non-linear structures.
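
A sketch of variance-based component selection with scikit-learn's PCA (the digits dataset is a placeholder):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
# Passing a float asks PCA to keep just enough components to explain
# that fraction of the total variance.
pca = PCA(n_components=0.95).fit(X)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```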

Clustering

  • KMeans vs. GMM: GMM allows soft assignments and is more flexible.

  • DBSCAN++: Accelerated density-based clustering using core point sampling.

  • HDBSCAN: Identifies clusters with varying density.
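
A sketch contrasting KMeans' hard assignments with GMM's soft ones on a toy two-blob dataset:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

hard = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
soft = gmm.predict_proba(X)               # per-point membership probabilities
print(hard[:3], soft[:3].round(2))
```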

Correlation Analysis

  • Pearson Limitations: Detects only linear correlations.

  • Anscombe’s Quartet: Same stats, different distributions—always visualize.
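
A small illustration of the Pearson limitation: on a monotonic but non-linear relationship, rank-based Spearman correlation stays near 1 while Pearson drops (synthetic data):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
y = np.exp(x)                             # strongly monotonic, strongly non-linear

r, _ = pearsonr(x, y)                     # noticeably below 1
rho, _ = spearmanr(x, y)                  # essentially 1
print(round(r, 3), round(rho, 3))
```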

Missing Data Handling

  • MCAR/MAR/MNAR: Different types of missingness and proper handling.

  • MissForest Imputation: Uses random forests to predict missing values in mixed data.
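
MissForest itself originated in R, but scikit-learn's `IterativeImputer` with a random-forest estimator is a close analogue; a sketch with synthetic missingness:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.1] = np.nan     # knock out ~10% of the values

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    random_state=0,
)
X_filled = imputer.fit_transform(X)
print(np.isnan(X_filled).sum())           # 0: every gap imputed
```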



Data Tools & Programming Techniques

  • Pandas/Polars/SQL/PySpark Syntax Comparison: Handy for migrating between tools.

  • GPU-Accelerated Pandas (cuDF): Boosts performance on large datasets with GPU support.

  • Advanced SQL

    • Semi Join: Efficient filtering when right-side fields aren’t needed.

    • NOT IN Pitfall: A single NULL in the subquery makes NOT IN return no rows; prefer NOT EXISTS.

  • Python OOP Techniques

    • Descriptors: Control attribute access (e.g., with @property).

    • Why model() instead of forward() in PyTorch: Calling the module goes through __call__, which runs registered hooks and other bookkeeping that a direct forward() call skips (demonstrated below).
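
The `model()` vs `forward()` point is easy to demonstrate: a forward hook fires only when the module is invoked through `__call__`:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
model.register_forward_hook(lambda mod, inp, out: print("hook fired"))

x = torch.randn(1, 4)
model(x)          # goes through __call__: prints "hook fired"
model.forward(x)  # bypasses __call__: the hook is silently skipped
```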


GitHub Repository

📂 https://github.com/patchy631/ai-engineering-hub


Who Is AI Engineering Hub For?

  • Beginners: Understand core AI concepts with accessible, hands-on guidance.

  • Developers & Practitioners: Apply AI techniques directly to real-world projects with ready-made code.

  • Researchers: Stay updated with cutting-edge methods and share your findings.

  • Data Scientists: Combine traditional data techniques with modern AI workflows.

  • Tech Enthusiasts: Explore new frameworks, tools, and experimental approaches in AI.
