Provides comprehensive guidance and best practices for designing, building, and scaling production-grade AI systems and machine learning pipelines.
This skill acts as a domain-specific expert for the entire AI lifecycle, drawing on industry-standard practices for building reliable machine learning systems. It helps users make critical architectural decisions, such as choosing between RAG and fine-tuning, curating high-quality datasets, and implementing robust evaluation frameworks for open-ended LLM outputs. Whether you are optimizing inference latency for high-throughput systems or designing AI agents with sophisticated feedback loops, this skill provides the patterns and methodologies needed to move AI projects from prototype to production.
Key Features
1. Decision frameworks for RAG, fine-tuning, and prompt engineering
2. Detailed evaluation methodologies for LLMs and AI agents
3. Inference optimization techniques for latency and cost reduction
4. Strategies for dataset curation and synthetic data generation
5. Architectural patterns for guardrails and user feedback loops
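To make the guardrail pattern above concrete, here is a minimal sketch of an output-side guardrail that redacts sensitive strings before a model response reaches the user. The pattern list, function name, and redaction policy are illustrative assumptions, not part of this skill; production systems typically combine such filters with model-based classifiers.

```python
import re

# Hypothetical screening patterns -- illustrative only, not exhaustive.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like sequences
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]

def apply_guardrail(text: str) -> tuple[bool, str]:
    """Return (clean, sanitized_text).

    clean is True when no pattern matched; sanitized_text has every
    match replaced with a redaction marker rather than blocking the
    whole response.
    """
    sanitized = text
    for pattern in PII_PATTERNS:
        sanitized = pattern.sub("[REDACTED]", sanitized)
    return (sanitized == text, sanitized)
```

Redact-and-pass-through is only one policy choice; a stricter deployment might block the response entirely and log the event for the feedback loop.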
Use Cases
1. Optimizing LLM serving infrastructure through quantization and batching strategies
2. Architecting a production-ready RAG system with hybrid search and optimized chunking
3. Developing automated evaluation pipelines to measure model accuracy and safety
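As a sketch of the chunking step in the RAG use case above, here is a deliberately simple fixed-window chunker with overlap. The function name and parameter values are assumptions for illustration; production pipelines usually chunk on sentence or section boundaries and tune size and overlap against retrieval quality.

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows with overlap.

    Overlap keeps context that straddles a boundary retrievable from
    both neighboring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # advance less than a full window
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # final window already covers the tail
    return chunks
```

The size/overlap trade-off is the main tuning knob: larger chunks preserve more context per retrieval hit, while smaller ones improve precision for hybrid search.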