Discover Agent Skills for data science & ML. Browse 53 skills for Claude, ChatGPT & Codex.
Deploys and optimizes LLM inference on CPU, Apple Silicon, and consumer hardware using GGUF quantization.
Optimizes Transformer models using Flash Attention to achieve significant speedups and memory reductions during training and inference.
Fine-tunes large language models using LoRA, QLoRA, and other parameter-efficient methods to drastically reduce memory and compute requirements.
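The core LoRA idea behind this skill can be shown in a few lines: instead of updating a full weight matrix W, train two small low-rank matrices A and B and add their scaled product to W. A minimal pure-Python sketch with hypothetical toy sizes (real implementations such as PEFT operate on tensors inside each attention/MLP layer):

```python
def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_effective_weight(W, A, B, alpha, r):
    """W + (alpha / r) * B @ A -- the merged weight used at inference.
    Only A (r x d) and B (d x r) are trained; W stays frozen."""
    BA = matmul(B, A)
    s = alpha / r
    return [[W[i][j] + s * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy example: d = 2, rank r = 1, so A is 1x2 and B is 2x1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
A = [[1.0, 2.0]]               # r x d, trained
B = [[0.5], [0.25]]            # d x r, trained
W_eff = lora_effective_weight(W, A, B, alpha=2, r=1)
print(W_eff)
```

Because only r·d·2 parameters per matrix are trained instead of d², memory for optimizer states drops drastically; QLoRA adds 4-bit quantization of the frozen W on top.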
Extends transformer context windows using RoPE, YaRN, and ALiBi techniques to process documents exceeding 128k tokens.
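The RoPE mechanism this skill builds on rotates each consecutive pair of query/key dimensions by a position-dependent angle, so attention scores depend on relative position. A sketch of the standard formulation (illustrative only; YaRN and NTK-style scaling modify the per-dimension frequencies to stretch the context window):

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply Rotary Position Embedding to an even-length vector.
    Each pair (x, y) at dimension i is rotated by pos * theta_i,
    where theta_i = base ** (-i / d)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(angle) - y * math.sin(angle))
        out.append(x * math.sin(angle) + y * math.cos(angle))
    return out

q = [1.0, 0.0, 1.0, 0.0]
print(rope(q, pos=0))  # position 0 is a zero-degree rotation: unchanged
```

Rotations preserve vector norms, which is why RoPE can be applied without re-normalizing activations.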
Streamlines the fine-tuning process for over 100 large language models using the LLaMA-Factory framework and QLoRA techniques.
Builds complex AI systems using Stanford's declarative programming framework to optimize prompts and create modular RAG systems automatically.
Transcribes audio, translates speech to English, and automates multilingual audio processing using OpenAI's Whisper models.
Builds LLM-powered applications using agents, retrieval-augmented generation (RAG), and modular chains.
Enables zero-shot image classification and semantic image search by connecting visual concepts with natural language.
Deploys high-performance Reinforcement Learning from Human Feedback (RLHF) workflows using Ray and vLLM acceleration for large-scale model alignment.
Serves Large Language Models with maximum throughput and efficiency using vLLM's PagedAttention and continuous batching.
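PagedAttention, the technique named above, stores the KV cache in fixed-size blocks handed out from a shared pool, so concurrent sequences of different lengths don't each reserve a max-length contiguous buffer. A toy allocator sketch (illustrative only; vLLM's real block manager also handles prefix sharing, swapping, and eviction):

```python
class PagedKVCache:
    """Sketch of paged KV-cache allocation: per-sequence block tables
    over a shared pool of fixed-size blocks."""
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))
        self.tables = {}   # seq_id -> list of block ids
        self.lengths = {}  # seq_id -> tokens written

    def append_token(self, seq_id):
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:      # current block full: grab a new one
            if not self.free:
                raise MemoryError("KV pool exhausted")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def free_seq(self, seq_id):
        """Return a finished sequence's blocks to the pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("req-A")   # 3 tokens -> occupies 2 blocks
cache.append_token("req-B")       # 1 token  -> occupies 1 block
print(len(cache.free))
```

Continuous batching then admits new requests into the running batch as soon as blocks free up, instead of waiting for the whole batch to finish.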
Optimizes AI models for efficient local inference using the GGUF format and llama.cpp quantization techniques.
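The quantization underlying GGUF models works block-wise: each block of weights stores one float scale plus a few bits per value. A sketch loosely in the spirit of llama.cpp's symmetric 4-bit Q4_0 scheme (simplified; the actual on-disk format packs nibbles and differs per quant type):

```python
def quantize_q4(block):
    """Symmetric 4-bit quantization of one block of floats:
    one shared scale, integers clamped to the signed 4-bit range."""
    amax = max(abs(v) for v in block) or 1.0
    scale = amax / 7.0  # map the largest magnitude to +/-7
    qs = [max(-8, min(7, round(v / scale))) for v in block]
    return scale, qs

def dequantize_q4(scale, qs):
    """Reconstruct approximate floats from scale + 4-bit integers."""
    return [q * scale for q in qs]

weights = [0.7, -0.35, 0.07, 0.0]
scale, qs = quantize_q4(weights)
print(qs)                       # small integers in [-8, 7]
print(dequantize_q4(scale, qs))  # approximations of the originals
```

Storing ~4.5 bits per weight instead of 16 or 32 is what lets 7B+ models fit in consumer RAM.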
Accelerates LLM inference by up to 3.6x using advanced decoding techniques like Medusa heads and lookahead decoding.
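These techniques are variants of speculative decoding: a cheap draft proposes several tokens, and the target model verifies them in one pass, accepting a whole run when they agree. A sketch of the greedy variant with toy stand-in "models" (the sampling variant accepts probabilistically, and real systems verify all proposals in a single batched forward pass):

```python
def speculative_step(target_next, draft_next, ctx, k=4):
    """One step of greedy speculative decoding: draft proposes k tokens,
    the target verifies; accept up to the first disagreement, where the
    target's own token is substituted."""
    proposal, d_ctx = [], list(ctx)
    for _ in range(k):
        t = draft_next(d_ctx)
        proposal.append(t)
        d_ctx.append(t)
    accepted, v_ctx = [], list(ctx)
    for t in proposal:
        expected = target_next(v_ctx)  # batched in real implementations
        if expected == t:
            accepted.append(t)
            v_ctx.append(t)
        else:
            accepted.append(expected)  # correction token from the target
            break
    else:
        accepted.append(target_next(v_ctx))  # bonus token: all k accepted
    return accepted

# Toy models: target emits last token + 1; draft agrees except every 3rd step.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + 1 if len(ctx) % 3 else ctx[-1] + 2
print(speculative_step(target, draft, [0], k=4))
```

The speedup comes from amortizing one expensive target pass over multiple accepted tokens; output quality is unchanged because the target always has the final say.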
Integrates Salesforce's BLIP-2 framework to enable advanced image captioning, visual question answering, and multimodal reasoning within AI workflows.
Enables advanced vision-language capabilities for image understanding, multi-turn visual conversations, and document analysis.
Optimizes large-scale AI model training using PyTorch Fully Sharded Data Parallel (FSDP) for efficient memory management and scaling.
Connects LLMs to private data sources through advanced document ingestion, vector indexing, and retrieval-augmented generation (RAG) pipelines.
Merges multiple fine-tuned AI models using mergekit to combine specialized capabilities like math and coding without expensive retraining.
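The simplest merge mergekit supports is a weighted linear average of checkpoints. A sketch with checkpoints as plain dicts of name -> list of floats (illustrative; real checkpoints hold tensors, and mergekit also offers SLERP, TIES, and DARE methods):

```python
def linear_merge(models, weights):
    """Weighted linear merge: each merged parameter is the weighted
    average of the corresponding parameters across checkpoints.
    Assumes all models share the same architecture and parameter names."""
    total = sum(weights)
    merged = {}
    for name in models[0]:
        vecs = [m[name] for m in models]
        merged[name] = [sum(w * v[i] for w, v in zip(weights, vecs)) / total
                        for i in range(len(vecs[0]))]
    return merged

# Hypothetical specialized checkpoints sharing one parameter.
math_model = {"mlp.w": [1.0, 0.0]}
code_model = {"mlp.w": [0.0, 1.0]}
print(linear_merge([math_model, code_model], weights=[0.5, 0.5]))
```

Because merging is pure arithmetic over weights, it combines capabilities at the cost of a single pass over the checkpoints, with no gradient steps.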
Curates high-quality datasets for LLM training using GPU-accelerated deduplication, filtering, and PII redaction.
Interprets and manipulates neural network internals for any PyTorch model, including massive foundation models via remote execution.
Generates high-fidelity music and sound effects from text descriptions using Meta's AudioCraft framework.
Guarantees valid, type-safe JSON and structured outputs from Large Language Models using grammar-based constraints.
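Grammar-based constraints work by masking, at every decoding step, any token that cannot extend the output toward a valid string, then letting the model choose among the survivors. A character-level sketch of that core loop (real systems compile a grammar or JSON schema into an automaton and mask token logits against it):

```python
def constrained_decode(score, vocab, valid_prefix, max_len=10):
    """Greedy decoding under a hard constraint: characters that cannot
    extend the output toward a valid string are removed before picking
    the highest-scoring remainder."""
    out = ""
    while len(out) < max_len:
        allowed = [c for c in vocab if valid_prefix(out + c)]
        if not allowed:
            break  # nothing can legally extend the output
        out += max(allowed, key=lambda c: score(out, c))
    return out

# Toy constraint: the output must be a prefix of a JSON boolean.
LANG = {"true", "false"}
valid_prefix = lambda s: any(w.startswith(s) for w in LANG)
# Toy "model" that actually prefers 'x' -- the constraint overrides it.
score = lambda ctx, c: {"x": 2.0, "f": 1.0}.get(c, 0.0)
vocab = list("abcdefghijklmnopqrstuvwxyz")
print(constrained_decode(score, vocab, valid_prefix))
```

Validity is guaranteed by construction rather than by retrying and re-parsing model output, which is why this approach can promise type-safe JSON.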
Orchestrates distributed machine learning training across clusters to scale PyTorch, TensorFlow, and Hugging Face models.
Implements Group Relative Policy Optimization (GRPO) using the TRL library to enhance model reasoning and structured output capabilities.
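GRPO's distinguishing step is its advantage computation: sample a group of completions for the same prompt, then normalize each completion's reward by the group mean and standard deviation, avoiding a separate learned value model. A sketch of just that computation (the full TRL trainer wraps this in a clipped policy-gradient update):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: z-score each reward within its group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    return [(r - mean) / std for r in rewards]

# Hypothetical rewards for 4 completions of one prompt.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

Completions above the group average get positive advantages and are reinforced; the rest are pushed down, all without a critic network.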
Visualizes machine learning training metrics and model performance to streamline experiment tracking and model debugging.
Manages high-performance vector search and storage for production RAG and AI applications using Pinecone's serverless infrastructure.
Enforces structured LLM outputs using regex and grammars to guarantee valid JSON, XML, and code generation.
Implements Anthropic's Constitutional AI method to train harmless, helpful models through self-critique and automated AI feedback.
Implements and optimizes the RWKV architecture, a hybrid RNN-Transformer design offering linear-time inference and constant memory, enabling effectively unbounded context lengths.
Decomposes complex neural network activations into sparse, interpretable features to understand and steer model behavior.