Does it support advanced encoding like Target Encoding?

Yes, it includes patterns for likelihood encoding and mixed-model encoding using the 'embed' package extension.
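Here is a minimal sketch of likelihood (target) encoding with 'embed'. It assumes a hypothetical two-class outcome `y` and a categorical predictor `city`. `step_lencode_glm()` replaces each level with its effect estimated by a generalized linear model, which is on the log-odds scale for a binary outcome. `step_lencode_mixed()` is the mixed-model variant. It also needs the 'lme4' package, so it appears only as a comment.

```r
library(recipes)
library(embed)

# Hypothetical training data: one categorical predictor, one binary outcome.
set.seed(1)
train <- data.frame(
  city = factor(sample(c("a", "b", "c", "d"), 200, replace = TRUE)),
  y    = factor(sample(c("yes", "no"), 200, replace = TRUE))
)

rec <- recipe(y ~ city, data = train) |>
  # Replace each city level with a GLM-based effect estimate.
  step_lencode_glm(city, outcome = vars(y))
  # Mixed-model alternative (requires 'lme4'):
  # step_lencode_mixed(city, outcome = vars(y))

prepped <- prep(rec, training = train)     # encodings learned from training data only
encoded <- bake(prepped, new_data = NULL)  # city is now a numeric column
```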

Can I use this for natural language processing tasks?

Yes, the skill covers text-specific features like tokenization, stopword removal, and TF-IDF through the 'textrecipes' package.
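The tokenize, remove stopwords, weight by TF-IDF sequence can be sketched with 'textrecipes' as follows. The `reviews` data is hypothetical. `step_stopwords()` also requires the 'stopwords' package to be installed.

```r
library(recipes)
library(textrecipes)

# Hypothetical free-text data.
reviews <- data.frame(
  text = c("The model trained quickly and the results were great",
           "Preprocessing the text data took a long time",
           "Great results with TF-IDF features and a linear model"),
  stringsAsFactors = FALSE
)

rec <- recipe(~ text, data = reviews) |>
  step_tokenize(text) |>                      # split into lowercase word tokens
  step_stopwords(text) |>                     # drop English stopwords ("the", "and", ...)
  step_tokenfilter(text, max_tokens = 10) |>  # keep the 10 most frequent tokens
  step_tfidf(text)                            # one TF-IDF column per retained token

features <- bake(prep(rec), new_data = NULL)
```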

What is the primary purpose of the Recipes Feature Engineering skill?

It provides standardized code patterns and best practices for creating data preprocessing pipelines using the R 'recipes' package within the Tidymodels ecosystem.
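The core 'recipes' pattern can be sketched in three phases. First, declare the steps with `recipe()`. Next, estimate their statistics on training data with `prep()`. Finally, apply them to any data with `bake()`. The random split of the built-in `mtcars` data is illustrative only.

```r
library(recipes)

# Illustrative train/test split of a built-in dataset.
set.seed(123)
idx   <- sample(nrow(mtcars), 24)
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# 1. Declare: outcome, predictors, and preprocessing steps (nothing is computed yet).
rec <- recipe(mpg ~ ., data = train) |>
  step_normalize(all_numeric_predictors())

# 2. Estimate: prep() computes means and standard deviations from the training set.
rec_prepped <- prep(rec, training = train)

# 3. Apply: bake() reuses those training statistics on any data set.
train_baked <- bake(rec_prepped, new_data = NULL)
test_baked  <- bake(rec_prepped, new_data = test)
```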

What imputation methods are included?

The skill provides patterns for simple mean/mode imputation as well as more complex methods like K-Nearest Neighbors (KNN) and bagged tree imputation.
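The following sketch combines simple and model-based imputation in one recipe. The data frame is hypothetical. Mean and mode imputation run first so that the KNN step has complete `age` and `region` values to measure neighbors with. `step_impute_bag()` and `step_impute_linear()` take the same selector-based form.

```r
library(recipes)

# Hypothetical data with gaps in a numeric, a model target, and a categorical column.
df <- data.frame(
  age    = c(25, NA, 40, 35, NA, 50, 45, 30),
  income = c(40, 55, NA, 60, 52, 80, 75, NA),
  region = factor(c("n", "s", NA, "n", "s", "n", NA, "n")),
  y      = c(1, 2, 3, 4, 5, 6, 7, 8)
)

rec <- recipe(y ~ ., data = df) |>
  step_impute_mean(age) |>                 # simple: training mean (37.5)
  step_impute_mode(region) |>              # simple: most frequent level ("n")
  step_impute_knn(income,                  # model-based: 3 nearest neighbors
                  impute_with = imp_vars(age, region),
                  neighbors = 3)

imputed <- bake(prep(rec), new_data = NULL)
```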

How does this skill help prevent data leakage?

It includes specific guidelines on the correct ordering of steps (e.g., imputing before normalizing) and on estimating every step from the training data alone, so that no statistic used in preprocessing is computed from test data.
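Leak-free ordering can be sketched as follows, using hypothetical train and test frames. Imputation comes before normalization, and encoding comes last. Because `prep()` sees only `train`, the test row with `x1 = 100` is scaled with the training mean and standard deviation. It does not influence them.

```r
library(recipes)

# Hypothetical training and test data.
train <- data.frame(
  x1 = c(1, 2, NA, 4, 5, 6),
  x2 = c(10, NA, 30, 40, 50, 60),
  g  = factor(c("a", "b", "a", "b", "a", "b")),
  y  = c(1.1, 2.0, 2.9, 4.2, 5.1, 5.8)
)
test <- data.frame(x1 = c(NA, 100), x2 = c(20, 1000),
                   g = factor(c("a", "b")), y = c(0, 0))

rec <- recipe(y ~ ., data = train) |>
  step_impute_median(all_numeric_predictors()) |>  # 1. fill gaps first
  step_normalize(all_numeric_predictors()) |>      # 2. then center and scale
  step_dummy(all_nominal_predictors())             # 3. then encode categories

prepped    <- prep(rec, training = train)   # all statistics come from train
norm_stats <- tidy(prepped, number = 2)     # inspect the learned means and SDs
test_baked <- bake(prepped, new_data = test)
```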

Recipes Feature Engineering

Name: Recipes Feature Engineering
Author: choxos

Data Science & Machine Learning

Streamlines machine learning data preprocessing in R using standardized Tidymodels recipes patterns.

This skill provides a comprehensive library of patterns for the R 'recipes' package, enabling data scientists to build robust, reproducible preprocessing pipelines. It covers the entire feature engineering lifecycle, including numeric normalization, categorical encoding, missing data imputation, and dimensionality reduction. By following the included best-practice guidelines for step ordering, users can prevent information leakage and ensure their models generalize well. It also integrates specialized extensions for handling text data, date/time features, and class imbalance, making it an essential tool for Tidymodels practitioners.
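Date/time features and class imbalance can be handled inside the same recipe. The sketch below assumes the 'themis' extension package and a hypothetical `orders` data set with a rare `churned` class. `step_downsample()` is skipped by default when baking new data, so only the training set is rebalanced.

```r
library(recipes)
library(themis)

# Hypothetical imbalanced data: 15 "yes" vs 85 "no".
set.seed(7)
orders <- data.frame(
  order_date = as.Date("2024-01-01") + sample(0:364, 100, replace = TRUE),
  amount     = runif(100, 10, 500),
  churned    = factor(c(rep("yes", 15), rep("no", 85)))
)

rec <- recipe(churned ~ ., data = orders) |>
  step_date(order_date, features = c("dow", "month")) |>  # weekday and month factors
  step_rm(order_date) |>                                    # drop the raw date column
  step_downsample(churned)                                  # balance classes (training only)

balanced <- bake(prep(rec), new_data = NULL)
```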

Key Features

01 Best-practice step ordering to prevent data leakage during preprocessing

02 Standardized patterns for numeric normalization and non-linear transformations

03 Specialized steps for text processing, date/time extraction, and class imbalance

04 Advanced missing data imputation using KNN, bagging, and linear models

05 Comprehensive categorical encoding, including dummy variables and target encoding
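The following is a short sketch of a non-linear transformation followed by normalization and dummy encoding, using the built-in `iris` data purely as an example. `step_YeoJohnson()` estimates a power transform per column to reduce skew. `step_dummy()` turns `Species` into indicator columns, dropping the reference level.

```r
library(recipes)

rec <- recipe(Sepal.Length ~ ., data = iris) |>
  step_YeoJohnson(all_numeric_predictors()) |>  # non-linear, skew-reducing transform
  step_normalize(all_numeric_predictors()) |>   # center and scale
  step_dummy(all_nominal_predictors())          # Species -> Species_versicolor, Species_virginica

out <- bake(prep(rec), new_data = NULL)
```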

Use Cases

01 Handling complex datasets with high-cardinality categories or missing values

02 Implementing automated feature selection and dimensionality reduction such as PCA or UMAP

03 Building production-grade preprocessing pipelines for Tidymodels workflows
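The sketch below combines several of these use cases on hypothetical data. `step_other()` pools rare levels of a high-cardinality factor, and `step_zv()` and `step_corr()` drop uninformative or redundant columns. `step_pca()` then reduces the remaining predictors to three components. UMAP is available through `embed::step_umap()`, which needs the 'uwot' package and is not shown.

```r
library(recipes)

# Hypothetical data: 40 store levels (4 common, 36 rare), two nearly
# duplicate predictors, and one constant column.
set.seed(11)
n    <- 300
base <- rnorm(n)
df <- data.frame(
  store    = factor(sample(paste0("s", 1:40), n, replace = TRUE,
                           prob = c(rep(10, 4), rep(1, 36)))),
  x1       = base + rnorm(n, sd = 0.1),
  x2       = base + rnorm(n, sd = 0.1),
  x3       = rnorm(n),
  x4       = rnorm(n),
  constant = 1,
  y        = rnorm(n)
)

rec <- recipe(y ~ ., data = df) |>
  step_other(store, threshold = 0.05) |>           # pool levels below 5% into "other"
  step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors()) |>                     # remove zero-variance columns
  step_corr(all_numeric_predictors(), threshold = 0.9) |>  # drop highly correlated columns
  step_normalize(all_numeric_predictors()) |>
  step_pca(all_numeric_predictors(), num_comp = 3) # keep 3 principal components

out <- bake(prep(rec), new_data = NULL)
```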
