What does the onboard-new-source skill do?

It provides a systematic workflow to catalog, document, optimize with reflections, and secure access for any data source newly added to Dremio.

Can it manage data permissions?

Yes, it includes steps to check and configure access grants for specific datasets, ensuring that only authorized roles can query sensitive information.
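As a rough illustration of the grant step, the sketch below builds a `GRANT` statement as a string. It assumes the `GRANT SELECT ON TABLE ... TO ROLE ...` form from Dremio's SQL reference; the helper names `quote_ident` and `build_grant_select`, and the sample source, table, and role, are hypothetical and not part of the skill. Check the exact syntax against your Dremio edition before running the output.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_grant_select(table_path: list[str], role: str) -> str:
    """Return a statement granting `role` read access to a single table.

    `table_path` is the dotted path split into parts, for example
    ["sales_src", "public", "orders"] (hypothetical names).
    """
    qualified = ".".join(quote_ident(part) for part in table_path)
    return f"GRANT SELECT ON TABLE {qualified} TO ROLE {quote_ident(role)}"


# Hypothetical source, schema, table, and role names.
print(build_grant_select(["sales_src", "public", "orders"], "analyst"))
```

Generating the statement separately from executing it lets the grant be reviewed before it touches a sensitive dataset.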

How does it help with query performance?

It automates the creation of Raw and Aggregate Reflections, which allow Dremio to accelerate queries by using pre-computed materializations.
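To make the two reflection types concrete, here is a minimal sketch that builds the corresponding SQL strings. It assumes the `ALTER TABLE ... CREATE RAW REFLECTION ... USING DISPLAY (...)` and `ALTER TABLE ... CREATE AGGREGATE REFLECTION ... USING DIMENSIONS (...) MEASURES (...)` forms from Dremio's SQL reference; some versions use different keywords, so verify against your deployment. All function names, tables, and columns here are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualify(table_path: list[str]) -> str:
    """Join path parts into a quoted, dotted table reference."""
    return ".".join(quote_ident(part) for part in table_path)


def build_raw_reflection(table_path: list[str], name: str,
                         display_cols: list[str]) -> str:
    """Raw reflection: materializes the listed columns row by row."""
    cols = ", ".join(quote_ident(c) for c in display_cols)
    return (f"ALTER TABLE {qualify(table_path)} "
            f"CREATE RAW REFLECTION {quote_ident(name)} USING DISPLAY ({cols})")


def build_aggregate_reflection(table_path: list[str], name: str,
                               dimensions: list[str],
                               measures: dict[str, list[str]]) -> str:
    """Aggregate reflection: pre-computes measures grouped by dimensions."""
    dims = ", ".join(quote_ident(d) for d in dimensions)
    meas = ", ".join(f"{quote_ident(col)} ({', '.join(funcs)})"
                     for col, funcs in measures.items())
    return (f"ALTER TABLE {qualify(table_path)} "
            f"CREATE AGGREGATE REFLECTION {quote_ident(name)} "
            f"USING DIMENSIONS ({dims}) MEASURES ({meas})")


# Hypothetical table and columns.
print(build_raw_reflection(["sales_src", "orders"], "orders_raw",
                           ["order_id", "region", "amount"]))
print(build_aggregate_reflection(["sales_src", "orders"], "orders_agg",
                                 ["region"], {"amount": ["SUM", "COUNT"]}))
```

A raw reflection suits queries that scan or filter rows, while an aggregate reflection suits GROUP BY workloads over the chosen dimensions.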

Does this skill connect the physical data source to Dremio?

No, the source must already be connected via the Dremio UI or API. This skill handles the essential post-connection steps like discovery and optimization.

Onboard New Dremio Source

Name: Onboard New Dremio Source
Author: dremio

Database Management

Streamlines the discovery, documentation, performance optimization, and security configuration of new data sources in Dremio.

This skill provides a structured, automated framework for data engineers to integrate new data sources into the Dremio data lakehouse. It guides the AI through a complete post-connection lifecycle, from identifying available tables and profiling data distributions to setting up performance-boosting reflections and implementing robust role-based access controls. By following a systematic checklist, it ensures that every new source is production-ready, well-documented, and optimized for high-speed SQL analytics.

Key Features

01. 5 GitHub stars

02. Role-based access control (RBAC) and grant management

03. Performance acceleration via Raw and Aggregate Reflections

04. Comprehensive data profiling and statistical analysis

05. Automated data source discovery and schema enumeration

06. End-to-end query verification and job profiling

Use Cases

01. Implementing security best practices for newly added organizational data

02. Integrating a new S3 bucket or relational database into a Dremio project

03. Optimizing query performance for large-scale datasets using automated reflections
