Media FAQs

Question 1

What is Media and what does it do?

Accepted Answer

Media is an AI-powered media generation tool that lets you create images, videos, music, and speech using Google Gemini models. It runs as an MCP server, so AI agents can call it directly to generate content.

Question 2

Which Google Gemini models does Media utilize?

Accepted Answer

Media uses several Google models, one per media type: Gemini Nano Banana for image generation, Veo models for video, Lyria RealTime for music composition, and Gemini TTS for text-to-speech conversion.

Question 3

Is a Google Gemini API key required to use Media?

Accepted Answer

Yes, a Google Gemini API key is required. Set it as an environment variable (`GEMINI_API_KEY`) or configure it in your MCP client setup so the service can authenticate.
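As an illustration of the environment-variable route, here is a minimal Python sketch of a startup check. The helper name `resolve_api_key` and its error message are hypothetical and not part of Media itself; only the variable name `GEMINI_API_KEY` comes from the answer above.

```python
import os


def resolve_api_key(env=None):
    """Return the Gemini API key from the environment, or fail early.

    `resolve_api_key` is a hypothetical helper for illustration only.
    """
    # Allow a custom mapping to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it or add it to your MCP client config"
        )
    return key
```

Calling `resolve_api_key()` before launching the server or client makes a missing key show up as a clear error rather than a failed generation request later on.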

Question 4

How does Media handle the output of generated files?

Accepted Answer

You can specify an output directory (`MEDIA_OUTPUT_DIR`) to save generated files locally; in that case the tool returns only the file path. If no directory is set, the tool returns the raw base64-encoded data directly.
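The two behaviors described above can be sketched in Python as follows. This is an assumption-laden illustration, not Media's actual implementation: the function name `deliver_output` and the `path` / `base64` result keys are hypothetical, and only the `MEDIA_OUTPUT_DIR` variable is taken from the answer.

```python
import base64
import os
from pathlib import Path


def deliver_output(data: bytes, filename: str, env=None) -> dict:
    """Save to MEDIA_OUTPUT_DIR if set, otherwise return base64 data.

    Hypothetical helper: the name and the result keys are assumptions.
    """
    env = os.environ if env is None else env
    out_dir = env.get("MEDIA_OUTPUT_DIR")
    if out_dir:
        # Directory configured: write the file and return only its path.
        target = Path(out_dir) / filename
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        return {"path": str(target)}
    # No directory configured: return the raw bytes, base64-encoded.
    return {"base64": base64.b64encode(data).decode("ascii")}
```

Returning a path keeps responses small for large files such as video, while the base64 fallback works when the caller has no shared filesystem with the server.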

Question 5

What types of media can I generate using Media?

Accepted Answer

You can generate four types of media: high-resolution AI images (up to 4K) in various aspect ratios; video from text prompts, with native audio, dialogue, and sound effects; instrumental music with genre and mood controls; and natural-sounding speech from text, with voice selection and style control.
