Asier Anitua

Artificial Intelligence in the Audiovisual Sector



A practical guide for Media & Entertainment professionals

April 2026 Edition — Updated

67 pages · 20 chapters





This is not a book meant to convince you that artificial intelligence exists or that it “is going to transform the industry.” That has already happened. As you read this sentence, AI systems are analyzing millions of hours of audiovisual content, generating trailers, subtitling in real time, adjusting scene color, and deciding which series will be recommended to you tonight.

The question that matters is no longer “Will AI reach the audiovisual sector?” The question is: do you have the conceptual and practical tools to work with it, direct it, question it, and extract value from it? Or are you merely observing it from the outside?

This book is written for industry professionals: directors, producers, editors, sound technicians, distributors, platform managers, archive managers, creatives, and executives. For those already working in the sector who need to understand what is really happening, beyond the media noise.

Note on model versions

This book includes references to specific model and tool versions updated as of April 2026: GPT-5.4, Claude Opus 4.7, Midjourney v8, Runway Gen-4, Kling 3.0, Seedance 2.0, Sora 2, Google Veo 3, Suno V4. These versions will become outdated long before the rest of the book does. The criteria for evaluating them, the principles for adopting them, and the risks involved in managing them will not. Read the tool names as examples of the state of the art at the time of writing; read the decision frameworks as a lasting reference.

The 2026 edition adds an element that barely existed as an emerging concept in the previous edition: AI agents. An agent is not a model that answers questions; it is a system that plans, executes chained tasks, uses external tools, makes autonomous decisions, and can operate for hours without human intervention. For the audiovisual sector, agents represent the next qualitative leap after generation: moving from “AI helps me create” to “AI manages entire parts of my workflow.”

You will not find abstract philosophy here. You will find useful explanations, real tools, concrete cases, and reflections on ethics and regulation. Everything updated to 2026. With the honesty of someone who knows that any list of tools ages, but the principles that support it do not.

The audiovisual industry has always been a pioneer in adopting technology: from celluloid to digital, from linear to nonlinear, from broadcast to streaming. AI is the next leap. And like all the previous ones, it does not replace human creativity: it reconfigures it.

Chapter 1

Foundations every audiovisual professional needs to understand

Artificial intelligence is not a product. It is a family of mathematical and computational techniques that allow systems to learn from data, identify patterns, and make decisions. Under that umbrella are very different things: from an algorithm that detects faces in a video to a system that autonomously plans and executes a complete content production workflow.

For the audiovisual professional, the useful approach is to understand four layers that build on one another: machine learning, deep learning, generative models, and, most recently, AI agents. Each layer radically expands what the previous one made possible.

1.1 Machine Learning: learning from examples

Machine learning is the foundation of almost everything. Instead of explicitly programming every rule, the system is fed thousands or millions of examples, and it learns to classify, predict, or generate new cases. A system trained on thousands of action scenes learns to recognize rhythmic cuts. One trained on millions of subtitles learns to transcribe audio with increasing accuracy.

In audiovisual production, machine learning operates in three main modes. In supervised learning, the system learns from labeled data: “this scene is exterior/night,” “this shot is a close-up of a face.” In unsupervised learning, the system finds patterns without prior labels: it automatically groups similar scenes without anyone telling it what the criteria are. And in reinforcement learning, the system learns through trial and error with feedback: Netflix recommendation systems use it to optimize engagement.
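For readers who want to see what "learning from labeled examples" looks like in practice, here is a minimal supervised-learning sketch using scikit-learn. The features (the fraction of the frame occupied by the largest detected face, and the face count) are illustrative stand-ins for what a real pipeline would extract with a face detector.

```python
# Toy shot-type classifier: supervised learning from labeled examples.
from sklearn.linear_model import LogisticRegression

# Each row: [largest_face_area_ratio, face_count]; labels from a human annotator.
X_train = [[0.45, 1], [0.38, 1], [0.02, 3], [0.01, 5], [0.30, 1], [0.03, 2]]
y_train = ["close-up", "close-up", "wide", "wide", "close-up", "wide"]

model = LogisticRegression().fit(X_train, y_train)

# A new shot where one face fills 40% of the frame is classified as a close-up.
print(model.predict([[0.40, 1]]))  # -> ['close-up']
```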

In day-to-day production, these modes translate into concrete applications: automatic archive cataloging, image quality analysis, content recommendation, commercial success prediction for projects, and error detection in postproduction.

1.2 Deep Learning: networks that imitate the brain

Deep learning uses multi-layer neural networks. Each layer extracts more abstract features from the previous one: the first detects edges, the next detects shapes, the next detects complete objects. This hierarchical architecture is what makes possible photorealistic image generation, upscaling old video to 4K, noise removal in night recordings, or synthesis of voices indistinguishable from human voices.

The two architectures most relevant to the audiovisual sector are convolutional neural networks (CNNs), specialized in image and video, and recurrent neural networks (RNNs and their LSTM variants), specialized in temporal sequences such as audio and text. Both have largely been surpassed by Transformers in the most demanding tasks, but they remain the foundation of many standard production tools.
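A minimal sketch of the layer hierarchy described above, written in PyTorch purely for illustration: three stacked convolutional layers whose features grow progressively more abstract, ending in a toy two-class decision (for example, interior versus exterior). Layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

frame_classifier = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level features (edges)
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level features (shapes)
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # high-level features (objects)
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 2),                             # e.g. interior vs. exterior
)

logits = frame_classifier(torch.randn(1, 3, 224, 224))  # one RGB frame
print(logits.shape)  # torch.Size([1, 2])
```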

1.3 Transformers and generative models: the revolution underway

Transformers, introduced in 2017 in Google’s paper “Attention Is All You Need,” changed everything. They are the architecture underlying large language models (GPT-5.4, Claude Opus 4.7, Gemini 2.0/3) and image and video generation models. Their “attention” mechanism allows the model to evaluate the relevance of each part of a sequence in relation to the rest, making them extraordinarily powerful for any task involving context, narrative, or temporal continuity: exactly what defines audiovisual work.
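The attention mechanism itself fits in a few lines. The sketch below implements scaled dot-product attention, the core operation of the Transformer, in NumPy; the sequence is random data standing in for, say, the tokens of a line of dialogue.

```python
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # relevance of each position to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> attention weights
    return weights @ V                              # context-weighted mix of values

seq_len, d_model = 6, 8                             # e.g. 6 tokens of a script line
x = np.random.randn(seq_len, d_model)
print(attention(x, x, x).shape)                     # (6, 8): same shape, now context-aware
```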

The generative models most relevant to the sector in 2026 are: Claude Opus 4.7 and GPT-5.4 for text, script development, and narrative analysis; Midjourney v8 and FLUX 1.1 Pro for high-quality still images; Seedance 2.0 (ByteDance), Runway Gen-4, Kling 3.0, Sora 2, and Google Veo 3 for video generation and editing; ElevenLabs for voice synthesis with emotional cloning; and Hume AI for emotion analysis and synthesis in audio.

Key 2026 insight

Multimodal models (which simultaneously process text, image, audio, and video) are now the norm at the frontier of the field. GPT-5.4 and Gemini 2.0 Flash can analyze a video clip, transcribe the audio, describe the visual composition, and generate directing notes, all in a single call. This has direct implications for production workflows.

1.4 Why now and not before

Three factors converged between 2022 and 2025 to produce the current leap: the massive availability of training data accumulated over decades of digitization, the exponential increase in computational power through specialized chips (A100/H100 GPUs, Google TPUs, Apple and Amazon proprietary chips), and algorithmic advances that made it possible to train unprecedented-scale models with increasing efficiency.

The result is that tools that in 2020 were laboratory prototypes are now accessible commercial products integrated into the industry’s standard software. Adobe Firefly is in Premiere Pro and After Effects. DaVinci Resolve includes AI for color, audio, and VFX by default in its free version. The barrier to entry has disappeared. What remains is learning how to use these tools with professional judgment.

Chapter 2

Intelligent preproduction

Preproduction is the phase where AI perhaps has the greatest potential return on investment: the decisions made here determine the cost and quality of everything that follows. Script errors, inefficient scheduling, or problematic locations become expensive during production. AI does not eliminate these problems, but it gives teams tools to detect them earlier and explore more options with fewer resources.

2.1 From concept to script

AI is not going to write the next great screenplay. But it can make the development process much more efficient and exploratory. Today’s language models are powerful tools for idea development, generating narrative options, and identifying structural problems before they cost real money.

In practice, professional screenwriters use Claude, ChatGPT, and specialized tools such as Fade In Pro, Highland 2, or WriterDuet with AI integration to: generate variants of problematic scenes without compromising the original voice; test different dialogue tones for the same character; identify character arc inconsistencies across a long script; research historical, cultural, or technical references exhaustively; and create first versions of production documents: synopses, series bibles, treatments, one-pagers.

ScriptBook and Cinelytic add an analytical layer that general language models do not offer: they predict a script’s likely commercial performance based on pattern analysis of historical success in box office and platforms. They are not oracles, but they are useful tools for development teams at majors and streaming platforms evaluating dozens of projects simultaneously with limited development budgets.

An emerging use in 2026: narrative coherence analysis with multimodal AI. You can pass the full script, visual moodboard, and music references to a model, and the system identifies dissonances between the written tone and the proposed visual tone. A five-minute conversation with the model can reveal concept problems that might take weeks to emerge in the human creative process.

2.2 Data-assisted casting

Platforms such as Casting Networks, Backstage, and Spotlight are integrating AI for matching roles and talent. The systems analyze: the actor’s previous role history, interpretive range compatibility with the character, visual look according to the director’s references, availability on shooting dates, and social media impact as an indicator of appeal for streaming platforms.

This does not replace the casting director or the creative director; it accelerates preselection and reduces the time spent on clearly incompatible candidates. In productions with limited budgets and tight schedules, this saving can be significant.

An important warning: facial analysis systems and actor classification by “type” carry embedded historical biases that can systematically discriminate against actors of certain ethnicities, ages, or physical appearances. The European AI Act classifies these systems as high-risk when they affect employment decisions. European producers must be especially cautious with casting solutions assisted by facial recognition.

2.3 Visualization, storyboard, and production design

With Midjourney v8, DALL-E 3.5, Adobe Firefly, or Stable Diffusion XL, a director can generate in minutes visual references for scenes, camera angles, wardrobe proposals, color palettes, and location atmospheres that previously required hours of work from a specialized illustrator.

This does not eliminate the work of the concept artist or the production designer. It changes their role: from executor of variants to curator, refiner, and visual communicator. Human value shifts toward judgment, selection, and coherence across the project’s visual universe.

StoryboardHero and Boords incorporate AI specifically for storyboard creation: the system interprets scene descriptions from the script and automatically generates a first storyboard that the team can refine. For motion animatics, Runway and Sora can generate low-fidelity video versions that allow the director to visualize the flow of the sequence before shooting.

Google Earth Studio, combined with AI plugins, makes it possible to simulate lighting conditions in real locations at different times of day and seasons of the year. For exterior shoot planning, this removes a significant uncertainty variable. Systems such as Geospy can analyze location databases and find matches with specific visual references provided as images.

2.4 Planning, budgeting, and scheduling

Movie Magic Scheduling and Gorilla Scheduling incorporate AI to optimize shooting order while minimizing transportation costs, cast and crew availability issues, and logistical conditions at locations. These systems analyze the full script, identify dependencies between scenes, and propose shooting sequences that can significantly reduce production days.
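The core of what these schedulers optimize can be shown with a deliberately simplified sketch: grouping scenes by location to reduce company moves. Real tools weigh cast availability, day/night splits, and location costs simultaneously; the scene list here is invented.

```python
scenes = [
    {"scene": 12, "location": "Harbor"},
    {"scene": 3,  "location": "Apartment"},
    {"scene": 7,  "location": "Harbor"},
    {"scene": 21, "location": "Apartment"},
    {"scene": 14, "location": "Rooftop"},
]

# Shooting in script order: count how often the whole unit has to move.
moves_in_script_order = sum(
    1 for a, b in zip(scenes, scenes[1:]) if a["location"] != b["location"]
)

# Shooting grouped by location: one move per location change only.
moves_when_grouped = len({s["location"] for s in scenes}) - 1

print(moves_in_script_order, "moves in script order")  # 4
print(moves_when_grouped, "moves when grouped")        # 2
```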

In high-budget productions, the difference between optimal scheduling and mediocre scheduling can amount to hundreds of thousands of euros. The ROI of implementing scheduling AI is one of the easiest to calculate and justify in the production chain.

For budgeting, tools such as GreenShoots AI analyze complete scripts and automatically identify cost elements: number of locations, special effects, required extras, estimated shooting days per scene. They generate first-estimate budgets that, while requiring human review, provide a solid starting point much faster than the manual process.

Chapter 3

AI on set

The set has historically been the space most resistant to automation: the unpredictability of live shooting, human talent management, and real-time creative decision-making seemed immune to AI. That resistance is eroding, not because AI has conquered on-set creativity, but because it has efficiently colonized everything around that creativity.

3.1 Virtual production: the set that does not exist

The Mandalorian popularized the concept of LED volumes with StageCraft in 2019, but since then the cost of the technology has fallen and the number of equipped studios has grown exponentially across Europe, Asia, and Latin America. In 2026, virtual production is no longer a competitive advantage reserved for major studios: it is an accessible option for mid-budget productions.

AI intervenes across multiple layers of the virtual production ecosystem. Unreal Engine 5 uses Lumen (real-time global illumination) and Nanite (high-density geometry with no rendering cost) to generate photorealistic environments that react in real time to the real light on set. Camera tracking systems, such as Mo-Sys StarTracker, use AI to maintain the correct perspective of the virtual environment with completely free camera movements. And real-time compositing algorithms allow the director’s monitor to show the final result of the scene, not the green or neutral gray background of the volume.

For producers and directors, this means: less postproduction time because much of the VFX work is resolved on set; better decision-making because the director sees the real result; the possibility of shooting impossible, inaccessible, or nonexistent locations; and a significant reduction in production travel and transportation, with the corresponding impact on carbon footprint.

3.2 Autonomous and intelligent cameras

AI-powered robotic camera systems, such as those from Cinfo Tiivii or proprietary developments from ESPN, Sky Sports, and beIN Sports, make it possible to produce live broadcasts of sports events, concerts, and programs with minimal human operators. The AI detects the ball, anticipates player movement, selects the best available angle, and performs real-time editing cuts following configurable production rules.

In controlled studio environments, intelligent PTZ systems with facial tracking AI allow interview programs, news shows, video podcasts, and educational programs to operate without camera operators, with automatic composition that adjusts in real time to the movement and position of presenters. In 2026, many regional news channels and digital content producers operate their studios entirely with autonomous camera systems.

AI tracking drones have democratized cinematic aerial shots. DJI systems with ActiveTrack 360, the Sony Airpeak S1, and new Skydio drones offer predictive tracking of moving subjects, with automatic composition and stabilization that produces footage indistinguishable from that of a specialized operator. Any independent production can now obtain cinematic-quality aerial footage that previously required a helicopter and an aerial cinematography crew.

3.3 Real-time assistance during shooting

An emerging category of AI tools intervenes during shooting to detect problems before they become costly. Continuity analysis systems, integrated with cameras or with the DIT proxy system, compare frames between takes and alert teams to inconsistencies in wardrobe, makeup, props, or actor position. ScriptE, in its 2026 version, combines continuity management with AI analysis that learns the shooting patterns and generates proactive alerts.

Real-time audio analysis software, such as the tools integrated into Sound Devices and Zaxcom sound systems, detects acoustic problems, radio-frequency interference, clipping, and synchronization errors before the shooting moment is lost. Pomfort Livegrade, the standard in the DIT station, incorporates AI analysis for LUT management and real-time exposure control.

Automatic slate systems, such as Mavis and modules integrated into Silverstack, identify and log each take with precise metadata, automatically synchronizing audio and video and organizing files according to the script structure. This removes hours of organizational work in postproduction and drastically reduces the risk of synchronization errors.

Chapter 4

Augmented postproduction

Postproduction is where AI has had the most immediate, measurable, and universally adopted impact in the audiovisual sector. The reason is structural: postproduction contains a very high proportion of repetitive, well-defined technical tasks, exactly the kind of work where AI surpasses humans in speed and consistency. This frees the professional to focus on decisions where human judgment is irreplaceable.

4.1 Editing: from hours to minutes

Adobe Premiere Pro 2025-2026 has incorporated a set of AI features that transform the editing workflow. Automatic audio transcription, with accuracy above 95% under standard conditions, allows any moment of the footage to be searched by text and edited directly by cutting in the transcript. For interviews, documentaries, and news content, this can cut rushes selection time in half.
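The principle behind text-based editing can be sketched with the open-source Whisper model (pip install openai-whisper); the file name is a placeholder. Because every transcript segment carries timecodes, a text match maps directly back to a point in the footage.

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("interview_cam_A.wav")

# Each segment carries start/end timecodes, so a text hit is also a footage hit.
query = "budget"
for seg in result["segments"]:
    if query in seg["text"].lower():
        print(f'{seg["start"]:7.2f}s  {seg["text"].strip()}')
```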

The clip extension feature with Firefly Video generates additional frames at the beginning or end of a shot using image synthesis, allowing shots to be lengthened without artificial repetition. Intelligent reframing analyzes the content of each frame and automatically adapts the original framing to any aspect ratio (16:9, 9:16, 1:1, 4:5) while keeping the subject centered. For multiplatform distribution, this eliminates the manual process of creating versions for each format.
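The AI part of intelligent reframing is detecting and tracking the subject; the crop itself is simple geometry. A minimal sketch, assuming the subject's horizontal center has already been detected:

```python
def vertical_crop(frame_w, frame_h, subject_cx, target_ratio=9 / 16):
    """Widest 9:16 crop that keeps the subject centered without leaving the frame."""
    crop_w = int(frame_h * target_ratio)        # full height, narrow width
    left = int(subject_cx - crop_w / 2)
    left = max(0, min(left, frame_w - crop_w))  # clamp to frame edges
    return left, 0, crop_w, frame_h             # x, y, width, height

# A subject detected at x=1400 in a 1920x1080 frame:
print(vertical_crop(1920, 1080, 1400))  # (1096, 0, 607, 1080)
```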

DaVinci Resolve 19, whose free version includes most AI functions, incorporates Magic Mask (automatic frame-by-frame moving mask tracing), IntelliTrack (object and person tracking in complex scenes), object removal with background synthesis, and intermediate frame generation for frame rate conversion. Its AI audio integration, in collaboration with Fairlight, includes automatic dialogue analysis and mixing suggestions.

For social media and digital distribution, tools such as CapCut Business, Descript, and OpusClip represent a new category: automated content repurposing. Given a long video (webinar, interview, event, video podcast), these systems automatically identify the most engaging moments, generate 60-90 second clips optimized for each platform, add animated subtitles, and adapt the framing. The workflow that previously required a dedicated editor for hours now runs in minutes with minimal human review.

4.2 Color, VFX, and restoration

Color correction has historically been one of the most time-intensive specialties in postproduction. DaVinci Resolve 19 incorporates tools that automate much of the technical work: automatic scene analysis suggests grades based on genre and emotional tone detection; Colourlab AI enables automatic matching between shots from different cameras or lighting conditions; and FilmConvert Nitrate emulates the response of photochemical film with intelligent adjustment to the characteristics of the sensor used.

In VFX, democratization has been radical. Runway Gen-4 allows complex visual elements (smoke, fire, water, explosions, creatures, environments) to be generated and modified, elements that previously required teams of specialized artists using software costing thousands of euros. Seedance 2.0 by ByteDance, launched in April 2026, adds native audio generation synchronized with video in a single pass, making it the most complete model on the market for integrated audiovisual production. Adobe Firefly Video integrates VFX directly into the After Effects/Premiere workflow with element generation from text descriptions. For independent and low-budget productions, this has removed the access barrier to quality visual effects.

For archive restoration, AI has opened a new era. Topaz Video AI can upscale old footage to 4K with intelligent detail reconstruction, recover damaged frames or frames with compression artifacts, stabilize shaky recordings with motion interpolation, and remove film grain while preserving organic texture. Archives that were previously unusable for high-definition broadcast can be fully rehabilitated. For broadcasters with historical archives, this represents a high-value asset that until three years ago was technically inaccessible.

4.3 Audio: the silent revolution

If there is one area where AI has produced qualitative leaps bordering on the unbelievable over the past two years, it is audio. iZotope RX 11 can separate voices from ambient noise with a precision that in 2022 would have required studio rerecording: dialogue recorded outdoors with heavy traffic, wind, or crowds can be cleaned in seconds. RX 11’s Dialogue Isolation module uses deep learning models to identify and isolate the human voice from any background noise, regardless of recording conditions.

Adobe Podcast Enhanced Speech, available for free in the browser, transforms amateur-quality recordings (laptop microphone, earbuds, webcam) into studio-quality audio in one click. For production companies with tight budgets that need to interview remote subjects, this has removed one of the main technical quality barriers.

For dubbing and international localization, ElevenLabs v3 and Papercup have transformed the market. ElevenLabs can clone an actor's voice in any language while preserving their vocal identity (timbre, texture, register range), characteristic intonation, and emotional registers. The result is dubbing that sounds like the original actor speaking another language, not like a substitute. For documentary, corporate, educational, and low-budget productions, high-quality automatic dubbing is now a commercial reality.

For premium productions with top-tier actors, the legal and contractual issue (voice rights, explicit consent) remains the main barrier, not technical quality. The 2023 SAG-AFTRA contracts and the equivalent agreements negotiated by European unions require specific negotiation for any use of voice cloning.

Automatic audio mastering with tools such as LANDR, Auphonic, and loudness systems integrated into distribution platforms ensures correct loudness levels according to each platform’s standards (Netflix -27 LUFS, YouTube -14 LUFS, broadcast EBU R128) and automatically adjusts the mix for consumption on mobile, soundbar, headphones, or cinema.
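Those targets can be applied programmatically with an open-source ITU-R BS.1770 meter such as pyloudnorm (pip install pyloudnorm soundfile). A minimal sketch follows; note that Netflix's actual specification is dialogue-gated, so the -27 LUFS target here is a simplification, and the file name is a placeholder.

```python
import soundfile as sf
import pyloudnorm as pyln

TARGETS_LUFS = {"youtube": -14.0, "ebu_r128": -23.0, "netflix": -27.0}

data, rate = sf.read("final_mix.wav")
meter = pyln.Meter(rate)                      # BS.1770 loudness meter
current = meter.integrated_loudness(data)

for platform, target in TARGETS_LUFS.items():
    normalized = pyln.normalize.loudness(data, current, target)
    sf.write(f"final_mix_{platform}.wav", normalized, rate)
    print(f"{platform}: {current:.1f} LUFS -> {target} LUFS")
```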

4.4 Archive cataloging: the sleeping asset

The audiovisual archives of broadcasters, channels, distributors, and production companies contain decades of material that for the most part was never systematically cataloged. A typical mid-sized European broadcaster may have between 20,000 and 100,000 hours of archive material. Without accurate cataloging, that material is practically inaccessible for reuse, licensing, or exploitation on digital platforms.

AI is now the only technology capable of addressing this problem at reasonable scale and cost. Systems such as Etiqmedia, widely implemented by major broadcasting operators in Spain and Europe, automatically analyze the content of each file: they recognize faces and identify known personalities, detect locations, identify topics and genres, analyze the emotional tone of scenes, transcribe audio in multiple languages, and generate structured metadata that makes the archive searchable, filterable, and commercially exploitable.
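The output of such a system is, in essence, a structured record per asset. A sketch of what that record might look like, with illustrative field names rather than any vendor's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ArchiveAsset:
    asset_id: str
    duration_s: float
    transcript: str = ""
    languages: list[str] = field(default_factory=list)
    faces: list[str] = field(default_factory=list)      # identified personalities
    locations: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    emotional_tone: str = "neutral"

clip = ArchiveAsset(
    asset_id="TAPE-1987-0412",
    duration_s=312.0,
    languages=["es"],
    locations=["Bilbao"],
    topics=["port infrastructure"],
)
# Once records like this exist, the archive becomes queryable: e.g. every clip
# where "Bilbao" is in locations and any topic mentions "port".
```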

The economic impact is direct and quantifiable: properly cataloged archive material can be licensed to third parties, reused in new productions without hours of manual searching, or included in stock footage sales platforms. A 50,000-hour uncataloged archive is a sleeping asset with potential value but prohibitive access cost. One cataloged with AI is an active source of revenue.

Chapter 5

Distribution, platforms, and data

Distribution is where AI has the most invisible power: most audiovisual professionals are not aware of the sophistication of the systems that decide whether their content reaches an audience or not. Understanding how these systems work is not merely interesting; it is strategically essential for anyone producing content intended for digital platforms.

5.1 Recommendation algorithms: the uncomfortable truth

Netflix does not recommend what you think you want. It recommends what its models predict you will watch for the longest time without pausing or abandoning. The real metric is not satisfaction rating or the five-star rating (which Netflix removed in 2017): it is completion rate, viewing time without abandonment. This distinction matters deeply because it defines what content is commissioned, how it is produced, and which stories are told.
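The metric is simple to state precisely. A toy computation of completion rate from invented viewing logs:

```python
from collections import defaultdict

runtime_s = {"title_a": 3000, "title_b": 3000}
sessions = [
    ("title_a", 2900), ("title_a", 3000), ("title_a", 740),   # seconds watched
    ("title_b", 3000), ("title_b", 2950), ("title_b", 2800),
]

watched = defaultdict(list)
for title, secs in sessions:
    watched[title].append(secs / runtime_s[title])

for title, ratios in watched.items():
    print(f"{title}: mean completion {sum(ratios) / len(ratios):.0%}")
# title_a: mean completion 74%
# title_b: mean completion 97%
```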

Netflix’s recommendation system is an ensemble of dozens of models that consider hundreds of variables simultaneously: each user’s full viewing history, time of day and day of week, the device used to connect, the behavior of other users with similar profiles in their market, when they last paused in a series of the same genre, whether they finished the previous season and in how many days. And it generates not only a personalized list of titles, but also which cover image to show for each one (Netflix has up to 30 cover variants per title, each optimized for a different demographic profile).

For content distributors and creators, understanding this mechanism is fundamental. It is not enough to produce quality content; you must understand how the algorithm will read metadata signals, thumbnail, title, and description, and which engagement patterns in the first days after release are critical for the system to begin actively recommending the content.

5.2 Audience analytics: the new standard

Streaming platforms have access to audience data with a granularity unprecedented in media history: they know exactly at which minute a series is abandoned (with frame-level precision in internal analyses), which scenes are replayed, which moments generate peaks in social media activity, in which markets content performs better and why. This information feeds not only renewal and cancellation decisions, but the creative development processes of new projects themselves.

For traditional television networks and broadcasters that do not have access to such proprietary data, tools such as Parrot Analytics offer multiplatform demand analytics: they measure demand for content across streaming, SVOD, social media, and torrents to build a global demand index by title, market, and demographic. Canvs AI focuses its analysis on the emotional dimension: it classifies audience reactions on social media beyond positive/negative sentiment, identifying specific emotions such as surprise, nostalgia, frustration, or enthusiasm that make it possible to understand much better why content connects or fails to connect.

Predictive audience analytics, where models predict content performance before launch, is a capability that in 2026 the major players have (Netflix, HBO Max, Disney+, Amazon) but that is reaching more accessible tools. For independent producers who need to argue a project’s potential before investors or platforms, having predictive data can be a significant differentiator.

5.3 Intelligent MAM and digital asset management

A Media Asset Management (MAM) system integrated with AI can automatically process every file entering the system, running in parallel a set of analyses that would manually require days of work: audio transcription in all detected languages, face recognition and identification of known personalities, location identification, genre and emotional tone detection, technical quality analysis (effective resolution, audio levels, compression ratio), automatic summary generation, and preview clip creation for fast review.

Systems such as Etiqmedia (specialized in European broadcasting), Veritone aiWARE (a comprehensive AI platform for broadcasters with archive search and monetization capabilities), and Cognizant Video Intelligence (oriented toward large corporate and entertainment archives) represent the state of the art in 2026. For broadcasters, distributors, and archives managing tens of thousands of hours of content, the difference between having or not having a MAM with integrated AI is measured in archive team productivity, response time to licensing requests, and the ability to discover and reuse historical material.

5.4 Compression, delivery, and distribution efficiency

Streaming content delivery is an engineering problem with enormous cost implications. Large CDNs spend millions on bandwidth and energy to distribute video. AI is helping optimize this cost in ways that were previously impossible: per-title adaptive encoding that analyzes each specific video and generates compression parameters optimized for its content (a fast action scene needs more bits than a static scene, and the intelligent encoder knows it); encoding with scene prediction that anticipates shot changes and preemptively adjusts bitrate; and predictive distribution that preloads content on CDN nodes closest to the user based on consumption patterns.
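Per-title encoding can be illustrated in miniature: choose a bitrate per scene from a crude complexity proxy instead of one fixed bitrate for the whole file. The thresholds and bitrates below are invented, not any platform's real ladder.

```python
def bitrate_for_scene(mean_frame_diff: float) -> int:
    """Pick a bitrate (kbps) from a crude motion-complexity proxy."""
    if mean_frame_diff > 0.15:      # fast action, lots of motion
        return 16_000
    if mean_frame_diff > 0.05:      # moderate motion
        return 8_000
    return 4_000                    # near-static scene

scenes = [("dialogue", 0.02), ("car chase", 0.22), ("landscape", 0.07)]
for name, diff in scenes:
    print(f"{name}: {bitrate_for_scene(diff)} kbps")
```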

Netflix Open Connect, Netflix’s own CDN, is the state-of-the-art reference. Its AI-based per-title encoding system allows the company to deliver 4K HDR quality at bitrates significantly lower than industry standards, with the corresponding bandwidth and energy savings.

5.5 The future of distribution: total personalization

The horizon of AI-based audiovisual distribution points toward something that ten years ago seemed purely speculative: total personalization of content. Not only which content the algorithm recommends, but how that content is presented specifically to you.

Netflix already experiments with cover variants personalized by user. The next step is trailer variants: different versions of the same trailer optimized for different audience profiles, emphasizing action elements for some users and romantic elements for others, all generated or adapted automatically by AI. This is already happening in digital advertising; its application to entertainment is a matter of time and of resolving the creative and ethical debates it entails.

Beyond that, personalization of the viewing experience itself: different opening sequences depending on user history, dubbing variants that adjust tone according to detected preferences, or even AI-generated alternative endings for series activated according to the viewer profile. Black Mirror: Bandersnatch was an artistic experiment; AI personalization systems could turn it into a distribution standard.

For content creators, this raises fundamental questions about authorship and creative integrity that the sector will have to resolve: to what extent can a distributor algorithmically modify a work without compromising the author’s vision? Which version is the “real” one when hundreds of personalized variants exist? The debate over “final cut” takes on new dimensions in the age of personalized AI distribution.

Chapter 6

Generative AI: the new territory

Generative AI is what has captured the public imagination and generated the greatest volume of debate in the audiovisual sector. There are reasons for this: the ability to generate high-quality images, audio, and video raises fundamental questions about authorship, labor, value, and representation that the sector will have to answer in the coming years. This chapter separates hype from operational reality in 2026.

6.1 Video generation: the real state in 2026

In January 2024, OpenAI introduced Sora: a model capable of generating photorealistic videos of up to one minute from text instructions. The demo footage showed scenes with coherent physics, complex camera movements, and visual continuity. The sector reacted with simultaneous fascination and alarm.

In 2026, Sora is available to ChatGPT users and through API access for producers and studios. Runway Gen-4, Kling 3.0, Google Veo 3, and Seedance 2.0 (ByteDance, #1 on the Artificial Analysis leaderboard as of April 2026) compete in the same space with different strengths. AI video generation is a real preproduction tool with established productive uses: detailed animatics for project pitching, concept visualization of complex scenes before defining the VFX budget, animated storyboards that allow the director to explore virtual camera movements, and low-complexity filler shot generation in postproduction.

For high-budget productions, AI-generated video has not replaced real shooting or the traditional VFX pipeline. Inconsistencies in highly complex scenes, limits on maximum clip length, and lack of precise control over actors and dialogue are real barriers. But for scaled content generation, low-budget advertising, motion graphics, architectural visualizations, and independent productions, the impact is transformative and no longer theoretical.

6.2 Synthetic actors, de-aging, and digital resurrection

The costs of de-aging and digital rejuvenation have fallen dramatically since The Irishman (2019), which cost $60 million in facial VFX alone. In 2026, tools such as Metaphysic Pro, Reface Pro, and integrated postproduction solutions from major VFX studios can produce comparable-quality results at a fraction of that cost, placing the technology within reach of mid-budget productions.

Fully AI-generated synthetic actors, with photorealistic appearance, the ability to perform in any language, and availability without scheduling or fee restrictions, are already a commercial reality in corporate video, advertising, and educational content. Companies such as Synthesia, HeyGen, and D-ID generate synthetic presenters and avatars that many companies use for internal communications, training, and content marketing.

The digital resurrection of deceased actors and public figures remains the most ethically tense territory. The holograms of Tupac (Coachella 2012) and Whitney Houston (2020 tour) established artistic and commercial precedents. In 2026, the digital image rights of deceased figures are assets actively managed by their estates and heirs, with specific contracts for each use.

6.3 Generative music for audiovisual production

Generative music has moved from experimental curiosity to standard production tool for a specific type of need: incidental music, ambient beds, news beds, brand content music, and low-budget productions that cannot afford to license commercial music or hire composers.

Tools such as Suno V4, Udio, and AIVA generate complete musical tracks in minutes adapted to genre, tempo, exact duration, and specified instrumentation. For an editor who needs a 47-second piece with strings and a melancholic tone for a documentary transition, this solves a real problem instantly. For a professional composer who needs to explore orchestral variants of a theme, AI tools accelerate the creative process, rather than serving as a direct threat.

The threat is real for composers of library music, stock sound effects, and background beds: this specific market is being deeply disrupted by automatic generation. Platforms such as Epidemic Sound and Artlist, built on licensed libraries of music by human composers, are under significant competitive pressure from generative systems.

6.4 Deepfakes: the operational threat and the sector’s response

Deepfakes are not a future threat. They are a present operational reality. Tools for generating convincing face-swap videos are accessible, free (DeepFaceLab), easy to use, and produce results that deceive the human eye under normal viewing conditions. The audiovisual sector has a dual role here: as a potential victim of disinformation and manipulation, and as part of the technological ecosystem that has normalized these capabilities.

The most promising technical response is content provenance standards. C2PA (Coalition for Content Provenance and Authenticity), promoted by Adobe, Microsoft, BBC, Reuters, AP, Sony, and Canon, among others, defines a standard for digitally signing content at the moment of creation, associating cryptographically verifiable metadata about who created the content, with which tool, and when. In 2026, Adobe, Sony, Nikon, and Canon implement C2PA in their cameras and software. Adobe Content Credentials is integrated into Photoshop, Illustrator, Premiere, and Firefly.

For audiovisual professionals, understanding and adopting C2PA is not merely a technical matter: it is a matter of institutional credibility. In an environment where any video can be questioned as a deepfake, the ability to cryptographically demonstrate content authenticity is an increasingly valuable asset, especially for journalism, documentaries, and news content.

Chapter 7

AI agents — the next leap

This chapter deserves special treatment because it represents the most significant change in how AI interacts with production workflows. The previous chapters describe tools: specialized systems that do one specific task very well when activated by a human. AI agents are qualitatively different.

7.1 What an AI agent is and why it changes everything

An AI agent is a system that combines a language or multimodal model with the ability to make decisions, execute actions in the world (call APIs, search databases, run code, send messages, modify files), and chain multiple steps of reasoning and action to complete complex tasks, with minimal or no human intervention.

The difference from a conventional AI tool is fundamental. An AI tool responds: you give it an input, it returns an output. An agent plans: you give it a goal, it determines which steps it needs to take to reach it, executes them sequentially or in parallel, detects errors or unexpected results and adapts, and delivers the final result.

Analogy for the audiovisual sector

An AI tool is like a technical specialist who performs a task perfectly when you give them exactly what they need. An AI agent is like a production coordinator: it receives the objective, organizes itself, delegates to the appropriate specialists, supervises progress, resolves incidents, and delivers the result. The difference is not technical capability: it is operational autonomy.

The most advanced AI agents available in 2026 are based on models such as Claude Opus 4.7 (Anthropic), GPT-5.4 with function calling (OpenAI), and Gemini 2.0/3 (Google DeepMind). Orchestration platforms such as Claude Managed Agents (Anthropic), AutoGPT, LangGraph, and CrewAI make it possible to build complex agents without having to program from scratch.

7.2 Agent architecture: how they work

A modern AI agent has four fundamental components. The brain is the language or multimodal model that reasons about the current state of the task, decides which action to take next, and evaluates the results of previous actions. The tools are the capabilities the agent can invoke: database searches, external API calls, code execution, file reading and writing, message sending, interaction with web browsers or software interfaces.

Memory is the agent’s ability to access relevant context: the history of previous actions in the current task, external knowledge bases (documents, manuals, databases), and, in more advanced systems, persistent memory between sessions. Planning is the ability to decompose a complex objective into subtasks, determine their optimal sequence, parallelize the ones that are independent of one another, and replan when something does not work as expected.
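Stripped of any specific platform, the four components reduce to a plan-act-observe loop. A minimal sketch, in which llm_decide stands in for a call to a reasoning model and returns a scripted two-step plan so the example runs end to end:

```python
# Hypothetical stand-in for a model call; scripted so the sketch runs.
def llm_decide(goal: str, history: list) -> dict:
    if not history:
        return {"tool": "search_mam", "args": "volcano footage Iceland 2010-2020"}
    return {"tool": "finish", "args": "selection reel drafted"}

# The agent's tools: capabilities it can invoke.
TOOLS = {
    "search_mam": lambda query: f"42 clips matching {query!r}",
    "write_report": lambda text: "report saved",
}

def run_agent(goal: str, max_steps: int = 10):
    history = []                                        # working memory
    for _ in range(max_steps):
        action = llm_decide(goal, history)              # the "brain" plans the next step
        if action["tool"] == "finish":
            return action["args"]                       # deliver the final result
        result = TOOLS[action["tool"]](action["args"])  # execute a tool
        history.append((action, result))                # observe, then replan
    raise RuntimeError("step budget exhausted without finishing")

print(run_agent("prepare a volcano selection reel"))
```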

Multi-agent systems add another layer: multiple specialized agents collaborating to complete tasks beyond the capabilities of a single agent. A coordinator agent receives the global objective and delegates subtasks to specialized agents (one for research, one for content generation, one for technical QC, one for distribution), supervises their work, and synthesizes the results.

7.3 Real applications of agents in the audiovisual sector

AI agent applications in the audiovisual sector are moving from pilot projects to operational implementations in 2026. These are the most significant ones.

Automatic localization pipelines

A multilingual localization agent receives a master video file, transcribes the audio using Whisper, translates the transcript into target languages with specialized translation models, synchronizes subtitles with the video timecode, generates dubbing using ElevenLabs or Papercup in the configured voices, performs automatic QC (lip-sync, subtitle timing, audio levels), and delivers the final files formatted for each destination platform. All of this happens without human intervention between the start and final review. What previously required a week of coordinated work from a localization team can be completed in hours.
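A sequential sketch of that pipeline. Whisper is the real open-source transcription model; translate_text, synthesize_dub, and qc_check are hypothetical stubs standing in for the translation, dubbing (ElevenLabs/Papercup), and QC stages, whose actual APIs vary by vendor and contract:

```python
import whisper

# Hypothetical stubs so the sketch is self-contained:
def translate_text(segments, lang): return [{**s, "lang": lang} for s in segments]
def synthesize_dub(subs, voice): return f"<audio track dubbed with {voice}>"
def qc_check(master, subs, dub): return {"lip_sync": "ok", "subtitle_timing": "ok"}

def localize(master_path: str, target_langs: list[str]) -> dict:
    model = whisper.load_model("medium")
    transcript = model.transcribe(master_path)               # 1. transcription
    deliverables = {}
    for lang in target_langs:
        subs = translate_text(transcript["segments"], lang)  # 2. translation
        dub = synthesize_dub(subs, voice=f"approved-{lang}") # 3. dubbing
        report = qc_check(master_path, subs, dub)            # 4. automatic QC
        deliverables[lang] = {"subs": subs, "dub": dub, "qc": report}
    return deliverables                                      # 5. ready for human review
```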

Archive management and proactive cataloging

An archive agent does not merely catalog incoming material: it continuously monitors the MAM, identifies material relevant to incoming licensing requests, alerts teams about content with reuse potential in projects under development, and automatically generates preview clips and presentation packages for licensing requests. When a request comes in for “footage of active volcanoes in Iceland filmed between 2010 and 2020,” the agent does not just run a query: it executes a multicriteria search, evaluates the results, checks the rights status of each clip, generates a selection reel, and prepares the licensing estimate.

Content monitoring and analysis in distribution

A distribution agent continuously monitors content performance across multiple platforms (views, completion rate, social media engagement, press mentions), compares actual performance with the launch model’s predictions, identifies anomalies (an episode with unusually high abandonment at a specific minute, a market where the content unexpectedly overperforms), and generates alerts and periodic reports with action recommendations: adjust title SEO strategy, release additional content in the best-performing market, or analyze the specific abandonment minute to understand the problem.
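The anomaly check at the heart of such an agent can be as simple as flagging minutes whose drop-off deviates sharply from the episode's baseline. A sketch with invented numbers; a real agent would pull them from platform analytics:

```python
from statistics import mean, stdev

dropoffs_per_minute = [1.1, 0.9, 1.0, 1.2, 0.8, 6.4, 1.0, 1.1]  # % of viewers leaving

mu, sigma = mean(dropoffs_per_minute), stdev(dropoffs_per_minute)
for minute, drop in enumerate(dropoffs_per_minute, start=1):
    if drop > mu + 2 * sigma:
        print(f"alert: minute {minute} loses {drop}% of viewers "
              f"(baseline {mu:.1f}% ± {sigma:.1f})")
```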

End-to-end assisted preproduction

The most advanced studios and platforms are exploring agents that accompany the entire project development process: they receive the initial concept, research the market for similar content, analyze the historical performance of comparable genres and formats, generate a commercial viability analysis, propose concept variants optimized for different markets or platforms, and prepare the full presentation deck. Work that previously required weeks from a development team can have a first draft in hours, leaving the human team in the role of curator and final decision-maker.

Automatic quality control in postproduction

AI-based automated QC is not new, but next-generation QC agents go far beyond detecting standard technical errors (black frames, audio dropouts, out-of-sync subtitles). An advanced QC agent analyzes the full video content, verifies narrative coherence between episodes (references to past events, character continuity), checks compliance with the editorial requirements of the destination platform (age rating, content restrictions by market, complete language versions), and generates a detailed report with detected problems, their severity, and required corrections.

7.4 Agent platforms: the ecosystem in 2026

The ecosystem of platforms for building and deploying AI agents consolidated significantly in 2025-2026. Anthropic’s Claude Managed Agents is the most advanced cloud agent platform for enterprise environments, with management, monitoring, and control capabilities for production agents. Its emphasis on safety and predictable behavior makes it especially suitable for audiovisual production environments where errors have real cost.

OpenAI Assistants API with GPT-5.4 offers a mature platform with a broad integration base. LangGraph and LangChain provide open-source frameworks for building agents and complex workflows with granular control over execution flow. CrewAI specializes in multi-agent systems where different agents with defined roles collaborate on complex tasks.

For audiovisual companies that want to adopt agents without building them from scratch, platforms such as Zapier with AI, Make.com with AI modules, and n8n offer no-code or low-code environments for building agentic workflows that connect existing tools in the production workflow.


7.5 Agents and safety: the fundamental principle

AI agents are not perfect. They make mistakes in ways that are sometimes unexpected and difficult to predict. An agent may misinterpret an ambiguous instruction, use a tool incorrectly, or make a reasonable decision based on incomplete information that produces an unwanted result. In production environments with real consequences, this requires careful design of supervision mechanisms.

The “human in the loop” principle at critical points is not a technological limitation to overcome: it is responsible professional practice. Well-designed agent systems for the audiovisual sector explicitly distinguish between low-risk actions (searching the MAM, generating a draft, analyzing a file) that the agent can execute without confirmation, and high-risk or irreversible actions (sending communication to talent, delivering a master file to the client, modifying archive metadata) that always require explicit human confirmation.

Anthropic, in its Claude Managed Agents platform, formalizes this principle with the concept of “minimal footprint”: agents should request only the permissions strictly necessary for their task, prefer reversible actions over irreversible ones, and escalate to the human supervisor whenever there is ambiguity about the scope of their authorization. This design principle applies to any agent implementation, regardless of the platform used.
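In code, the principle reduces to a gate between action tiers. A minimal sketch, with illustrative tier assignments and with run_tool and escalate as hypothetical stand-ins for the agent's tool layer and its notification channel:

```python
# Hypothetical stand-ins for the tool layer and the supervision channel.
def run_tool(action, payload): return f"executed {action}"
def escalate(action, payload): return f"awaiting human confirmation for {action}"

LOW_RISK = {"search_mam", "generate_draft", "analyze_file"}              # reversible
HIGH_RISK = {"email_talent", "deliver_master", "edit_archive_metadata"}  # irreversible

def execute(action: str, payload, confirmed_by_human: bool = False):
    if action in LOW_RISK:
        return run_tool(action, payload)    # autonomous: low risk, reversible
    if action in HIGH_RISK and confirmed_by_human:
        return run_tool(action, payload)    # only after explicit sign-off
    return escalate(action, payload)        # high-risk or unknown scope: stop and ask

print(execute("search_mam", "volcano footage"))      # runs autonomously
print(execute("deliver_master", "EP101_final.mxf"))  # escalates to a human
```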

7.6 The immediate future: multimodal agents in production

The frontier of AI agent capability in 2026 lies in multimodal agents: systems that can simultaneously process and generate text, image, audio, and video, and execute complex workflows that integrate all these media. This opens possibilities that are currently in prototype phase but will become operational in the next 12-24 months for the audiovisual sector.

A multimodal content supervision agent will be able to analyze dialogue, image, and audio simultaneously in a production, identify potentially problematic content according to the editorial criteria of the destination platform, and generate a detailed report with specific timestamps and the type of issue detected, all without human intervention until final review.

A multimodal content adaptation agent will be able to take an original production, analyze its narrative structure, and automatically generate versions adapted for different platforms: a three-minute TikTok summary with subtitles and vertical framing, a 90-second Instagram clip with a generated caption, and a 30-second trailer for YouTube Shorts, all while maintaining the narrative coherence and visual identity of the original.

These capabilities are not speculation: they are direct extrapolations of what current multimodal systems can already do separately. Integration into operational production pipelines is the next step, and the time horizon is shorter than most of the sector anticipates.

7.7 How to begin: agent adoption strategy

For an audiovisual-sector company that wants to begin implementing agents, the recommendation is to move from simple to complex and from reversible to irreversible. First step: identify a high-frequency, well-defined process with repeatable steps where errors are easy to detect and correct. Good initial candidates are cataloging incoming archive files, generating periodic distribution reports, or managing standard archive licensing requests.

Second step: build the agent with a high-capability language model (Claude Opus 4.7, GPT-5.4), connect it to the necessary tools (MAM, rights database, email system), and explicitly define which actions it can take autonomously and which require human confirmation. Third step: run it in parallel with the human process during a validation period to compare results and detect errors. Only after that validation should operational responsibility be transferred to the agent.

Fourth step, and the most important: measure. Define success metrics before implementing the agent (process time, error rate, cost per transaction) and periodically evaluate the agent’s performance against those metrics. Agents require maintenance: the underlying AI models are updated, tool APIs change, and production processes evolve. An agent that is not actively maintained degrades.

The audiovisual companies obtaining the most value from AI agents in 2026 are those that have invested both in technical implementation and in training their teams to work with supervised agents. Technology is the easy part. The difficult part is redesigning processes and roles to integrate human supervision in a way that is effective without becoming a bottleneck.

Chapter 8

The 2026 tool arsenal

The following inventory reflects the state of the market in 2026. Tools change quickly; the criteria for evaluating them (output quality, workflow integration, cost, support, regulation) do not.

8.1 Preproduction

8.2 Production and set

8.3 Postproduction — Editing and VFX

8.4 Postproduction — Audio

8.5 Agents, MAM, and distribution

Chapter 9

How to use AI legally in audiovisual production — A practical compliance guide

This is the question most industry professionals are asking in 2026 and the one receiving the fewest clear answers: what exactly do I have to do to use AI legally in my production? When must I disclose that I used AI? How do I disclose it? What happens if I do not? This chapter answers those questions concretely, directly, and organized by production phase.

European AI regulation is real, it is in force, and it has concrete economic consequences. But it is not as complicated to apply as it may seem if its basic logic is understood: the main obligation is not to prohibit the use of AI, but to ensure that people who interact with AI-generated or AI-manipulated content know they are doing so when that is relevant to their rights or decisions.

9.1 The complete legal framework: which rules apply to the audiovisual sector

The EU AI Act

The AI Act entered into force on August 1, 2024 and applies in phases until August 2, 2027. It is the world’s first comprehensive AI legislation. It applies to any company that places AI systems on the EU market, uses AI systems in its processes, or distributes audiovisual content in the EU, regardless of where the company is domiciled.

Its logic is that of a risk pyramid. At the top are prohibited practices that no AI system may perform. Below them are high-risk applications that require a formal conformity assessment process before deployment. Then come limited-risk applications that require only transparency. And at the base are most everyday applications, which are minimal-risk and require no specific obligations beyond general best practices.

The Digital Services Act (DSA)

The Digital Services Act complements the AI Act for digital platforms with more than 45 million monthly active users in the EU (Netflix, YouTube, TikTok, Instagram, etc.). These platforms have additional obligations around algorithmic transparency: they must explain how their recommendation systems work, offer users the possibility of receiving recommendations not based on profiling, and publish periodic transparency reports. For producers and distributors working with these platforms, understanding DSA obligations is relevant because it affects how content is presented and distributed.

The General Data Protection Regulation (GDPR)

The GDPR predates the AI Act but remains applicable and complementary. It is especially relevant when AI processes biometric data: facial images of actors, identifiable voices of individuals, audience behavior patterns. Biometric data is special category data under the GDPR and requires explicit consent for processing. This applies directly to the use of facial recognition in casting, the training of models with actors’ voices without their consent, and biometric audience analysis.

9.2 The four concrete obligations for the audiovisual sector

Obligation 1: Label synthetic content that imitates real people

This is the most relevant obligation for day-to-day audiovisual production. The AI Act establishes that operators of AI systems that generate or manipulate audiovisual content in a way that may mislead people about its authenticity must ensure that the content is marked in a way that indicates it has been generated or manipulated by AI. The obligation specifically affects: deepfakes of real people (video in which a real person appears to say or do something that did not happen), synthetic voices of identifiable people (a narrator whose voice has been cloned by AI), photorealistic images of real people generated by AI, and video or audio that imitates the style of a real artist in a way that could be confused with authentic content.

What content is exempt from this obligation? Content that clearly does not intend to be real: animations, caricatures, obvious visual effects, and satirical or artistic content where the use of AI is contextually obvious. Also AI-generated content that does not involve representations of real people and cannot be confused with authentic content.

Obligation 2: Transparency in recommendation systems

Platforms with more than 45 million users in the EU (under the DSA) must inform users that they are using algorithmic recommendation systems and offer at least one alternative not based on profiling. For producers and distributors, this does not create direct obligations, but it does have indirect implications: platforms must be able to explain why they recommend content, which means that the metadata and signals surrounding a piece of content are more important than ever.

Obligation 3: Absolute prohibitions

There is a set of practices that the AI Act completely prohibits, regardless of sector or use. The ones an audiovisual professional should know about are: social scoring of citizens by public authorities (which does not directly affect the audiovisual sector), subliminal manipulation of human behavior through AI without user awareness (which affects very aggressive programmatic advertising techniques), and real-time biometric recognition in public spaces for surveillance purposes (with exceptions for very specific entertainment applications).

Obligation 4: Conformity assessment for high-risk systems

AI systems classified as “high-risk” require a formal conformity assessment process before being placed on the market. For the audiovisual sector, relevant high-risk systems are those used to make decisions affecting employment or hiring (including casting systems that automatically filter candidates), AI systems that evaluate access to financial services (such as those assessing a project’s risk for investors), and AI systems in critical infrastructure. Conformity assessment requires technical documentation, registration in the EU database, and, in some cases, third-party auditing.

9.3 How to disclose AI use: practical guide by content type

The most practical question: how do I label AI-generated or AI-assisted content? The answer depends on content type and distribution context. There is no single mandatory format for all cases, but there are clear principles and some technical implementations that are becoming standard.

For news and documentary content

When AI is used to generate images, video, or audio presented in an informational context (news, documentaries, reports), the label must be visible, legible, and present from the first moment of viewing. The most commonly used formats are: text overlay in the video itself (“AI-generated image,” “Synthetic voice,” “Recreation generated by artificial intelligence”), explicit mention in the report or documentary introduction, and a note in the program or film credits specifying which elements were generated or modified by AI.

The position recommended by the EBU (European Broadcasting Union) in its 2025 guide: labeling should appear at the moment when AI-generated content is on screen, not only in the end credits. For a 10-second insert with AI-generated images in a news broadcast, the overlay should be present for those 10 seconds.
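
When the overlay is burned in during finishing rather than added by the playout system, this timing rule can be enforced at encode time. A minimal sketch using ffmpeg's drawtext filter, assuming the synthetic insert runs from second 42 to second 52; the filenames, wording, position, and timings are all illustrative, and some ffmpeg builds require an explicit fontfile:

```python
# Sketch: burn an "AI-generated image" overlay onto only the seconds where
# the synthetic insert is on screen (here 42s-52s), following the EBU
# guidance above. All values are illustrative.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "news_item.mp4",
    "-vf", ("drawtext=text='AI-generated image':fontcolor=white:fontsize=36:"
            "box=1:boxcolor=black@0.5:x=40:y=h-80:enable='between(t,42,52)'"),
    "-c:a", "copy",           # audio is untouched
    "news_item_labeled.mp4",
], check=True)
```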

For entertainment and fiction content

In fiction, the labeling obligation applies mainly when the content involves representations of real people that could be confused with real statements or actions by those people. A film with entirely fictional synthetic actors does not require specific AI labeling. A film that uses the digital image of a deceased actor or a deepfake of a public figure does require clear disclosure.

The emerging standard in the entertainment sector is to include an explicit mention in the opening or end credits: “This production uses artificial intelligence-generated image synthesis technology” or “Certain sequences in this production have been generated or modified using artificial intelligence.” Several streaming platforms, including Netflix and Amazon, are developing their own labeling standards that producers will have to meet in order to distribute on their platforms.

For advertising and branded content

Advertising content that uses AI to generate representations of real people or to create product testimonials or recommendations using synthetic voices or images of real people requires labeling. The EASA (European Advertising Standards Alliance) is developing specific guidelines for AI advertising that complement the AI Act. In Spain, Autocontrol has yet to publish its own guidelines on the matter.

For advertising with synthetic people who do not represent real individuals (avatars, AI-created characters), the best-practice standard is to indicate that the character is AI-generated, although the strict legal obligation is less clear in this case.

For content on social media and digital platforms

TikTok, Instagram, and YouTube already have their own labeling requirements for AI-generated content, independent of the AI Act but consistent with it. In 2026, these platforms require creators to disclose when their content includes images, video, or audio generated in a “realistic” way by AI, especially when it involves real people or real events. Noncompliance may result in content removal or account suspension.

9.4 The technical standard: C2PA and Content Credentials

Beyond visual labels, there is a technical standard becoming the infrastructure of AI transparency in audiovisual content: C2PA (Coalition for Content Provenance and Authenticity). Understanding how it works is relevant to any professional in the sector.

What C2PA is

C2PA is an open standard that allows any digital file (image, video, audio) to carry a cryptographically signed manifest recording: who created the content and with which tool, what transformations or edits have been made since original creation, whether AI was used at any stage of creation or editing, and when each action occurred. This manifest is embedded in the file’s metadata and can be verified with publicly available tools.

The key is that the manifest is cryptographically signed: it cannot be modified without invalidating the signature, making it impossible to retroactively alter a file’s history. If the manifest says that the file was created with a Sony A7C II camera at a specific place and time and was not edited with AI, that record is verifiable and trustworthy.
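
To make the mechanism tangible, the following deliberately simplified Python sketch reproduces the tamper-evidence property. It is not the C2PA format: real manifests are standardized structures signed with X.509 certificates rather than a shared-key HMAC, but the principle it demonstrates is the same: any change to the content or its recorded history invalidates the signature.

```python
# Deliberately simplified illustration of a tamper-evident manifest. A
# shared-key HMAC stands in for the real certificate-based signature.
import hashlib, hmac, json

SIGNING_KEY = b"studio-secret"  # stand-in for a real signing credential

def build_manifest(content: bytes, actions: list[dict]) -> dict:
    manifest = {
        "content_hash": hashlib.sha256(content).hexdigest(),
        "actions": actions,  # e.g. [{"tool": "Premiere Pro", "ai_used": False}]
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify(content: bytes, manifest: dict) -> bool:
    claimed = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and hashlib.sha256(content).hexdigest() == claimed["content_hash"])

footage = b"...raw video bytes..."
m = build_manifest(footage, [{"tool": "camera", "ai_used": False}])
print(verify(footage, m))         # True: content and history intact
print(verify(footage + b"x", m))  # False: content altered after signing
```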

Who implements C2PA in 2026

C2PA adoption is advancing rapidly. Adobe implements Content Credentials (its C2PA implementation) in Photoshop, Illustrator, Premiere Pro, After Effects, and Firefly: any content edited or generated with these tools can carry Content Credentials. Sony implements C2PA in its Alpha camera line. Nikon and Canon are in the process of implementation. Leica has had full support since 2024. Microsoft implements C2PA in Designer and Bing Image Creator. Getty Images and Shutterstock add C2PA to their stock content.

On the verification side, major distribution platforms (YouTube, LinkedIn, Facebook, X) are developing support to display Content Credentials when available. The goal is for viewers to be able to click an icon next to the content and see its full creation history, including whether AI was used.

How to implement C2PA in your production workflow

For a production company that wants to adopt C2PA in its workflow, the practical steps are as follows. First, use tools that already implement C2PA (Adobe tools, Sony or Leica cameras). Second, activate the Content Credentials function in Adobe Creative Cloud (it is in each application's preferences). Third, when exporting the final file, verify that the C2PA manifest is included using the free tool at verify.contentauthenticity.org. Fourth, keep a record in the production archive of which AI tools were used in which parts, so the legal team can access this information if necessary.
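
The verification step can also be scripted for batch deliveries. A sketch assuming the open-source c2patool CLI from the Content Authenticity Initiative is installed and on the PATH; its exact invocation and output format vary between versions, so treat this as a starting point:

```python
# Sketch: batch-check exported masters for an embedded C2PA manifest using
# the c2patool CLI (an assumption: installed, on PATH, version-dependent).
import pathlib, subprocess

def has_c2pa_manifest(path: pathlib.Path) -> bool:
    result = subprocess.run(["c2patool", str(path)],
                            capture_output=True, text=True)
    return result.returncode == 0 and "manifest" in result.stdout.lower()

for master in pathlib.Path("deliverables").glob("*.mp4"):
    status = "OK" if has_c2pa_manifest(master) else "MISSING manifest"
    print(f"{master.name}: {status}")
```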

9.5 Obligations by production phase — Reference table

The following table summarizes which transparency obligations apply in each phase of an audiovisual production. It is a practical guide, not a legal opinion: specific application depends on the type of production, distribution market, and tools used.

PREPRODUCTION

Use of AI to generate visual references or storyboards: no labeling obligation if not distributed publicly. Use of AI for script analysis or audience prediction: no obligation. Use of AI for casting preselection with facial image analysis: classified as a high-risk system; requires conformity assessment and cannot be used as the sole basis for decision-making.

PRODUCTION

Virtual production with AI-generated environments: no labeling obligation in fiction if there is no representation of real people that can be confused with reality. Automated AI cameras: no specific obligation. Recording actors for training AI models: requires explicit actor consent under the GDPR and specific contracts.

POSTPRODUCTION

AI-assisted editing (rushes selection, color correction): no labeling obligation. AI-generated VFX that do not involve real people: no labeling obligation. Digital de-aging or rejuvenation of actors: no labeling obligation if the actor has given consent; mention in credits is recommended. Deepfake of a real person: visible labeling obligation at all times the person appears on screen. Voice cloning of a real person: audible or visual labeling obligation. AI archive restoration: no specific obligation; best practices recommend a note in the credits.

DISTRIBUTION

Distribution on platforms with more than 45M EU users: compliance with the platform’s own AI labeling requirements. Broadcast distribution: compliance with national audiovisual communication regulation, which in Spain already includes transparency requirements on AI use. Advertising distribution: compliance with digital advertising regulations and EASA guidelines.

9.6 Fines and compliance timeline

The AI Act establishes fines depending on the severity of the infringement and the type of obligation breached. Fines for prohibited practices (the highest-risk ones) may reach €35 million or 7% of annual global turnover, whichever is higher. Fines for noncompliance with high-risk system obligations may reach €15 million or 3% of global turnover. Fines for providing incorrect information to authorities may reach €7.5 million or 1.5% of turnover. In all cases, the higher amount between the absolute figure and the percentage of turnover applies.
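
A worked example of how these ceilings apply; the tier names are shorthand for the three categories above:

```python
# The applicable maximum is the higher of the absolute amount and the
# share of global annual turnover.
FINE_TIERS = {
    "prohibited_practice": (35_000_000, 0.07),
    "high_risk_noncompliance": (15_000_000, 0.03),
    "incorrect_information": (7_500_000, 0.015),
}

def max_fine(tier: str, global_turnover_eur: float) -> float:
    absolute, pct = FINE_TIERS[tier]
    return max(absolute, pct * global_turnover_eur)

# A group with €2bn global turnover: 7% (€140m) exceeds the €35m floor.
print(f"{max_fine('prohibited_practice', 2_000_000_000):,.0f}")  # 140,000,000
```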

The phased implementation calendar is as follows. Since February 2, 2025, unacceptable-risk AI practices have been prohibited (subliminal manipulation, social scoring, unauthorized real-time biometric recognition). Since August 2, 2025, obligations for general-purpose AI models have applied (large language and image models have their own transparency obligations). From August 2, 2026, obligations for high-risk systems, including casting and hiring systems, become fully applicable. And from August 2, 2027, all remaining obligations of the Regulation apply.

For audiovisual-sector companies, the practical priority in 2026 is: ensure that content containing deepfakes of real people is properly labeled (already applicable), verify that AI systems used in casting are not used as the sole basis for decision-making without human supervision (high-risk system, fully applicable from August 2026), and prepare conformity documentation if they use or develop high-risk AI systems.

9.7 Frequently asked questions from the sector

Do I have to disclose that I used AI to color grade my film?

No. The AI Act does not require labeling AI use in technical postproduction processes that do not affect the representation of real people or create content that could be confused with authentic reality. Color correction with DaVinci Resolve AI, upscaling with Topaz, audio cleanup with iZotope RX, or assisted editing with Adobe Sensei do not generate specific labeling obligations under the AI Act.

Should I disclose that the background music was generated by AI?

If the music is purely instrumental and does not imitate the style of any identifiable artist in a way that could be confused with their work, there is no labeling obligation under the AI Act. Distribution platforms may have their own policies on this: Spotify, for example, is developing policies on AI-generated music that distributors should consult.

I used ElevenLabs to dub my documentary into English. Do I have to disclose it?

It depends on how the dubbing was done. If you cloned the voices of the interviewees themselves for the dubbing (that is, the English voice sounds like each person’s original voice), there is a synthetic representation of identifiable people that requires labeling and, beforehand, explicit consent from the interviewees under the GDPR. If you used generic ElevenLabs voices that do not correspond to any identifiable person, there is no labeling obligation under the AI Act (although mentioning it in the credits is good practice).

I generate images with Midjourney for transitions in my documentary. Should I label them?

If the images are clearly abstract, illustrative, or stylized, and do not represent real people or real events in a realistic way, there is no labeling obligation. If you use AI-generated images that look like documentary photographs of real places or situations in the context of a documentary, then it is mandatory to disclose this so as not to mislead the viewer about the nature of the material.

Does the AI Act apply to my company if I am outside the EU?

Yes, if you distribute content in the EU or your content is accessible to EU users. The AI Act follows the GDPR model in its extraterritorial application: what matters is the market where the effect occurs, not where the company is domiciled. A Latin American production company distributing on Netflix Europe is subject to the AI Act obligations for that content.

What happens if I do not label a deepfake?

National supervisory authorities (in Spain, the Spanish Artificial Intelligence Supervisory Agency, AESIA, created in 2024) may impose the fines provided for in the AI Act. In addition, the person represented in the deepfake may bring civil claims for damage to their image and reputation under the right to honor and personal image. And distribution platforms may remove the content and suspend the distributor’s account.

Chapter 10

The audiovisual professional in the AI era

Conversations about the impact of AI on work tend to swing between two equally useless narratives: the apocalyptic one (AI will destroy all creative jobs) and the uncritically optimistic one (AI will simply free creatives from boring tasks so they can focus on interesting ones). Reality is more nuanced, more specific to each role and profile, and more urgent than either narrative acknowledges.

10.1 What changes and what does not

What certainly changes: the time required for repetitive technical tasks. Transcription, subtitling, preliminary rushes selection, basic audio correction, generation of low-complexity creative variants, archive search, generation of standard production documents, metadata classification. In these areas, productivity multiplies by factors of 5 to 20. Anyone who does not adopt these tools will compete at a structural cost and time disadvantage against those who do.

What does not change: narrative judgment about which story deserves to be told and how. Aesthetic judgment about whether a frame, a color palette, or a soundtrack serves the story or betrays it. The ability to connect emotionally with an audience through content. Understanding of the cultural, political, and social context in which content operates and to which it responds. Editorial responsibility for what is published and how it is represented. Managing human relationships in a production process involving dozens or hundreds of people. Decision-making under creative uncertainty, with insufficient data and time pressure.

What evolves (and this is the most important point): the professional’s role shifts from executor to director. An editor who used to spend 70% of their time on technical tasks and 30% on creative decisions can now reverse that proportion. A composer who previously built every musical element from scratch can generate 30 variants in minutes and spend their time on selection, refinement, and artistic direction. A localization coordinator who previously managed each delivery manually can supervise automatic pipelines that process ten languages in parallel. AI does not eliminate work; it radically redistributes where value resides.

10.2 Who is at greater risk and who is at lower risk

An honest analysis of employment impact requires distinguishing by type of role, level of specialization, and type of task. The roles most vulnerable to automation in the medium term are those combining high technical specialization with well-defined and repeatable tasks: manual subtitling and transcription operators, basic QC technicians (standard technical error analysis), entry-level color correction operators, standard mastering technicians, and producers of very low-complexity social media content.

The most resilient roles are those combining creative judgment with human context management: directors and cinematographers, screenwriters and showrunners, composers for high-value productions, casting directors, senior editors with narrative judgment, executive producers, and programming directors. These roles will benefit from AI as a capacity multiplier, not as a substitute.

Intermediate roles, which today combine technical tasks with creative tasks, are in a position of transformation: mid-level editors, sound technicians with creative responsibilities, and intermediate-level VFX artists. For these profiles, the adaptation strategy is clear: actively shift the proportion of work toward the capabilities AI cannot replicate.

10.3 Skills to develop now

Prompt engineering for creatives: knowing how to formulate precise, layered, and contextualized instructions for language and image models is a transversal skill that improves with deliberate practice. It is not programming and does not require technical AI knowledge: it is communication with a new kind of interlocutor. The difference between a mediocre prompt and an excellent one can mean the difference between an output that must be redone and one that is immediately useful.
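
As a concrete illustration of that difference, compare a vague instruction with a layered one that stacks role, context, task, constraints, and output format. The structure is tool-agnostic, and the content is invented for the example:

```python
# Illustration of a layered prompt versus a vague one. The scaffolding,
# not the tool, is the point: it works in any chat model.
vague = "Write a synopsis for my documentary."

layered = """You are a development executive at a European streaming platform.

CONTEXT: 52-minute documentary, five first-person migration stories,
observational style, aimed at a festival-first release.

TASK: write a 120-word synopsis for the platform's content catalog.

CONSTRAINTS:
- Present tense; no spoilers beyond the first act.
- Avoid the words "journey" and "powerful".

OUTPUT FORMAT: one paragraph, then three alternative loglines."""

print(layered)
```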

Critical evaluation of AI outputs: professional judgment is required to identify when an AI output is useful, when it needs substantial refinement, when it is fundamentally wrong in non-obvious ways (hallucinated facts, metadata errors, narrative inconsistencies), and when using the tool creates more work than it saves. This capacity for critical evaluation is more valuable than the ability to operate the tool, and it depends on domain knowledge, not AI knowledge.

Hybrid workflow design: designing production processes that integrate AI at the points where it provides the most value, with human supervision mechanisms at the points where errors have real cost, and without creating fragile dependencies on tools that may change in price, functionality, or availability. AI as production infrastructure requires risk management similar to any other critical service provider.
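
A minimal sketch of the supervision mechanism this implies: automated steps execute freely, while any step with high error cost or low confidence pauses for human sign-off. Every name, field, and threshold here is illustrative:

```python
# Sketch of a human-in-the-loop checkpoint in a hybrid workflow.
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    confidence: float   # 0-1, from the tool's own scoring or spot checks
    error_cost: str     # "low" or "high": what a mistake costs downstream

def run_pipeline(steps: list[Step], min_confidence: float = 0.85) -> None:
    for step in steps:
        if step.error_cost == "high" or step.confidence < min_confidence:
            answer = input(f"Review '{step.name}' "
                           f"(confidence {step.confidence:.2f}) - approve? [y/n] ")
            if answer.strip().lower() != "y":
                print(f"Stopped at '{step.name}' for human rework.")
                return
        print(f"Executed: {step.name}")

run_pipeline([
    Step("auto-transcribe rushes", 0.97, "low"),
    Step("publish subtitles to platform", 0.92, "high"),  # always reviewed
])
```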

Regulatory and ethical literacy: understanding the AI Act, the contractual implications of using AI with talent, copyright limitations in AI-generated content, and risks of bias and representation is not only the legal department’s responsibility. Creators, producers, and executives who make decisions about AI use in production need to understand the consequences of those decisions.

10.4 How to position yourself as a company or team

The audiovisual companies gaining real competitive advantage with AI in 2026 are not necessarily the ones with the most resources or the greatest enthusiasm for technology. They are the ones that have made strategic decisions about where to integrate AI and have built coherent, sustainable workflows.

Three areas offer immediate and measurable return for most companies in the sector. First, cataloging and increasing the value of owned archive assets: if uncataloged material exists, implementing a MAM solution with AI is probably the investment with the best short-term ROI. Implementation cost is recovered with the first archive licensing contract made possible by the new system. Second, workflow automation in postproduction for low-cost, high-frequency content: social media, corporate content, multilingual versions, archive reclips. Human teams focus on high-value content. Third, the use of AI agents for well-defined, high-frequency processes: localization, QC, distribution reports, licensing request management.

AI is neither a magic solution nor an existential risk for the sector. It is an infrastructure technology that, when integrated with strategy and judgment, amplifies human team capabilities and creates real competitive advantages. The question is not whether to adopt it; it is when and how to do so in a way that creates sustainable value.

Chapter 11

Case study — End-to-end documentary production with AI

This chapter presents a complete practical case based on a real workflow for the production of a 52-minute documentary titled “Invisible Borders,” an exploration of migrant communities in Europe. The case integrates AI tools across all phases of production, showing where they provide real value and where the limits are equally important.

Phase 1: Development and preproduction (6 weeks)

The process begins with the commission: a European streaming platform wants a documentary about migration for its editorial content catalog. The development team has six weeks to deliver a complete pitch.

Weeks 1-2: Research and concept development. The producer uses Claude to analyze the landscape of documentaries about migration on European platforms over the last five years: which narrative angles have been explored, which have received the best reception, which perspectives are underrepresented. This analysis, which would manually require two weeks of research, is completed in an afternoon. The results reveal that most documentaries adopt an external observer perspective; there is a shortage of content narrated in the first person by migrant communities themselves.
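
For teams that prefer to script this kind of analysis rather than run it in a chat window, a minimal sketch with the Anthropic Python SDK (pip install anthropic) might look as follows; the model identifier is a placeholder, and catalog_notes.txt stands in for research notes the team has compiled:

```python
# Sketch of scripting the landscape analysis with the Anthropic SDK.
# Assumes ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()

with open("catalog_notes.txt", encoding="utf-8") as f:
    notes = f.read()

response = client.messages.create(
    model="claude-opus-4-20250514",  # placeholder: use a current model id
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": ("From these notes on migration documentaries released on "
                    "European platforms in the last five years, identify the "
                    "narrative angles that are underrepresented, citing "
                    "evidence from the notes for each:\n\n" + notes),
    }],
)
print(response.content[0].text)
```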

With this insight, the team decides to focus the documentary on five first-person stories from migrants of different origins in five European cities. Claude helps develop the narrative structure, identify potential points of dramatic tension in each story, and generate a first version of the documentary bible. The team spends two days refining and rewriting in its own voice what the AI generated in hours.

Week 3: Visualization and presentation. The director uses Midjourney v8 to generate a detailed visual moodboard for each of the five stories: color palettes, photography styles, composition references. In one day, they have 40 reference images that would normally take a week to assemble. With Sora, they generate three 30-second animatics showing the documentary’s visual intent for the pitch. The result is a visual presentation that communicates the project with a clarity that surprises the platform.

Weeks 4-6: Production planning. With the project approved, Movie Magic Scheduling analyzes the preproduction script and generates an 18-day shooting plan optimized to minimize international travel. GreenShoots AI generates a first budget estimate. The team adjusts both documents in two days instead of the usual week.

Phase 2: Production (18 days, 5 countries)

At each location, the team uses Whisper in real time to transcribe interviews in the original languages: Arabic, Tigrinya, Wolof, Ukrainian, Spanish, with simultaneous translation for the director. This allows follow-up questions based on interviewees’ answers in real time, something that with a human interpreter would have required a significantly larger budget.
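
The chapter describes live use; the batch equivalent with the open-source Whisper package (pip install openai-whisper, which also requires ffmpeg) is a few lines. The filename is illustrative; task="translate" yields an English translation regardless of the source language:

```python
# Sketch of the transcribe-and-translate step with open-source Whisper.
# "large-v3" trades speed for accuracy; smaller models run faster.
import whisper

model = whisper.load_model("large-v3")

original = model.transcribe("interview_tigrinya.wav")
english = model.transcribe("interview_tigrinya.wav", task="translate")

print(original["language"])    # auto-detected source language code
print(english["text"][:300])   # English text for the director
```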

Silverstack XT manages each shooting day’s files, automatically organizing the material according to the script structure, generating proxies for review, and creating backups in three destinations simultaneously. The metadata for each clip is available to the editor from the first day of postproduction.

Phase 3: Postproduction (6 weeks, 140 hours of footage)

Weeks 1-2: Material selection. Adobe Premiere Pro automatically transcribes the 140 hours of footage. The editor can search by text for any specific moment: “when María talks about her mother” or “scene in the market.” Adobe Sensei analyzes the technical quality of each shot and marks the best candidates. The selection process that would have required three weeks is completed in ten days.

Weeks 2-3: Editing. The editor assembles the first cut. Creative decisions are entirely human. However, DaVinci Resolve automates technical tasks: Magic Mask isolates interviewees for differentiated color adjustment, IntelliTrack stabilizes shaky shots, and continuity analysis alerts the team to inconsistencies.

Weeks 3-4: Color, audio, and localization. The colorist uses Colourlab AI to establish matching between shots from different cameras in five countries with very different lighting conditions. iZotope RX 11 cleans interview audio recorded in field conditions. ElevenLabs generates English dubbing for stories in minority languages, with voices that preserve each interviewee’s character. Whisper generates subtitles in seven languages for European distribution.
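
A common glue step in this phase is turning transcription output into subtitle files. A sketch that writes an .srt from the segments returned by Whisper's transcribe():

```python
# Sketch: convert the dict returned by whisper's model.transcribe() into an
# .srt subtitle file. Each segment carries start/end times in seconds
# plus the recognized text.
def to_timestamp(seconds: float) -> str:
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def write_srt(result: dict, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for i, seg in enumerate(result["segments"], start=1):
            f.write(f"{i}\n"
                    f"{to_timestamp(seg['start'])} --> {to_timestamp(seg['end'])}\n"
                    f"{seg['text'].strip()}\n\n")

# write_srt(model.transcribe("interview.wav", language="fr"), "interview_fr.srt")
```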

Weeks 4-5: Music and mix. The composer uses AIVA as a starting point to explore musical motifs, generating 30 variants in one day. The director selects, and the composer refines. The result is not generative music: it is human music accelerated by the AI exploration phase. LANDR ensures that the final mix meets the loudness standards of the destination platform.

Week 6: QC and delivery. An automatic QC agent analyzes the complete master file: all subtitle languages, audio levels, content ratings by territory, and credit verification against contracts. Three minor problems are flagged and corrected in one day, and the documentary is delivered on time.
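
A single check inside such a QC agent can be as simple as interrogating the master with ffprobe (part of ffmpeg). A sketch that verifies the expected subtitle language tracks are present; the filename and expected language set are assumptions for illustration:

```python
# Sketch: one QC check, listing subtitle language tracks on the master.
import json, subprocess

EXPECTED_SUBS = {"eng", "fra", "deu", "spa", "ita", "por", "nld"}

probe = subprocess.run(
    ["ffprobe", "-v", "error", "-show_streams", "-of", "json", "master.mov"],
    capture_output=True, text=True, check=True,
)
streams = json.loads(probe.stdout)["streams"]
found = {s.get("tags", {}).get("language")
         for s in streams if s.get("codec_type") == "subtitle"}

missing = EXPECTED_SUBS - found
print("QC OK" if not missing else f"Missing subtitle tracks: {sorted(missing)}")
```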

Impact analysis

Compared with an equivalent workflow without AI tools: preproduction time was reduced by 40%; rushes selection time fell by 65%; localization was completed in a fraction of the usual time and cost; and total production cost was approximately 25% lower.

What did not change: the quality of the stories depended entirely on the director’s ability to build trust with the interviewees. The narrative cohesion of the edit was the result of the editor’s creative judgment. Cinematography required a human operator with sensitivity to each cultural context. And editorial responsibility for how real people in vulnerable situations were represented remained irreducibly human.

This balance, between AI-amplified technical efficiency and irreplaceably human creative and ethical judgment, is the production model defining the professional audiovisual sector in 2026.

Glossary

Essential terms for the audiovisual professional

This glossary gathers the most relevant technical terms in the AI ecosystem for audiovisual work. It is not intended to be exhaustive: it is intended to be useful.

Models and architectures

Transformer

Neural network architecture introduced in 2017 that uses an “attention” mechanism to process data sequences. It is the foundation of all current large language models (GPT, Claude, Gemini) and of the leading image and video generation models.

LLM (Large Language Model)

Large-scale language model trained on enormous text corpora. Examples: GPT-5.4, Claude Opus 4.7, Gemini 2.0/3. Capable of generating coherent text, reasoning, analyzing documents, and maintaining complex conversations.

Multimodal model

AI model capable of processing and generating multiple types of data simultaneously: text, image, audio, and video. Multimodal models make it possible, for example, to analyze a video clip and generate directing notes in text in a single operation.

GAN (Generative Adversarial Network)

Generative network architecture that uses two competing networks (a generator and a discriminator) to produce realistic outputs. It was the foundation of the first high-quality deepfakes. In 2026, largely surpassed by diffusion models for image and video generation.

Diffusion model

Generative architecture that works by learning to reverse a process of adding random noise to an image. Midjourney v8, DALL-E 3.5, Stable Diffusion, and most high-quality image and video generation systems in 2026 are based on this architecture.

Agents and systems

AI agent

System that combines a language model with the ability to execute actions in the world (API calls, file modification, searches, message sending) to complete complex tasks autonomously or semi-autonomously, without requiring a human to direct each step.

Human in the loop

Agentic system design in which the agent executes tasks autonomously until it reaches high-risk or low-confidence decisions, at which point it pauses and requests human confirmation before continuing. The recommended standard for agent implementations in production environments.

Prompt / Prompt engineering

The prompt is the instruction or set of instructions given to an AI model. Prompt engineering is the discipline of designing effective prompts that consistently obtain the desired outputs. It is an increasingly valued skill across all roles in the audiovisual sector.

RAG (Retrieval-Augmented Generation)

Technique that combines generation by a language model with information retrieval from an external knowledge base. It allows an AI agent to answer questions based on specific documents (contracts, manuals, archive databases) without needing to retrain the model.

Production and distribution

MAM (Media Asset Management)

Multimedia asset management system. It stores, organizes, indexes, and facilitates access to an audiovisual organization’s files. New-generation MAMs integrate AI for automatic cataloging, semantic search, and content analysis.

C2PA (Coalition for Content Provenance and Authenticity)

Open standard for cryptographic signing of digital content at the moment of creation, making it possible to verify content authenticity and detect manipulation. Implemented by Adobe, Sony, Nikon, Canon, BBC, Reuters, and others in their tools and cameras.

Completion rate

Streaming platform metric that measures the percentage of viewers who watch a piece of content from beginning to end. It is the main metric recommendation algorithms use to evaluate the value of each piece of content.

Content Credentials

Adobe’s implementation of the C2PA standard. Cryptographically signed metadata that accompanies a file (image, video, audio) and records its creation and editing history, including whether AI was used at any stage of the process.

Deepfake

Audiovisual content (usually video) generated or manipulated with AI to make a person appear to say or do something that did not happen. High-quality deepfakes are visually indistinguishable from authentic video to the naked eye and represent a growing risk to the credibility of informational content and to the reputation of public figures.

Chapter 13

AI in live production and broadcasting

Live production is perhaps the most demanding environment for any technology: there is no possibility of a retake, response times are measured in milliseconds, and errors are immediately visible to the audience. For decades, this limited the adoption of experimental technologies in live broadcast. AI has changed this equation: in 2026, some of the most robust AI systems in the sector are found precisely in live production, where the need to automate repetitive tasks at high speed is most critical.

13.1 Automated production of sports events

AI-powered automated production systems for sports events are now a mature commercial reality. Cinfo Tiivii, Pixellot, ChyronHego, and proprietary systems from companies such as ESPN and Sky Sports make it possible to produce complete broadcasts of soccer, basketball, tennis, track and field, and other sports without human camera operators for tracking and cutting tasks.

The typical architecture of these systems combines multiple fixed cameras with real-time object tracking AI that detects the ball (or the relevant object in the sport) and the players, a decision system that selects the most relevant shot at each moment according to configurable production rules, and an automatic graphics system that overlays statistics, scoreboards, and player identification without human intervention.
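
Conceptually, the decision system is a scoring function over candidate cameras plus a minimum shot duration to prevent jittery cutting. A deliberately simplified sketch in which all rules, weights, and field names are illustrative:

```python
# Simplified sketch of the shot-selection layer of an automated sports
# production system: score cameras against production rules, cut to the
# best, but never cut faster than a minimum shot duration.
def score(cam: dict) -> float:
    return (2.0 * cam["ball_visible"]            # rule: follow the ball
            + cam["players_in_frame"] / 22       # rule: favor informative wides
            + 1.0 * cam["close_up"])             # rule: reward tight framing

def select_shot(cameras: list[dict], current: str, held_s: float,
                min_shot_s: float = 2.0) -> str:
    if held_s < min_shot_s:
        return current                           # avoid jittery cutting
    return max(cameras, key=score)["id"]

cams = [
    {"id": "wide",  "ball_visible": 1, "players_in_frame": 18, "close_up": 0},
    {"id": "tight", "ball_visible": 1, "players_in_frame": 3,  "close_up": 1},
]
print(select_shot(cams, current="wide", held_s=3.1))  # -> "tight"
```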

The economic result is significant: a third-division soccer match broadcast that previously required a five-person team (three camera operators, a director, and a graphics technician) can be produced with a single technical supervisor monitoring the system. This has enabled grassroots sports leagues across Europe to produce and distribute streaming content that was previously economically unviable.

The quality of automatic production at high-level events remains inferior to that of an expert human director: systems struggle with the most chaotic plays, tend to lose the narrative context of a match (the play that just happened, accumulated tension), and produce a more mechanical production style than an experienced professional. For top-tier leagues, AI as an assistant (suggesting cuts, automating secondary cameras) is more appropriate than AI as the main director.

13.2 Real-time subtitling and translation

Real-time subtitling for live broadcasts has for decades been one of the great challenges of accessible broadcasting: the combination of speed, accuracy, and cost meant that many smaller broadcasters could not offer live subtitles. AI-based speech recognition systems have completely transformed this equation.

Systems such as ENCO Deco, Verbit Live, and Microsoft Azure Cognitive Services live captioning modules produce real-time subtitles with accuracy that under ideal conditions (native speaker, high-quality microphone, clear speech) exceeds 97%. For non-standard accents, background noise, or fast speech, accuracy drops, but it remains superior to assisted stenography systems from a decade ago.

Real-time machine translation adds another layer: systems such as DeepL Live and AWS Transcribe translation modules can generate simultaneous subtitles in multiple languages with 2-3 second latency, viable for delayed broadcasts but still challenging for sports events or press conferences where immediacy is critical. In 2026, the acceptable latency threshold for high-quality real-time translation is dropping month by month.

13.3 News production with AI

Television newsrooms are one of the environments where AI has had the fastest transformative impact over the past two years. The pressure to produce quality news content in 24-hour news cycles with reduced teams has accelerated the adoption of AI tools in the editorial workflow.

AI-powered source monitoring systems, such as those from Dataminr and Signal AI, analyze thousands of sources in real time (social media, agencies, local media, public databases) and automatically identify stories with news potential, alerting editors with context and related sources. This does not automate editorial judgment, but it allows a human editor to stay aware of a volume of information that would be impossible to process manually.

Automatic generation of standard news pieces (sports results, stock reports, weather summaries, election results) is now common practice at major news agencies: AP, Bloomberg, and Reuters automatically generate thousands of these stories every day. Human journalists supervise the systems and focus on investigation, analysis, and stories requiring editorial judgment and primary sources.

For television news, systems such as Vizrt Viz Arc and Ross Video Inception automate studio production: graphics transitions, opening and closing sequences, and on-screen titles, all executed by the system according to the news rundown updated in real time. The human director supervises and intervenes in unplanned situations.

13.4 Live streaming for independent creators

The democratization of AI live streaming production affects not only major broadcasters. In 2026, individual creators and small production companies have access to tools that were unthinkable five years ago. StreamElements, Restream, and AI features in OBS Studio make it possible to produce live streams with dynamic graphics, automatic comment moderation, automatic clips of the best moments, and real-time audience analysis.

Captions AI and Krisp remove background noise and improve voice quality in real time during streams, with on-device processing and no noticeable latency. For creators producing from home environments or with modest equipment, the technical quality improvement provided by these tools is transformative.

Chapter 14

The economics of AI in the audiovisual sector — ROI, costs, and new business models

The conversation about AI in the audiovisual sector is usually either highly technical (what tools exist) or highly philosophical (what it means for human creativity). There is a third conversation, the one happening in boardrooms, that determines what gets adopted and what does not: the economic conversation. How much does it cost to implement AI? What return does it generate? Where are the new business models? Who is making real money with AI in audiovisual, and how?

14.1 The real cost-benefit analysis

The promise of AI in the audiovisual sector can be stated very concretely: reduction of time and cost in the phases dominated by repetitive work, and access to capabilities that were previously economically out of reach for most production companies. But implementation has its own costs that must be accounted for before calculating ROI.

On the cost side, one must consider: tool licenses (ranging from free tools like DaVinci Resolve to several thousand euros annually for enterprise MAM platforms with AI), team learning and training time (typically 2-6 weeks of reduced productivity per new tool adopted), workflow restructuring (process redesign cost, often underestimated), and infrastructure costs when processing large volumes of material (cloud GPUs, bandwidth, storage).

On the benefit side, sector data consistently points to these figures: 40-65% reduction in time spent on repetitive technical postproduction tasks; 25-40% reduction in multilingual localization costs; 30-50% reduction in archive cataloging time; and 20-35% reduction in total preproduction time when AI is integrated into project development.

The fastest and most measurable ROI is in three specific areas: cataloging existing archives (the sleeping asset that becomes a source of revenue), automating localization for international distribution, and automating workflows that repurpose long-form content into short-form content for social media.

14.2 ROI case: the archive as an asset

Imagine a Spanish regional broadcaster with 30,000 hours of poorly cataloged or uncataloged audiovisual archive. The potential value of that archive in the licensing market is difficult to calculate without access to the material, but an archive of that size can easily generate between €200,000 and €1,500,000 annually in licensing revenue if properly cataloged and commercialized.

The cost of implementing AI cataloging for 30,000 hours (including platform, implementation, and archive team training) is usually in the €80,000-€200,000 range, with recurring annual maintenance costs of €20,000-€50,000. The investment is recovered within one to three years, depending on the market’s licensing rate. After that, the archive generates recurring net income that was previously impossible.
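
A back-of-the-envelope version of that payback calculation, using conservative figures inside the ranges above; early-years revenue is assumed below the steady-state range while the catalog gains market visibility:

```python
# Illustrative payback calculation for AI archive cataloging. All figures
# are assumptions drawn from the chapter's ranges.
def payback_years(setup_cost: float, annual_revenue: float,
                  annual_maintenance: float) -> float:
    net = annual_revenue - annual_maintenance
    if net <= 0:
        raise ValueError("licensing revenue does not cover maintenance")
    return setup_cost / net

# €200k setup, €150k/year early licensing revenue, €50k/year maintenance:
print(payback_years(200_000, 150_000, 50_000))  # 2.0 years
```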

This is the calculation broadcasters such as RTVE, BBC Studios, and Italy’s RAI have made to justify their investments in AI archive platforms. It is not a technical or creative argument: it is a purely economic argument.

14.3 New business models enabled by AI

Micro-productions at scale

The reduction in production costs with AI is enabling a new market segment: personalized or niche audiovisual content that was previously not economically viable. A production company that used to need a 10-person team and a €50,000 budget to produce a 30-minute documentary can now produce it with a team of 3 and €15,000 while maintaining the same final quality. This opens content markets for highly specific audiences that mainstream streaming platforms do not serve: local history, minority cultures, specialized professional topics.

On-demand and personalized content

The combination of generative AI, OTT distribution, and advanced recommendation systems is creating the possibility of genuinely personalized content at scale. Not just “we recommend this to you,” but “we generate this specifically for you.” The first commercial implementations are in advertising (ads dynamically generated with the user’s name, city, and preferences) and educational content (courses that adapt content, pace, and examples to the student profile). Large-scale personalized entertainment lies on a 2-4 year horizon.

AI services for the sector

Demand for AI implementation in the audiovisual sector is creating a new B2B services market: consultancies specialized in AI adoption for production, MAM system integrators with AI, trainers specialized in AI tools for creatives, and developers of customized workflows. For companies in the sector with technical experience, offering these services to other producers or broadcasters can become a new business line with attractive margins.

Monetization of production data

The metadata generated by AI systems during production (audience analysis, engagement patterns, content performance data) is an increasingly valuable asset. The most advanced studios and platforms are developing proprietary data analysis capabilities that can be monetized: trend reports for brands, market data for investors, or performance analysis for co-producers and international distributors.

14.4 The economic impact on the value chain

AI adoption does not redistribute value evenly across the audiovisual production chain. There are clear winners and losers, and understanding them is important for strategic positioning.

The segments that capture the most value with AI are those with access to large volumes of proprietary data (streaming platforms with millions of users, broadcasters with decades of archive, studios with extensive catalogs), investment capacity to implement the best tools and attract technical talent, and direct-to-consumer distribution that allows them to capture the benefits of the recommendation algorithm. Netflix, Disney+, Amazon, and the major European public broadcasters are in a structurally advantaged position.

The segments in the most difficult position are intermediaries whose value came mainly from coordination and information management (some mid-level executive production roles, casting agencies without differential creative judgment, distributors without a direct relationship with the end consumer), and specialists in repetitive technical tasks that AI directly automates.

The segments in a position of opportunity, if they act quickly, are independent creative producers, who can use AI to compete on cost with major studios while retaining their creative distinctiveness and agility, and technical specialists who retrain as AI workflow architects and implementation advisors.

Chapter 15

Global perspective — How the U.S., Asia, and Latin America approach audiovisual AI

The audiovisual sector is global, but AI adoption is not uniform. Different markets have economic dynamics, regulatory frameworks, cultural traditions, and levels of technological maturity that produce very different approaches to AI. Understanding these differences is relevant for any professional working in or with international markets.

15.1 United States: speed, scale, and litigation

The American audiovisual market is the world’s most advanced laboratory for AI adoption for one simple reason: scale. Major Hollywood studios, streaming platforms with hundreds of millions of global subscribers, and tech companies with unlimited access to compute and data have been able to invest in AI at a scale that no European or Latin American market can match.

Major American studios (Disney, Warner Bros. Discovery, Universal, Paramount, Sony Pictures) have integrated AI into project development, previsualization, VFX, localization, and audience analysis processes. Platforms (Netflix, Amazon, Apple TV+, Max) have AI teams of dozens or hundreds of engineers dedicated specifically to recommendation systems and content analysis.

However, the American market is also at the epicenter of the most intense labor and legal conflicts around AI. The 2023 SAG-AFTRA strike, the WGA writers’ strike that same year, and the wave of litigation over model training with protected content are American phenomena establishing global precedents. The result is a market that advances quickly in technical adoption but within a legal and contractual framework being rewritten in real time.

For European and Latin American companies co-producing or distributing with American partners, it is important to understand that SAG-AFTRA and WGA contracts contain specific clauses on AI use that apply to productions made under those agreements, regardless of where they are produced. A European co-production with an American studio may be subject to these clauses.

15.2 Asia: China, South Korea, and Japan — three different approaches

China: state integration and unprecedented scale

China is the market where audiovisual AI is advancing fastest in terms of integration into mass content production. Companies such as ByteDance (TikTok/Douyin), Tencent Video, iQiyi, and Alibaba Pictures have AI capabilities that in many respects surpass those of their American equivalents, with the added advantage of access to audience data from 1.4 billion people without the regulatory restrictions of the European GDPR.

The Chinese model of audiovisual AI is characterized by vertical integration (the same companies control production, distribution, and platform), a massive volume of short-form content (Douyin ingests thousands of hours of new uploads every hour), and state supervision of content, which conditions the AI models used.

For the European and Latin American markets, China is relevant as a technology provider (many intelligent cameras and AI postproduction systems reaching the global market originate in China), as a potential distribution market (although with significant entry barriers), and as a source of learning about the scaled adoption of AI in content production.

South Korea: K-content and AI as an export advantage

South Korea is the most interesting case of the strategic use of AI to amplify the global reach of a national audiovisual sector. The Korean Wave (Hallyu), driven by the global success of K-dramas, K-pop, and Korean cinema (Parasite, Squid Game), has made South Korea the world’s second-largest exporter of audiovisual content after the U.S.

Major Korean production companies and platforms (CJ ENM, JTBC Studios, Kakao Entertainment, Webtoon Entertainment) are systematically using AI for ultra-fast localization of content into multiple languages (K-dramas premiere simultaneously in more than 50 countries with subtitles in all languages on the same day), for global trend analysis that informs new project development, and for the generation of derivative content (clips, trailers, behind-the-scenes) optimized for each distribution platform.

Japan: anime, tradition, and cautious adoption

Japan has an audiovisual industry with unique characteristics: anime is the most globally consumed animation genre, with annual production of more than 200 series and 50 films. However, the anime industry has a tradition of artisanal production with a studio structure that has historically resisted automation.

In 2025-2026, the position has changed. Anime studios such as Khara (Evangelion), Production I.G, and Wit Studio are actively exploring the use of AI for in-between work (the intermediate frames between key character positions, which represent most of the animator’s manual labor), automated clean-up, and coloring. Animator resistance to AI remains significant, but economic pressure (small studios operate on very tight margins) is accelerating adoption.

15.3 Latin America: a strategic opportunity

Latin America has a vibrant audiovisual sector, with productions of global impact (Latin American Spanish-language Netflix content has global demand), a consolidated tradition of international co-production, and a creative talent base recognized worldwide. In the field of AI, the region is at an opportunity point that may be seized or missed depending on the decisions made in the next two or three years.

Structural advantages

Latin America has several advantages for audiovisual AI adoption that are not always recognized. Language, often framed as a barrier, is actually an advantage: Spanish and Portuguese are the most demanded languages after English on global streaming platforms, which means sustained demand both for content and for localization tools in these languages. AI localization models are increasingly optimized for Latin American Spanish and Portuguese.

The relative cost of technical and creative talent in Latin America remains lower than in the U.S. or Western Europe, making the total cost of implementing AI workflows proportionally more attractive. A Mexican, Colombian, or Argentine production company can implement the same AI tools as a Spanish or French production company at a significantly lower total cost.

The absence of regulation as strict as the European AI Act (with exceptions such as Brazil, which is developing AI legislation inspired by the European model) means Latin American producers have more operational freedom in AI use, although this also implies less legal protection for creators.

Specific challenges

Connectivity and cloud computing infrastructure remain real challenges in parts of Latin America. The most advanced AI tools require fast and stable connections, and in markets such as Bolivia, Paraguay, Nicaragua, or large rural areas of Brazil, this remains a practical limitation.

Market fragmentation is another challenge: unlike the U.S. (one dominant audiovisual market) or the EU (a common market in the process of integration), Latin America consists of 20 different markets with different regulatory structures, consumption habits, and distribution possibilities. This complicates scale and dilutes AI investments.

Dependence on international streaming platforms (Netflix, Amazon, Disney+, HBO Max) for global distribution is a double-edged sword: it provides access to global audiences, but also means decisions about what gets produced, how it is distributed, and what data is collected are made by companies headquartered outside the region.

Recommendations for the Latin American sector

For Latin American production companies and broadcasters, the strategic recommendations in AI are threefold. First, prioritize cataloging and monetizing owned archives as the lowest-risk, fastest-return entry point. Second, adopt AI localization tools as a priority investment, given the global distribution potential of Spanish and Portuguese content. Third, build alliances with European production companies, especially Iberian ones, to access knowledge about AI implementation within a mature regulatory framework that will eventually be exported to Latin America.

15.4 Spain and the Iberian market: between Europe and the Hispanic world

Spain occupies a unique position: it is the world’s most advanced Spanish-language market (with the partial exception of the U.S. in specific segments), it is subject to the world’s most demanding European regulation on AI, it has a cultural and linguistic connection to the world’s largest Spanish-speaking market, and it has an expanding audiovisual industry driven by investments from international streaming platforms.

The creation of AESIA (Spanish Artificial Intelligence Supervisory Agency) in 2024 makes Spain one of the most advanced European countries in AI Act implementation. For Spanish production companies and broadcasters, this implies both compliance obligations and opportunities: Spanish companies that implement responsible AI best practices are positioned to export that know-how to the Latin American market, where equivalent regulations are arriving with a 3-5 year delay.

Chapter 16

2027-2030 horizon — What is coming to the audiovisual sector

Any analysis of the future of AI in audiovisual is destined to become obsolete before the book reaches its readers. With that explicit warning, this chapter attempts something more useful than prediction: identifying trends with enough critical mass that they are likely to materialize within a three-to-five-year horizon, and analyzing what implications they would have for the sector.

16.1 Long-form video generation

The most significant limitation of current AI video generation systems (Sora 2, Runway Gen-4, Kling 3.0, Seedance 2.0) is duration: clips lasting seconds to a few minutes with narrative and visual coherence. Research is advancing rapidly toward long-form video generation with consistency of characters, locations, and narrative across an entire scene or even an act. When this threshold is crossed, and everything indicates it will be before 2028, the implications for the sector will be profound.

This does not mean that high-budget fiction productions will disappear: the cost of generating high-quality long-form video with AI will remain significant, and the precise creative control required by a top-level film production will still require intensive human intervention. But the access threshold for audiovisual production will fall radically, and the volume of AI-generated content will increase exponentially.

16.2 Autonomous production agents

The next leap for AI agents in audiovisual is end-to-end autonomy in specific domains. Not an agent that performs a well-defined task under human supervision, but an agent that manages an entire production domain autonomously with sporadic supervision: the localization agent that manages the entire subtitling, dubbing, QC, and delivery pipeline for all languages in a production; the distribution agent that manages the presence of a full catalog across 30 platforms simultaneously; the archive agent that not only catalogs but proactively identifies monetization opportunities and manages licensing requests.

Anthropic’s Claude Managed Agents platform and equivalents from OpenAI and Google are actively developing the capabilities required for these agents: long-term persistent memory, robust multi-step planning, and self-correction capability when errors occur. The horizon for truly autonomous agents in specific audiovisual domains is 2027-2028.

16.3 Real-time multimodal AI

The next frontier after current multimodal models is real-time multimodality: systems that process and generate video, audio, and text simultaneously with no noticeable latency. This has direct applications in live production: production systems that analyze broadcast content in real time and make cutting and composition decisions based on the narrative of what is happening, not only predefined rules; simultaneous subtitling and translation with no latency; and real-time feedback for directors on the technical and narrative quality of what they are filming.

16.4 Photorealistic synthesis of people: the legal and ethical threshold

The technology for synthesizing photorealistic people, including voices, movements, and facial expressions, is advancing toward a tipping point where the distinction between a real person and a synthetic representation will be impossible for the human eye without specific verification tools. This is already almost true under ideal conditions in 2026; by 2028-2029, it will be the norm under general production conditions.

The implications for the sector are simultaneously technical, economic, legal, and philosophical. Technical: production of certain types of content (advertising, corporate content, dubbing, training) can be done without human actors at near-zero marginal cost. Economic: demand for actors for certain types of work falls, while demand for highly visible actors capable of licensing their digital image in a controlled way may increase. Legal: the current regulatory framework (AI Act, GDPR, image rights) will have to be updated to accommodate a reality where any person can be synthetically represented in an indistinguishable way.

The audiovisual sector has a responsibility to participate actively in the development of these rules, not to wait for regulators to define them without its input.

16.5 The AGI question and audiovisual work

Artificial General Intelligence (AGI), roughly defined as an AI system with cognitive capabilities equal to or greater than humans across all relevant domains, is a horizon whose timeline is intensely debated among researchers. Anthropic, OpenAI, and DeepMind have different positions on when that threshold might be reached, with estimates ranging from “we are very close” to “decades away or never.”

For the audiovisual sector, the AGI question is less relevant than what is already happening: current models, which are not AGI, are already transforming the sector. And every iteration of model improvement, even if we are far from AGI, adds capabilities that have a real impact on production. AGI may be a distant or uncertain horizon; next year’s tools are a certainty.

What is important to understand is the direction: AI models are consistently becoming more capable, more efficient, cheaper, and more accessible. There is no sign that this trend will reverse. The audiovisual sector must assume that in five years AI capabilities will be significantly greater than today, even if no one can predict exactly what they will be.

16.6 Energy sustainability of AI

One aspect the audiovisual sector cannot ignore in its AI adoption is energy impact. Large AI models consume massive amounts of energy: training a model the size of GPT-4 consumes the energy equivalent of hundreds of transatlantic flights. Inference (everyday use of the model) also consumes significant energy at scale.

For the audiovisual sector, this has direct implications when AI is integrated into high-volume workflows: AI VFX rendering, large-scale video file processing, operation of recommendation systems for large catalogs. The sustainability credentials of AI providers (what percentage of their energy is renewable, what carbon neutrality commitments they have) are increasingly relevant to purchasing decisions by platforms and producers with ESG commitments.

Paradoxically, AI is also a tool for sustainability in audiovisual work: reducing travel through virtual production, optimizing video compression to reduce streaming energy consumption, and increasing shooting planning efficiency to minimize production days are real contributions by AI to the sector’s environmental footprint.

Chapter 17

Second case study — Global advertising campaign with AI

This second case study explores a type of production different from the documentary in Chapter 11: an audiovisual advertising campaign for a consumer goods brand with distribution in twelve markets simultaneously. The advertising context has specific characteristics that make AI use different from editorial production: more concentrated budgets over shorter periods, the need for multiple versions by market and platform, and faster review and approval cycles.

The brief

A leading food brand in Europe commissions an audiovisual production agency to create a campaign for the launch of a new product. The brief requires: a 30-second television spot, a 15-second YouTube version, a vertical 9:16 version for TikTok and Instagram Stories, localizations for 12 European markets (12 languages, with cultural adaptations for the most important markets), all within six weeks from briefing to delivery of final masters.

Without AI, this brief would require: a production team of 15-20 people, a postproduction and localization budget of €150,000-€250,000, and probably an eight-week schedule. With AI intelligently integrated into the workflow, the same result is achievable with 8-10 people, €80,000-€120,000, and six weeks.

AI workflow — Weeks 1-2: Concept and preproduction

The creative team uses Claude to develop multiple variants of the creative concept from the brand brief, generating five different lines with their respective scripts, tone, and visual references. Midjourney generates the visual moodboard for the three finalist lines for the client presentation. The client approves in one meeting with high-quality visual material prepared in two days instead of the typical week of concept development.

With the concept approved, the team uses Movie Magic Scheduling optimized with AI to plan the two shooting days needed (in studio, with virtual production for two scenes), minimizing configuration changes and maximizing effective camera time. Adobe Firefly generates lighting and composition references for each scene that the cinematographer uses as a starting point for the lighting plan.

Weeks 3-4: Shooting and editing

The shoot takes place over two days in a virtual production studio in Madrid. An LED volume provides the background for the main scene; DJI drones with automatic tracking capture the motion shots. Adobe Premiere Pro automatically transcribes teleprompter dialogue and organizes it by take, allowing the editor to select material by text instead of reviewing rushes hour by hour.

The first cut is completed in three days. DaVinci Resolve automatically applies the base color correction from the LUT approved in preproduction. Premiere’s intelligent reframing automatically generates the 9:16 and 1:1 versions of the spot once the 16:9 version is approved, saving two days of manual adaptation work.
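To see what that reframing automates, consider the geometry alone. The following is a minimal sketch in Python, assuming a 1920×1080 master and ffmpeg's crop filter; it shows only a naive centered crop, while the AI feature adds subject tracking on top of this arithmetic, which is precisely the part worth paying for.

def vertical_crop_filter(width: int, height: int, ratio: float = 9 / 16) -> str:
    """Build an ffmpeg crop filter string for a centered vertical crop."""
    crop_w = int(height * ratio) // 2 * 2   # round down to even pixels for codecs
    x = (width - crop_w) // 2               # center the crop window horizontally
    return f"crop={crop_w}:{height}:{x}:0"

print(vertical_crop_filter(1920, 1080))  # crop=606:1080:657:0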

Weeks 5-6: Localization and delivery

This is the phase where AI has the most dramatic impact. With 12 languages and multiple versions per format, manual localization would have required three weeks and a team of voice actors, recording studios, and lip-sync technicians in multiple countries.

With ElevenLabs, the team generates voice-overs in the 12 languages from the original Spanish voices, preserving the tone and emotion of the original narration. Deepdub handles lip synchronization for the three close-up shots of the presenter. Whisper generates subtitles in the 12 languages, which are reviewed by native translators in less than one day per language.
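As a flavor of the subtitling step, here is a minimal sketch using the open-source Whisper package in Python; the file name is hypothetical, and note that translation into the other target languages would be a separate machine-translation pass, since Whisper natively translates only into English.

import whisper  # pip install openai-whisper

def srt_time(t: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02}:{m:02}:{s:02},{int((t % 1) * 1000):03}"

model = whisper.load_model("large-v3")
result = model.transcribe("spot_master_es.mov")  # hypothetical file name

with open("spot_master_es.srt", "w", encoding="utf-8") as srt:
    for i, seg in enumerate(result["segments"], start=1):
        srt.write(f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
                  f"{seg['text'].strip()}\n\n")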

The final masters for 12 languages, 3 formats, and 12 markets (432 files in total) are delivered on time and within budget. The client also receives a Content Credentials report with the full production history of each file, including disclosure of which elements were generated or modified with AI.

Impact analysis

The use of AI in this production represented a 45% reduction in total production time and a 38% reduction in budget compared with an equivalent production without AI. The phases with the greatest impact were localization (70% reduction in time and cost) and reframing for multiple formats (85% reduction in time).

What did not change and could not have changed: the creative vision of the concept, which required two days of intensive human ideation sessions; the direction of the shoot, which required the director’s judgment to capture the shots with the right emotion; and the final editorial review, which required the creative team to listen to each localization and verify that the campaign’s emotional tone was maintained in each language and culture.

Chapter 18

OPEX vs CAPEX — The strategic decision that defines how you adopt AI

Before deciding which AI tool to adopt, there is a more fundamental decision audiovisual companies must make: how to finance and structure that adoption. The choice between investment in owned assets (CAPEX) and a recurring operating-expense model (OPEX) is not merely an accounting matter; it defines adoption speed, flexibility in the face of technological change, investment risk profile, and ability to scale. In the case of AI, this decision carries especially relevant implications because the technology evolves at a speed that renders technological assets obsolete much faster than in other sectors.

18.1 The logic of CAPEX in audiovisual technology infrastructure

The CAPEX (Capital Expenditure) model is the traditional model in the audiovisual sector for adopting technology infrastructure: the company buys equipment, servers, perpetual software licenses, and storage systems, depreciates them over several years for accounting purposes, and manages them internally with its own technical team.

The advantages of the CAPEX model are real and should not be dismissed. Total control over the system means the company can adapt it exactly to its needs without depending on the product decisions of an external provider. Data ownership is clear: audiovisual material, metadata, and AI models trained with proprietary data unambiguously belong to the company. And in companies with sufficient volume and stable technical teams, the long-term total cost can be lower than an equivalent subscription model.

The disadvantages of the CAPEX model in the context of AI are equally concrete. Initial investment is high: building proprietary AI infrastructure for a MAM with advanced capabilities, including the GPU servers required for processing, can require investments from hundreds of thousands to millions of euros. Technological obsolescence is fast: AI hardware (GPUs, TPUs) that is state of the art today may fall significantly behind market-available capabilities in 18-24 months. And the talent cost to operate and maintain proprietary AI systems is increasing and highly competitive.

18.2 The logic of OPEX in AI services

The OPEX (Operational Expenditure) model for AI is structured as subscriptions to SaaS platforms, pay-per-use access to AI model APIs, or managed services where a third party operates AI infrastructure on behalf of the audiovisual company. The expense is recognized in the accounting period in which it occurs; it is not capitalized.

The advantages of the OPEX model for AI adoption are especially relevant in the current technological context. Adoption speed is much greater: instead of a purchasing, installation, and configuration cycle that can last months, an AI SaaS service can be operational in days or weeks. Technology updates are automatic: the provider updates models and infrastructure, and the company benefits from improvements without additional investment. And the risk profile is more favorable: if the tool does not work as expected, the cost of switching to another solution is much lower than amortizing a CAPEX investment.

The disadvantages of the OPEX model also exist. Vendor dependence is real: if the provider raises prices, changes its terms, or disappears, the company may find itself in a vulnerable position. Accumulated recurring costs can exceed the total cost of ownership of an equivalent CAPEX solution in the long term for companies with sufficient volume. And control over data and models may be more limited in a SaaS environment.

18.3 Strategic analysis: when CAPEX, when OPEX

The optimal decision is not universal: it depends on company size, processing volume, technical team maturity, investment time horizon, and tolerance for technological risk. But there are some principles that consistently apply in the audiovisual sector.

The CAPEX model makes more sense when: processing volume is very high and predictable (processing more than 10,000 hours of content per year), the company has an experienced technical team capable of operating and maintaining AI systems, data privacy or security requirements make cloud processing unacceptable, and the investment time horizon is five years or more with sufficient technological stability in the specific domain.

The OPEX model makes more sense when: the company is beginning its AI adoption and needs to validate use cases before committing capital, processing volume is variable or unpredictable, technological update speed is critical (as in AI generation tools, where models change every 6-12 months), the internal technical team lacks capacity to operate complex AI infrastructure, or capital budget is committed to other priorities.

For most mid-sized production companies and broadcasters in the European market, the OPEX model or managed service model is the most rational entry point for AI adoption. It allows them to capture the benefits of the technology without assuming the risks of a CAPEX investment in a technology that evolves so rapidly.
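To make the trade-off concrete, a back-of-the-envelope sketch of the total-cost comparison follows. Every figure is a hypothetical placeholder to be replaced with real quotes, and the model deliberately ignores factors the text already flags, such as obsolescence risk.

def capex_total_cost(initial_investment: float, annual_ops: float, years: int) -> float:
    """Total cost of ownership for an owned (CAPEX) system over `years`."""
    return initial_investment + annual_ops * years

def opex_total_cost(monthly_fee: float, years: int) -> float:
    """Cumulative subscription or managed-service (OPEX) cost over `years`."""
    return monthly_fee * 12 * years

horizon = 5  # the kind of investment horizon discussed above
capex = capex_total_cost(initial_investment=600_000, annual_ops=120_000, years=horizon)
opex = opex_total_cost(monthly_fee=18_000, years=horizon)
print(f"CAPEX TCO over {horizon} years: €{capex:,.0f}")  # €1,200,000
print(f"OPEX TCO over {horizon} years:  €{opex:,.0f}")   # €1,080,000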

18.4 The hybrid model: the most common solution

In practice, most large organizations in the audiovisual sector adopt a hybrid model combining CAPEX and OPEX elements depending on the type of AI capability. Mature, well-defined, high-volume capabilities (such as archive storage, playout infrastructure, or standard transcoding systems) are usually maintained as CAPEX, where technological stability justifies capital investment. Frontier, evolving, or variable-use capabilities (such as generation models, AI audience analysis, or automatic localization) are contracted as OPEX or managed service.

This hybrid model requires a technology architecture designed for integration: an orchestration layer that connects owned systems (CAPEX) with external services (OPEX), with well-defined APIs, clear security protocols, and fallback mechanisms when external services are unavailable.
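A minimal sketch of the fallback mechanism that orchestration layer needs, with hypothetical stand-ins for the owned and contracted systems:

import logging

class ExternalAIService:
    """Hypothetical stand-in for a contracted (OPEX) analysis API."""
    def analyze(self, asset_path: str) -> dict:
        raise ConnectionError("service unreachable")  # simulate an outage

class LocalAnalyzer:
    """Hypothetical stand-in for the owned, on-premise (CAPEX) system."""
    def analyze(self, asset_path: str) -> dict:
        return {"asset": asset_path, "engine": "local", "tags": []}

def analyze_asset(asset_path: str) -> dict:
    """Route a job to the external service; fall back to the owned system."""
    try:
        return ExternalAIService().analyze(asset_path)
    except (ConnectionError, TimeoutError) as exc:
        logging.warning("External service unavailable (%s); using fallback", exc)
        return LocalAnalyzer().analyze(asset_path)

print(analyze_asset("promo_0412.mxf"))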

18.5 How managed services resolve the dilemma

The managed service model represents a third path that many companies in the audiovisual sector are adopting in response to the OPEX/CAPEX dilemma. In this model, a specialized provider operates AI infrastructure on behalf of the audiovisual company, assuming technical responsibility, technology updates, and system optimization, while the audiovisual company retains control over data, workflows, and editorial decisions.

The advantages of managed service combine the best of both worlds: the company does not need to invest in its own infrastructure (eliminating CAPEX), the provider continuously updates the technology at no additional cost to the company (eliminating obsolescence risk), and the company retains control over its data and processes (overcoming the limitations of generic SaaS). The relationship is deeper than that of a simple SaaS provider: the managed service provider understands the company’s workflows in depth and adapts the technology to them.

The managed service model is especially relevant for components of the audiovisual production chain that require complex integration: not only the AI tool, but its integration with the MAM, playout system, distribution infrastructure, and editorial workflows. A provider operating the entire integrated stack can guarantee a quality of service (SLA) over the whole that no individual component can offer in isolation.

The right question

Before choosing between CAPEX and OPEX, the right strategic question is: what is my organization’s core competency? If it is creating quality audiovisual content, it probably makes no sense to invest capital and talent in operating proprietary AI infrastructure that a specialist can operate better. If it is managing and monetizing large archive volumes with proprietary AI technology as a competitive differentiator, then CAPEX may be justified.

Chapter 19

TSAmediaHUB — A European reference model for integrated AI in broadcasting

Throughout this book, AI tools, platforms, and systems have been cited as abstract references or market examples. This chapter presents a concrete and operational case: TSAmediaHUB, the managed media hub service from Telefónica Servicios Audiovisuales (TSA), as an example of how a European organization has built an integrated AI architecture for the broadcasting sector that simultaneously resolves the OPEX/CAPEX dilemma and the challenge of adopting AI at scale.

It is included here not as an advertising success story, but as a technical and business model reference that concretely illustrates the principles described in the previous chapters: from MAM architecture with AI to the agent model, including intelligent cataloging, multichannel distribution, and the European regulatory framework.

19.1 What TSAmediaHUB is

TSAmediaHUB is Telefónica Servicios Audiovisuales’ integrated service proposal for broadcasting operators, television networks, OTT platforms, and production companies that need to manage, produce, and distribute audiovisual content at scale without taking on the operational complexity of running their own media infrastructure.

The proposal is not that of a tool provider: it is that of an operator of the complete broadcasting value chain, from content ingestion to final distribution across multiple platforms and channels. TSA assumes operational responsibility for media management systems, playout, storage, satellite and fiber distribution, and the artificial intelligence layer that cuts across all these systems. The client focuses on its core competency: content and programming.

19.2 The technology architecture: an integrated AI stack

The TSAmediaHUB architecture is built on a set of top-tier sector technologies, integrated into a coherent stack with AI as a transversal layer, not as an add-on:

Media Asset Management (MAM)

TSAmediaHUB implements a dual MAM strategy combining Tedial Evolution and Dalet Pyramid. This duality is not redundancy: each system covers different usage profiles within the operation, allowing TSA to serve clients with very different needs (from large broadcasters with thousands of archive hours to production companies with more agile workflows) from a unified operations platform.

Etiqmedia, the AI platform for cataloging and content analysis, operates as the intelligence layer for both MAMs: it automatically analyzes every asset entering the system (facial recognition, location identification, topic and genre analysis, automatic transcription and subtitling, technical quality analysis) and continuously enriches metadata. The result is an audiovisual archive that does not merely store content, but understands the content it manages.

Playout and distribution

GV AMPP (Grass Valley Agile Media Processing Platform) is the cloud-native playout system that allows TSAmediaHUB to offer fully managed television channels from the cloud, with the flexibility to scale playout capacity according to client demand. Cloud-native architecture means that launching a new channel or expanding the capacity of an existing one does not require investment in additional hardware: it is a configuration decision with immediate effect.

Content distribution is carried out through the Telefónica Empresas network for fiber and IP distribution, and through agreements with Hispasat and Starlink for satellite distribution, covering both European and Latin American markets. This combination of fiber, IP, and satellite allows TSAmediaHUB to guarantee distribution to practically any market with the connectivity redundancies required for critical broadcasting services.

Operations management (OSS/BSS) and storage

DataMiner acts as the nervous system for monitoring and managing the operation: it supervises all elements of the technology stack in real time, detects anomalies, generates alerts, and facilitates incident management. Its integration with AI systems enables predictive monitoring that identifies potential failures before they impact the service.

Storage combines IBM technology and Quantum LTO tape systems for long-term archiving, with intelligent content lifecycle management: frequently accessed material remains on high-speed storage, while historical archive material automatically migrates to lower-cost storage as its access frequency decreases. Demand-prediction AI continuously optimizes this distribution.
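A minimal sketch of the kind of tiering rule such lifecycle management applies; the thresholds are illustrative assumptions, not TSAmediaHUB's actual policy.

from datetime import datetime, timedelta

def choose_tier(last_access: datetime, accesses_last_90d: int) -> str:
    """Pick a storage tier from recency and frequency of access."""
    now = datetime.now()
    if accesses_last_90d >= 10 or now - last_access < timedelta(days=30):
        return "hot"        # high-speed storage
    if now - last_access < timedelta(days=365):
        return "nearline"   # lower-cost disk
    return "lto_archive"    # tape archive

print(choose_tier(datetime.now() - timedelta(days=400), accesses_last_90d=0))  # lto_archive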

19.3 The takeover model: E1, E2, and E3

The most relevant differentiator of TSAmediaHUB compared with a standard technology provider is its takeover model. Instead of selling tools that the client operates, TSA takes over (that is, operationally assumes) the client’s broadcasting function at different levels of depth, defined as E1, E2, and E3.

Level E1: Infrastructure takeover

At level E1, TSA assumes operation of the client’s technology infrastructure: MAM, playout, storage, and distribution systems are operated by TSA teams under the agreed service standards. The client retains editorial control, and programming and content teams work with the same tools as always, but technical responsibility for ensuring those systems function correctly falls to TSA. This level is the entry point for clients who want to transform their operating model from CAPEX to OPEX without operational disruption.

Level E2: Operations takeover

At level E2, TSA also assumes broadcasting operational processes: content ingestion and quality control, archive and metadata management, playout planning and execution, and distribution. TSA teams operate end-to-end workflows under the client’s editorial standards. This level is suitable for clients who want to concentrate all their resources on content production and acquisition, outsourcing the complete operation of their chain.

Level E3: Full takeover

At level E3, TSA assumes full responsibility for the client’s audiovisual operation, including technical channel management, regulatory compliance, international distribution, and audience reporting. The client receives the result (a functioning television channel, content distributed across multiple platforms, integrated audience data), and TSA manages all the processes that make it possible. This is the most advanced managed service model in the European broadcasting sector.

19.4 The Adaptive Value System (AVS) and CUE

The most distinctive intelligence layer in TSAmediaHUB is the Adaptive Value System (AVS), a proprietary system under development that integrates multiple data sources and artificial intelligence to continuously optimize the value of the client’s archive and catalog.

The precursor concept to AVS is CUE (Ultra-Efficient Cataloging), which establishes the principles of intelligent AI cataloging: not merely tagging content with descriptive metadata, but understanding the potential value of each asset in different usage contexts (archive, licensing, repurposing, new distribution) and prioritizing cataloging work according to that potential value. CUE turns cataloging from an operational task into a strategic asset management function.

AVS takes these principles further: it integrates audience data, licensing market data, content performance history, and external demand signals to continuously generate a dynamic valuation of each catalog asset. This allows the client to make informed decisions about which content to prioritize for restoration, which material has the greatest licensing potential, and which archive assets are capable of generating new revenue streams that are not currently being captured.
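Purely as illustration of the idea, here is a sketch of a dynamic valuation that combines normalized signals into a single score. The weights and signal names are invented for the example and do not describe AVS's actual model.

def asset_value_score(audience_demand: float, licensing_interest: float,
                      past_performance: float, external_signals: float) -> float:
    """Combine normalized signals (0-1 each) into a single 0-100 valuation."""
    weights = {"audience": 0.35, "licensing": 0.30, "history": 0.20, "external": 0.15}
    score = (weights["audience"] * audience_demand
             + weights["licensing"] * licensing_interest
             + weights["history"] * past_performance
             + weights["external"] * external_signals)
    return round(score * 100, 1)

print(asset_value_score(0.8, 0.6, 0.4, 0.7))  # 64.5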

19.5 TSAmediaHUB as a response to the OPEX/CAPEX dilemma

TSAmediaHUB concretely illustrates how the managed service model resolves the OPEX/CAPEX dilemma for European broadcasting clients. A broadcaster that contracts TSAmediaHUB at level E2 or E3 converts what would be a multimillion-euro CAPEX investment in proprietary media infrastructure into predictable and scalable monthly OPEX.

The advantages of this conversion are multiple. The broadcaster immediately accesses a top-tier technology stack (Tedial, Dalet, GV AMPP, Etiqmedia, DataMiner) that would be very costly to implement and operate independently. Technology updates are TSA’s responsibility: when a new version of Etiqmedia appears with better AI analysis capabilities, the client benefits without additional implementation cost. And the SLA covers the whole service, not individual components: if there is an incident in the distribution chain, TSA resolves the root cause regardless of which technical system generated it.

For a broadcaster’s chief financial officer, the proposal is concrete: eliminate CAPEX in media technology infrastructure, convert it into predictable OPEX, and free capital for investment in content, where the real competitive differentiator resides. For the technical director, the proposal is equally concrete: access cutting-edge AI capabilities operated by a specialized team without having to build or maintain that team internally.

19.6 Relevance for the European and Ibero-American markets

The TSAmediaHUB model has special relevance in the European market for several reasons that go beyond its technical capabilities. It operates under the European regulatory framework (AI Act, GDPR, audiovisual communication regulation) by design, ensuring that clients contracting the service are automatically aligned with the compliance requirements the AI Act demands for AI systems in broadcasting.

Telefónica’s connectivity, which includes both the Telefónica Empresas fiber network and agreements with Hispasat for satellite coverage, makes TSAmediaHUB especially relevant for the Ibero-American market: a Latin American broadcaster can contract a fully managed broadcasting operation with guaranteed distribution in both Europe and Latin America, operated under European standards of quality and regulation.

For professionals in the sector evaluating managed broadcasting service options with integrated AI, TSAmediaHUB represents the European state of the art in this business model, and its integrated-stack architecture with transversal AI is the reference for how to design a managed broadcasting service for the age of artificial intelligence.

More information: [email protected]

Resources

RESOURCES AND RECOMMENDED READING

This section gathers selected resources for going deeper into the topics covered in the book. The sector evolves quickly; always prioritize the most recent publication sources.

Regulation and legal framework

AI Act (EU AI Regulation): the full text and implementation guides are available at eur-lex.europa.eu. The European AI Office, created in 2024, publishes sector-by-sector application guidelines at digital-strategy.ec.europa.eu. The Future of Life Institute’s AI Act Explorer offers an interactive interface to explore the regulation by use case.

For actors’ and talent rights: the 2023 and 2024 SAG-AFTRA AI agreements are reference documents available at sag-aftra.org. The EBU (European Broadcasting Union) report on AI and rights in broadcasting, published in 2025, is especially relevant to the European market and downloadable at ebu.ch.

Tools and workflows

IBC (International Broadcasting Convention) publishes an annual report on the state of technology in the audiovisual sector, with extensive coverage of AI: ibc.org/technology. NAB Show produces technical documentation on AI production workflows available at nabshow.com. Variety Intelligence Platform and The Hollywood Reporter publish regular analyses of AI adoption in the entertainment sector.

For specific workflows, the official YouTube channels of Adobe Premiere Pro, DaVinci Resolve, and ElevenLabs are updated primary sources. Anthropic’s official documentation on Claude and agents (docs.anthropic.com) is the reference for understanding the capabilities and limitations of the most advanced models.

Research and trends

McKinsey Global Institute publishes annual analyses on AI’s economic impact by sector. Goldman Sachs’ report “Generative AI: Too Much Spend, Too Little Benefit?” (2024) offers a critical and well-documented perspective on enterprise AI adoption. For frontier technical research, papers from Anthropic, Google DeepMind, and OpenAI are available at arxiv.org and on each company’s technical blogs.

In Spanish, Barlovento Comunicación’s annual report on the Spanish audiovisual market increasingly includes analysis of AI adoption. EGEDA and ICAA publish sector studies relevant to the Ibero-American market.

Communities and training

The Runway ML community on Discord is one of the most active for sharing AI video generation techniques. Blackmagic Design's official forum (forum.blackmagicdesign.com) includes threads dedicated to DaVinci Resolve's AI features. The LinkedIn group “AI in Media & Entertainment” (more than 40,000 members in 2026) is a good thermometer for sector conversations.

For structured training: DeepLearning.AI offers specialized courses in AI for creators, including “AI for Everyone” (free) and specific courses on generative models. The MasterClass platform has courses from directors and screenwriters on using AI in the creative process. Domestika and LinkedIn Learning have a growing offering of courses on AI tools for audiovisual production in Spanish.

Chapter 20

Four conversations the industry needs to have

The previous nineteen chapters describe what AI can do in the audiovisual sector, how to do it, and under what regulatory and economic framework. This chapter addresses four different conversations that this journey leaves pending: who this kind of guide is really for, the fragility of what seems solid today, the lines that should not be crossed, and when the smartest decision is precisely not to use AI.

20.1 Who this guide is for (and who it is not for)

This guide is not written to explain what artificial intelligence is. It is written for professionals who already work in the audiovisual sector and need to understand how AI changes their work, what decisions it requires from them, and what risks they must manage. That distinction is not cosmetic: it defines what is included, what is omitted, and how each topic is treated.

The reader who will find the most value here is the producer who needs to understand which workflows can be automated and which cannot, the archive manager evaluating whether it makes sense to implement a MAM with AI, the technical director of a broadcaster analyzing whether to outsource operations under a managed service model, the director who wants to integrate generation tools into the creative process without compromising editorial quality, and the executive who has to justify an AI technology investment before the board of directors.

This guide is not aimed at those seeking to learn how to program AI models, those wanting a general introduction to artificial intelligence as a cultural and social phenomenon, or those needing a deep technical evaluation of system architectures. More appropriate resources exist for those readers. This guide is vertical, sector-specific, and operational. Its unit of measure is usefulness for those making decisions in production, distribution, or audiovisual content management.

Usage criterion

If after reading a chapter a professional in the sector can make a better-informed decision, ask a more precise question to their technical team, or identify a risk they had not considered, the chapter fulfills its function. If it only adds general information that could be found in any technology explainer article, it does not.

20.2 The risk of obsolescence: why criteria matter more than tools

Any guide about AI tools has a structural problem: it ages. The models that are reference points today may be obsolete in twelve months. Platforms that are standard today may disappear, change pricing model, or be acquired by competitors with different agendas. A tool that a team has deeply integrated into its workflow may cease to be available with thirty days’ notice.

This risk of technological obsolescence is not theoretical: it is one of the most real operational risks for audiovisual-sector organizations in 2026. And it has three dimensions that deserve explicit treatment.

The first is model obsolescence. Generative AI models improve so quickly that an integration built on a specific model may become functionally outdated before the organization has amortized the implementation investment. The correct practice is to build integrations on abstract API versions, not specific models, and to plan review cycles every six to twelve months (a sketch of this abstraction follows after the three dimensions).

The second is provider obsolescence. Consolidation in the AI market is producing acquisitions, closures, and strategic pivots at unusual speed. A platform that had ten thousand customers in 2024 may have been absorbed by a larger competitor in 2025, with changes in data policy, pricing, or service continuity. Provider evaluation must explicitly include financial strength, investors, and contractual commitments on service continuity.

The third is obsolescence of internal knowledge. Teams that learn to use a specific AI tool accumulate knowledge that may lose value if the tool changes radically or disappears. AI training for the audiovisual sector should prioritize operational principles and evaluation criteria over technical mastery of specific tools.
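On the first of these dimensions, the recommended abstraction is easy to sketch: workflow code names a capability, not a model, and the mapping lives in one registry that the six-to-twelve-month review cycle updates. All names below are illustrative.

MODEL_REGISTRY = {
    "transcription": "whisper-large-v3",
    "script_drafting": "claude-latest",
    "image_concepts": "midjourney-v8",
}

def model_for(capability: str) -> str:
    """Resolve a capability to whatever model is currently assigned to it."""
    return MODEL_REGISTRY[capability]

# Workflow code depends only on the capability name, never the model:
print(f"Routing transcription jobs to {model_for('transcription')}")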

The practical consequence of all this is that the long-term value of this guide lies not in the names of the tools it mentions, but in the decision frameworks it proposes. Tools change. The question of when it makes sense to automate a task, what level of human supervision an agentic process requires, or how to evaluate ROI on an AI archive investment does not change at the same speed.

20.3 Kill list: practices that must be eliminated or strictly supervised

A serious professional guide on AI cannot limit itself to describing what the technology can do. It must also clearly point out what should not be done, what can only be done under very specific conditions, and what requires strict supervision before implementation. The following are lines the professional audiovisual sector should not cross without having resolved the corresponding legal, ethical, and operational issues.

Do not do: biometric AI in casting and employment decisions without safeguards

The use of facial recognition or biometric analysis systems to filter candidates in casting processes or make hiring decisions in the EU is a high-risk system under the AI Act. It cannot be the sole basis for a decision. It cannot be used without prior conformity assessment. It cannot be applied to biometric data without explicit consent under the GDPR. In practice: if an AI system gives you a casting candidate list without an independent human professional applying judgment to that list, you are in a real legal risk zone. Eliminate that workflow or redesign it with explicit human supervision in every selection decision.

Do not do: voice cloning without a specific rights contract

Voice cloning of any identifiable person without a contract explicitly covering that use is a GDPR violation (voice data is special category biometric data) and may constitute an infringement of the person’s image and personality rights. This applies to actors, presenters, voice-over artists, journalists, and anyone whose voice is recognizable. Generic consent to record is not enough: specific consent is required for cloning and for each concrete use of the clone. If your workflow includes voice cloning without that contractual framework, stop.

Do not do: autonomous agents for irreversible actions without supervision

No AI agent should be authorized to execute irreversible actions autonomously without a human confirmation point. This includes: delivering masters or final files to clients or platforms, modifying or deleting archive metadata in bulk, publishing or distributing content on behalf of the organization, deleting production material, and sending external communications on behalf of the organization. The cost of these errors, if they occur, can be very high: from loss of irreplaceable material to contractual breaches with distribution platforms. The correct design is always: the agent prepares, the human confirms before the irreversible action is executed.
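A minimal sketch of that design, with hypothetical action names: the agent prepares the irreversible action, and nothing executes until a person confirms.

IRREVERSIBLE = {"deliver_masters", "delete_assets", "publish_content"}

def execute(action: str, payload: dict) -> None:
    """Run an agent action, gating irreversible ones behind human confirmation."""
    if action in IRREVERSIBLE:
        print(f"Agent has prepared: {action} -> {payload}")
        if input("Confirm execution? [y/N] ").strip().lower() != "y":
            print("Aborted by operator.")
            return
    print(f"Executing {action}...")  # the actual side effect would go here

execute("deliver_masters", {"files": 432, "destination": "platform_x"})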

Monitor closely: audience analysis tools using personal data

Audience analysis systems that process individual user behavior data (viewing times, interaction patterns, inferred demographic data) are subject to the GDPR and to DSA transparency obligations for large platforms. If your organization uses or integrates these data into its own AI systems, you need an explicit legal basis, clear retention policies, and functioning opt-out mechanisms. Noncompliance has real sanctioning consequences.

Monitor closely: content generation with opaque training models

Generative AI models trained on protected content without licenses are at the center of a wave of active litigation. For commercial productions, using these models creates a real litigation risk that your organization’s legal department must evaluate. Operational recommendation: for commercial production, prioritize tools with transparent training policies and rights indemnification. For internal exploration and non-commercial prototyping, the risk is lower but not nonexistent.

20.4 When NOT to use AI: the most undervalued professional criterion

The pressure toward AI adoption in the audiovisual sector is real and growing. It comes from technology vendors, distribution platforms, boards of directors, and competitive market pressure. In that context, the ability to identify when AI does not add value or introduces more problems than solutions is a professional skill as valuable as knowing how to implement it, if not more so.

Do not use AI when volume does not justify complexity

Implementing an AI cataloging system makes sense with thousands of archive hours. It does not make sense for a production company with two hundred hours of manually well-organized material. The friction of implementing, maintaining, and updating the system far outweighs the benefit. The hidden cost of AI adoption, especially in integrations with existing systems, is often underestimated. Before implementing any AI solution, quantify the volume of the problem you are solving and compare it with the real cost of the solution, including integration, training, and maintenance.

Do not use AI when output traceability is critical and cannot be guaranteed

In contexts where full traceability of the origin of every element in a deliverable is a legal or contractual requirement, generative AI systems introduce a problem: the exact origin of each generated fragment is not always attributable with the precision a rights contract may require. For productions with very strict rights clearance requirements (co-productions with American majors, content for platforms with rights indemnification policies), the use of generative AI in content elements requires prior legal analysis that in many cases may not conclude favorably.

Do not use AI when the team cannot verify output quality

If the team that will use the output of an AI system lacks the knowledge required to evaluate whether that output is correct or not, you are introducing risk without a control mechanism. An automatic subtitling system in a language no one on the team speaks, a metadata analysis system whose output no one reviews, or an agent managing communications in a market the team does not know well: in all these cases, AI may be making systematic errors that no one detects until they have real consequences.

Do not use AI as a substitute for editorial judgment in sensitive content

AI can moderate content, but it cannot exercise the editorial judgment required by sensitive content: news about active conflicts, content involving minors, information that may affect the safety of identifiable people, or content with complex legal dimensions. In these contexts, AI can be a useful assistant to the professional, but the final decision must be human, documented, and assumed by someone with real editorial responsibility.

Do not use AI when the process is faster without it

This point seems obvious but is systematically ignored in practice: for low-frequency, highly specific tasks with complex context, the time required to formulate the prompt correctly, review the output, and correct errors may far exceed the time needed to do it manually. An experienced professional writing a one-minute archive description may take two minutes. Configuring, running, and reviewing an automated system for that same file may take longer. Automation makes sense at scale, not necessarily for every individual task.
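The break-even arithmetic is trivial to write down, and worth writing down; the figures below are hypothetical.

def automation_worth_it(n_tasks: int, manual_min: float,
                        setup_min: float, per_task_ai_min: float) -> bool:
    """True when setup plus per-task AI review beats doing everything by hand."""
    return setup_min + n_tasks * per_task_ai_min < n_tasks * manual_min

print(automation_worth_it(1, manual_min=2, setup_min=30, per_task_ai_min=3))     # False
print(automation_worth_it(500, manual_min=2, setup_min=30, per_task_ai_min=0.5)) # True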

The Stainless Iteration Method

This section is not part of the book’s technical core. It is deliberately placed after the epilogue as a Side B: an extra for those who, after going through tools, architectures, and real cases, ask themselves something more uncomfortable and more important: how to work with artificial intelligence without losing judgment.

The Stainless Iteration Method is not a technique for getting better answers from an AI. It is a work discipline meant to avoid one of the most subtle risks of this technology: accepting as valid something that sounds good without subjecting it to real review.

It carries this name for a personal reason. Asier Inoxidable is my creative alter ego, under which I write fiction, poetry, and music. What is stainless is not what never gets stained, but what does not rust over time. Translated into work with AI, this method seeks exactly that: to keep human judgment from rusting because everything has been delegated to systems that know how to write better than they know how to think.

Working with AI today does not mean asking for an answer and publishing it. It means designing a process that forces AI to reveal its limits before those limits become real errors.

The underlying problem: plausible answers, not necessarily solid ones

AI models are optimized to generate coherent and convincing text. They are not designed to actively warn of their gaps, nor to distinguish between certainties and assumptions unless explicitly forced to do so. The result is familiar to anyone who uses them intensively: documents that seem complete but hide gaps, analyses that mix facts with interpretations, or code that works in the example but fails in production.

The Stainless Iteration Method starts from a simple premise: the first answer is almost never the best possible result. And assuming otherwise is a modern form of self-deception.

Principle 1 · Separate generation from evaluation

The most common mistake when working with AI is asking for improvements before evaluating the real quality of the output. Iterating does not mean asking for more text, but introducing an explicit evaluation phase.

Before correcting, rewriting, or expanding, the first step is to force the AI itself to audit its work: identify weaknesses, assumptions, and low-confidence areas. When done well, that self-evaluation offers a risk map much more useful than an automatic rewrite.

Principle 2 · Force explicit self-evaluation

An AI usually improves superficially if it is asked to “improve” something. But its behavior changes radically if it is asked to grade itself and explain why it does not reach excellence.

Questions such as “which parts are weakest?” or “which claims are you least confident about?” activate a different mode of analysis. They do not eliminate errors, but they make them visible. And in a professional environment, visibility is control.

Principle 3 · Cross-review between models

Each AI model has its own blind spots. Some prioritize coherence over precision. Others are excessively conservative. Others lose depth on specific topics.

The method introduces a second review: take the deliverable and the self-evaluation to another model and ask it to act as an independent reviewer. Not to rewrite, but to critique. Cross-review does not guarantee truth, but it significantly reduces the likelihood of undetected error.
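The three principles so far can be read as a pipeline. A minimal sketch follows, where `ask` is a stub standing in for whichever model API you use and the prompts are illustrative:

def ask(model: str, prompt: str) -> str:
    """Stub for a call to whichever model provider you use."""
    return f"[{model} response to: {prompt[:40]}...]"

def stainless_iteration(task: str) -> dict:
    draft = ask("model_a", task)                 # generate first, evaluate later
    self_review = ask("model_a",                 # force explicit self-evaluation
        "Grade this draft; list its weakest claims and hidden assumptions:\n" + draft)
    cross_review = ask("model_b",                # independent critique by another model
        "Act as an independent reviewer. Critique, do not rewrite:\n"
        + draft + "\n\nSelf-review:\n" + self_review)
    # The final judgment happens outside the code: a person weighs both
    # reviews and decides which changes to apply.
    return {"draft": draft, "self_review": self_review, "cross_review": cross_review}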

Principle 4 · Final judgment is not delegated

The most important step in the method is not performed by AI. It is performed by the person.

After comparing self-evaluation and cross-review, someone must decide which criticisms are relevant, which are noise, and which changes to apply. Delegating that synthesis to AI as well cancels the method. The goal is not to replace human judgment, but to force it to operate with better signals.

Where this method has the most value

The Stainless Iteration Method is not necessary for everything. It makes sense when the cost of error is high: public documents, commercial proposals, strategic reports, production code, materials bearing your name or your organization’s name.

In internal drafts or quick explorations, it can be simplified. But the more critical the deliverable, the more value comes from adding two steps that slightly slow down the process and multiply quality.

What this method does not do

It does not make AI infallible. It does not replace expert knowledge. It does not eliminate cultural or training biases.

What it does do is prevent fluency and speed from rusting critical thinking.

Closing

Artificial intelligence will keep improving. Answers will become increasingly natural, faster, and more convincing. Precisely for that reason, the risk is not trusting AI too little, but trusting it too much.

The Stainless Iteration Method is a quiet form of resistance: a reminder that thinking remains a human responsibility, even — and especially — when we work with machines that write so well.


Asier Inoxidable — The other side

The stainless also begins somewhere else. Asier Inoxidable is the name under which I write fiction, poetry, and the lyrics of the music I produce. It is not a brand or a marketing project: it is simply the way what cannot remain inside finds a way out.

In 2025 I released my first musical work under that name: ÍNTIMO E INOXIDABLE. The songs are written by me — some composed after years of accumulated silence, others in one uninterrupted burst. The production is also mine. The result is something that sounds like me, not like any trend.

I have produced music with AI as a tool — in some cases to explore textures I could not build alone, in others to verify that what I wanted was exactly what I already had. The Stainless Iteration Method was born, in part, from those sessions: from learning to distinguish when AI amplified something real and when it merely produced something that sounded good but was not mine.

If you have made it this far — through MAMs, AI Acts, takeover models, and attention layers in transformers — perhaps you might also feel like listening to something completely different. Or not. But the door is open.


asieranitua.com · Free content for non-commercial use with attribution

© 2026 · asieranitua.com

Audiovisual Engineering and Technology Consulting
