Dynamic Dialogue in Unity: Scriptable Objects and AI Voice

Posted by Gemma Ellison
July 12, 2025

Let’s face it, creating engaging games is HARD. Dialogue, often overlooked, is crucial for drawing players into your world and story. We’re ditching the boring, linear text boxes and leveling up our game development skills. In this guide, we’ll build a dynamic dialogue system in Unity, using Scriptable Objects and a free AI voice generator to breathe life into our characters, making your games instantly more immersive. Forget tedious, repetitive tasks - we’re building a system that’s efficient, scalable, and, most importantly, fun to use.

Scriptable Objects: The Backbone of Your Dialogue

Why Scriptable Objects? They’re your secret weapon for organized dialogue. Forget messy code and hardcoded text. We’re using Scriptable Objects to store dialogue lines, character information, and even conditional logic.

First, let’s create a DialogueData Scriptable Object:

using UnityEngine;
using System.Collections.Generic;

[CreateAssetMenu(fileName = "NewDialogue", menuName = "Dialogue/DialogueData")]
public class DialogueData : ScriptableObject
{
    [System.Serializable]
    public struct DialogueLine
    {
        public string characterName;
        [TextArea(3, 10)]
        public string dialogueText;
        public AudioClip voiceOver; // Optional, for pre-recorded voiceovers
    }

    public List<DialogueLine> dialogueLines;
}

Create a new Scriptable Object by right-clicking in your Project window and selecting “Create” -> “Dialogue” -> “DialogueData”. Name it something descriptive like “PrologueDialogue.” This allows you to easily create new dialogue sequences without touching code.

This approach cleanly separates your dialogue content from your game logic. Need to tweak a line? Simply edit the Scriptable Object.

Common Pitfall: Forgetting to add the [TextArea] attribute to the dialogueText field. Without it, editing long lines of dialogue in the Inspector is incredibly frustrating.

Setting up the UI

A clear and intuitive UI is essential for presenting dialogue effectively. Create a Canvas in your scene, and then add the following UI elements:

  • Text (TextMeshPro): For the character’s name.
  • Text (TextMeshPro): For the dialogue text itself.
  • Image: (Optional) For a character portrait.
  • Button: For advancing to the next dialogue line.

Anchor these elements appropriately so they scale well on different screen sizes.

Now, create a DialogueUI script to manage the UI:

using UnityEngine;
using UnityEngine.UI;
using TMPro;
using System.Collections;

public class DialogueUI : MonoBehaviour
{
    public TextMeshProUGUI characterNameText;
    public TextMeshProUGUI dialogueText;
    public Image characterPortrait;
    public Button nextButton;
    private DialogueManager dialogueManager;

    public void Initialize(DialogueManager dm)
    {
        dialogueManager = dm;
        nextButton.onClick.AddListener(dialogueManager.DisplayNextDialogue);
    }

    public void SetDialogue(string characterName, string dialogue)
    {
        characterNameText.text = characterName;
        dialogueText.text = dialogue;
    }

    public void ShowDialogueBox(bool shouldShow)
    {
        gameObject.SetActive(shouldShow);
    }
}

Attach this script to your Dialogue Canvas and drag the corresponding UI elements into the inspector. Make sure to disable the Dialogue Canvas in the Inspector; it will be enabled by the DialogueManager when dialogue should be displayed.

Challenge: Handling text overflow in the dialogue box. Use TextMeshPro’s auto-size feature or implement scrolling to prevent text from spilling out of the UI.
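
If you go the auto-size route, the relevant TextMeshPro settings can also be set from code. A small sketch of what you might drop into the DialogueUI script above, assuming its dialogueText field (the 18 to 36 size range is a placeholder to tune for your layout):

    // Inside DialogueUI: clamp the font size so long lines shrink to fit the box.
    void Awake()
    {
        dialogueText.enableAutoSizing = true;
        dialogueText.fontSizeMin = 18;   // placeholder values; tune for your layout
        dialogueText.fontSizeMax = 36;
        dialogueText.overflowMode = TextOverflowModes.Ellipsis; // or Truncate/Page, depending on your design
    }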

Dialogue Manager: Orchestrating the Conversation

The DialogueManager is the brains of our operation. It handles loading dialogue data, updating the UI, and managing the flow of the conversation.

using UnityEngine;
using System.Collections;
using System.Collections.Generic;

public class DialogueManager : MonoBehaviour
{
    public DialogueData currentDialogue;
    public DialogueUI dialogueUI;

    private int currentLineIndex = 0;
    private List<DialogueData.DialogueLine> dialogueLines;
    private bool isDialogueActive = false;

    void Start()
    {
        dialogueUI.Initialize(this);
    }

    public void StartDialogue(DialogueData dialogueData)
    {
        currentDialogue = dialogueData;
        dialogueLines = new List<DialogueData.DialogueLine>(currentDialogue.dialogueLines);
        currentLineIndex = 0;
        isDialogueActive = true;
        dialogueUI.ShowDialogueBox(true);
        DisplayNextDialogue();
    }

    public void DisplayNextDialogue()
    {
        if (!isDialogueActive) return;

        if (currentLineIndex < dialogueLines.Count)
        {
            DialogueData.DialogueLine line = dialogueLines[currentLineIndex];
            dialogueUI.SetDialogue(line.characterName, line.dialogueText);

            // Play voiceover if available (implementation details omitted)
            // PlayVoiceOver(line.voiceOver);

            currentLineIndex++;
        }
        else
        {
            EndDialogue();
        }
    }

    public void EndDialogue()
    {
        isDialogueActive = false;
        dialogueUI.ShowDialogueBox(false);
    }
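
    // A minimal voiceover helper you could wire to the commented-out PlayVoiceOver call above.
    // Assumption: an AudioSource assigned in the Inspector; swap in your own audio routing as needed.
    public AudioSource voiceSource;

    private void PlayVoiceOver(AudioClip clip)
    {
        if (clip == null || voiceSource == null) return;

        voiceSource.Stop();      // cut off any previous line's audio
        voiceSource.clip = clip;
        voiceSource.Play();
    }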
}

Attach this script to an empty GameObject in your scene (e.g., “DialogueManager”). Drag your DialogueUI Canvas into the dialogueUI field. Then create a way to trigger the dialogue, for example by calling dialogueManager.StartDialogue(myDialogueData); a simple trigger component is sketched below.
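
A trigger can be as simple as a component that hands a DialogueData asset to the manager when some condition is met. Here is a minimal sketch (the component name and the E key are arbitrary choices for illustration, and it assumes the legacy Input Manager):

using UnityEngine;

public class DialogueTrigger : MonoBehaviour
{
    public DialogueManager dialogueManager;
    public DialogueData myDialogueData; // assign an asset such as "PrologueDialogue" in the Inspector

    void Update()
    {
        // Start the conversation on a key press; replace this with a trigger collider,
        // interaction system, or whatever fits your game.
        if (Input.GetKeyDown(KeyCode.E))
        {
            dialogueManager.StartDialogue(myDialogueData);
        }
    }
}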

Adding AI Voice with a Free API (Coqui.ai)

Text-to-speech (TTS) is the final touch that brings your dialogue to life. Coqui offers a free, open-source TTS engine and its XTTS v2 model, which can clone a voice from a short speaker sample; you can run it as a server and call it from your game to generate realistic voices.

Important: While Coqui.ai is free and open-source, remember that using it directly in a live, commercial game might require adhering to specific licensing terms. Always check the official documentation.

// This is an example; in practice it is better to generate audio clips ahead of time and store them in your Scriptable Object.
using UnityEngine;
using UnityEngine.Networking;
using System;              // needed for BitConverter and Array below
using System.Collections;

public class CoquiTTS : MonoBehaviour
{
    public string apiUrl = "YOUR_COQUI_API_URL"; // Replace with your Coqui.ai API endpoint
    public string speakerWavPath = "path/to/your/speaker.wav"; // Replace with the path to your speaker WAV file; a clip of the speaker saying a full sentence or two gives better results

    public IEnumerator GenerateSpeech(string text, System.Action<AudioClip> callback)
    {
        WWWForm form = new WWWForm();
        form.AddField("text", text);
        form.AddField("speaker_wav", speakerWavPath); //Path to a wav file used to clone the speaker's voice
        form.AddField("language", "en"); // or another supported language

        using (UnityWebRequest www = UnityWebRequest.Post(apiUrl + "/tts", form))
        {
            yield return www.SendWebRequest();

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("Coqui TTS Error: " + www.error);
            }
            else
            {
                byte[] audioData = www.downloadHandler.data;
                AudioClip clip = DecodeAudio(audioData, text); // Assuming you have a DecodeAudio function

                callback?.Invoke(clip);
            }
        }
    }

    // Example DecodeAudio function
    private AudioClip DecodeAudio(byte[] data, string clipName)
    {
        float[] floatArray = ConvertByteArrayToFloatArray(data); // Implement this based on the audio format.
        // 22050 Hz, mono; match these values to what your TTS server actually returns.
        AudioClip clip = AudioClip.Create(clipName, floatArray.Length, 1, 22050, false);
        clip.SetData(floatArray, 0);

        return clip;
    }

    // Needs to be implemented based on the file you download - this example assumes raw
    // 32-bit little-endian PCM. A real response is often a WAV file, in which case you
    // would skip its header before converting.
    private float[] ConvertByteArrayToFloatArray(byte[] array)
    {
        float[] floatArr = new float[array.Length / 4];
        for (int i = 0; i < floatArr.Length; i++)
        {
            // Only swap byte order on big-endian platforms; the data itself is little-endian.
            if (!BitConverter.IsLittleEndian)
                Array.Reverse(array, i * 4, 4);
            int intValue = BitConverter.ToInt32(array, i * 4);
            floatArr[i] = intValue / 2147483648f; // normalize to the [-1, 1] range
        }
        return floatArr;
    }
}

Important Considerations:

  • API Endpoint: You will need to run your own instance of the Coqui TTS server (or find a hosted solution) and replace "YOUR_COQUI_API_URL" with the correct endpoint; the code above will not work until you do.
  • Asynchronous Operations: TTS generation takes time. Always use coroutines (like in the example) to avoid blocking the main thread.
  • Error Handling: Robust error handling is crucial. Check the UnityWebRequest.result and display informative messages to the user.

You would call StartCoroutine(GetComponent<CoquiTTS>().GenerateSpeech(line.dialogueText, clip => { /* Play the clip */ }));
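
One way to package that call is a small helper that prefers a pre-recorded clip from the Scriptable Object and only falls back to on-demand generation when none is assigned. A sketch, assuming the CoquiTTS component above and an AudioSource for playback, both wired up in the Inspector:

using UnityEngine;

public class DialogueVoicePlayer : MonoBehaviour
{
    public CoquiTTS tts;
    public AudioSource voiceSource;

    public void Speak(DialogueData.DialogueLine line)
    {
        if (line.voiceOver != null)
        {
            // Prefer the clip stored in the Scriptable Object.
            voiceSource.PlayOneShot(line.voiceOver);
        }
        else
        {
            // Otherwise generate the audio at runtime; the callback fires when the request finishes.
            StartCoroutine(tts.GenerateSpeech(line.dialogueText, clip =>
            {
                if (clip != null) voiceSource.PlayOneShot(clip);
            }));
        }
    }
}

In DisplayNextDialogue, you could then call Speak(line) where the commented-out PlayVoiceOver call sits.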

Challenges:

  • Latency: Generating voiceovers on demand can introduce latency. Pre-generate voiceovers for common phrases or critical dialogue to mitigate this; a caching sketch follows this list.
  • Voice Quality: Experiment with different speakers and parameters to achieve the desired voice quality. The quality of the speaker .wav file has a huge impact.
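
To hide that latency, you can warm a cache for an entire DialogueData asset before the conversation starts. A rough sketch, assuming the CoquiTTS component above (keying the cache by raw dialogue text is an assumption; a stable per-line ID would be more robust):

using System.Collections.Generic;
using UnityEngine;

public class VoiceOverCache : MonoBehaviour
{
    public CoquiTTS tts;

    private readonly Dictionary<string, AudioClip> cache = new Dictionary<string, AudioClip>();

    // Call this before StartDialogue so clips are (hopefully) ready when the lines appear.
    public void Warm(DialogueData dialogue)
    {
        foreach (var line in dialogue.dialogueLines)
        {
            // Skip lines that already have a pre-recorded clip or a cached one.
            if (line.voiceOver != null || cache.ContainsKey(line.dialogueText)) continue;

            string key = line.dialogueText;
            StartCoroutine(tts.GenerateSpeech(key, clip =>
            {
                if (clip != null) cache[key] = clip;
            }));
        }
    }

    public bool TryGetClip(string dialogueText, out AudioClip clip)
    {
        return cache.TryGetValue(dialogueText, out clip);
    }
}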

Polishing Your Dialogue System

Here are some tips to take your dialogue system to the next level:

  • Choices and Branching: Implement branching dialogue based on player choices. Store possible responses in the DialogueData Scriptable Object and update the DialogueManager to handle them; a data-side sketch follows this list.
  • Animations: Trigger character animations (e.g., lip-syncing, gestures) when dialogue is spoken.
  • Localization: Support multiple languages by storing dialogue text in separate files and loading them dynamically based on the player’s language setting.
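
For branching specifically, one common pattern is to end a sequence with a set of choices, each pointing at the DialogueData asset it leads to. A sketch of the data side only (the BranchingDialogueData type and its fields are assumptions; the DialogueManager above would still need choice buttons wired up):

using System.Collections.Generic;
using UnityEngine;

// Hypothetical branching variant: each choice carries its own label and the dialogue it leads to.
[CreateAssetMenu(fileName = "NewBranchingDialogue", menuName = "Dialogue/BranchingDialogueData")]
public class BranchingDialogueData : ScriptableObject
{
    [System.Serializable]
    public struct DialogueChoice
    {
        public string choiceText;          // label shown on the choice button
        public DialogueData nextDialogue;  // dialogue sequence this choice jumps to
    }

    public DialogueData dialogue;          // the lines spoken before the choices appear
    public List<DialogueChoice> choices;   // empty list = no branch, just continue
}

When the player clicks a choice button, the manager simply calls StartDialogue(choice.nextDialogue) to jump to that branch.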

By implementing these techniques, you can create a dialogue system that is engaging, immersive, and enhances the overall player experience. Go beyond simple text on a screen. Craft stories that resonate with your players.