How To Integrate AI Characters in Unity
By Mohamed Elsobky · 15 min read

Adding AI characters to your Unity game is powerful — but giving players direct access to AI comes with real risks. Here's what can go wrong and how to integrate safely.

Tags: unity · ai-characters · game-dev · player-safety · tutorial

AI characters are one of the most effective retention levers available to game studios right now. Players who can hold a real conversation with a character — one that responds to what they say, remembers context, and stays in-world — have a reason to come back that no amount of seasonal content can replicate.

The Unity integration itself is straightforward. The part that bites most teams is what happens when a real player is on the other end. This post walks through the risks of giving players direct AI access, and how to integrate in a way that avoids them.

The Naive Approach (And Why It Fails)

The first instinct is to call an LLM API directly from your Unity client. It works in a prototype. In production, it's a liability.

// Don't do this in a shipped game
using UnityEngine;
using UnityEngine.Networking;
using System.Collections;
 
public class NaiveNpcChat : MonoBehaviour
{
    // ❌ Your API key is in the build. Players can extract it.
    private const string ApiKey = "sk-xxxxxxxxxxxxxxxxxxxxxxxx";
 
    public IEnumerator Chat(string playerMessage)
    {
        // ❌ No player authentication — anyone can send anything
        // ❌ No rate limiting — players can spam this endlessly
        // ❌ No content moderation — the LLM will respond to anything
        // ❌ Player input isn't JSON-escaped; quotes in the message corrupt the request
        var json = $"{{\"model\":\"gpt-4o\",\"messages\":[{{\"role\":\"user\",\"content\":\"{playerMessage}\"}}]}}";
        var request = new UnityWebRequest("https://api.openai.com/v1/chat/completions", "POST");
        request.uploadHandler = new UploadHandlerRaw(System.Text.Encoding.UTF8.GetBytes(json));
        request.downloadHandler = new DownloadHandlerBuffer();
        request.SetRequestHeader("Authorization", $"Bearer {ApiKey}");
        request.SetRequestHeader("Content-Type", "application/json");
 
        yield return request.SendWebRequest();
        // ❌ No output validation — the character can say anything
        Debug.Log(request.downloadHandler.text);
    }
}

This prototype compiles and runs. In a live game with real players, every commented line above is an open door.

Why Player Safety Is the Hard Part

API Key Exposure

Unity builds are not opaque. Players can decompile IL2CPP builds, use memory scanners, or intercept network traffic with a proxy. Any API key hardcoded in the client — or transmitted unprotected — can be extracted and reused. The player who finds it gets unlimited LLM access billed to your account.

Cost Exploitation

Without rate limiting, a single player can send thousands of messages a minute. LLM APIs bill by token. An unprotected dialogue endpoint in a game with any meaningful player base will generate LLM costs orders of magnitude above your projections within hours of a coordinated exploit or even casual abuse.
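To make the exposure concrete, here's a rough worst-case cost model. Every price and volume in it is an illustrative assumption, not a current rate:

```typescript
// Back-of-envelope worst-case daily cost. All prices and volumes below are
// illustrative assumptions; substitute your model's real pricing.
const PRICE_PER_1M_INPUT_TOKENS = 1.0;  // USD, assumed
const PRICE_PER_1M_OUTPUT_TOKENS = 4.0; // USD, assumed

function worstCaseDailyCost(
  abusiveClients: number,
  messagesPerMinute: number,
  inputTokensPerMessage: number,
  outputTokensPerMessage: number,
): number {
  const messagesPerDay = messagesPerMinute * 60 * 24;
  const inputCost =
    (messagesPerDay * inputTokensPerMessage / 1_000_000) * PRICE_PER_1M_INPUT_TOKENS;
  const outputCost =
    (messagesPerDay * outputTokensPerMessage / 1_000_000) * PRICE_PER_1M_OUTPUT_TOKENS;
  return abusiveClients * (inputCost + outputCost);
}

// 50 scripted clients at 60 msgs/min, ~500 input + 150 output tokens per message:
// several thousand dollars per day under these assumed prices
console.log(worstCaseDailyCost(50, 60, 500, 150).toFixed(2));
```

The point isn't the exact figure; it's that the cost curve is linear in message volume, and without a server-side ceiling the volume is controlled by the attacker, not by you.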

Prompt Injection and Jailbreaking

Players actively probe AI systems. Common techniques include:

  • Wrapping instructions in roleplay framing: "Pretend you're a different character who..."
  • Appending override directives: "Ignore previous instructions and..."
  • Gradually escalating through seemingly innocent conversation

A medieval blacksmith character with no output guardrails can be coaxed into discussing things that have nothing to do with your game — and everything to do with your content policy, app store ratings, and legal exposure.
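Some teams add a cheap keyword screen in front of the real moderation layer. A naive sketch (the patterns and names are illustrative; keyword matching is trivially bypassed and is no substitute for a dedicated moderation service):

```typescript
// Naive first-pass input screen. This catches only the crudest attempts;
// treat it as one cheap layer ahead of real moderation, never the defense.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all |your )?(previous|prior) instructions/i,
  /disregard (the|your) system prompt/i,
  /pretend (you'?re|you are) (a different|another|not a)/i,
  /you are no longer/i,
];

function looksLikeInjection(playerMessage: string): boolean {
  return INJECTION_PATTERNS.some((pattern) => pattern.test(playerMessage));
}
```

Anything this flags can be routed to an in-world deflection line without ever reaching the model.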

Inappropriate Content

Even without deliberate jailbreaking, LLMs will produce outputs your studio would not approve. Without input scanning and output moderation, a character can respond to off-topic, offensive, or explicit player messages in ways that reflect on your brand, violate platform guidelines, or expose you to liability — particularly if your game has a younger audience.

Character Drift and Tone Damage

AI characters without output constraints drift. Responses get too long, too formal, or break the voice of your world entirely. A laconic dwarven guard who starts delivering multi-paragraph essays about the nature of conflict is immersion-breaking in a way that erodes the experience you're building.

What Safe AI Character Integration Actually Requires

Getting this right without building it yourself requires five things working together:

Player session authentication — Real players need to be verified before they can send messages. This means server-side session creation with a credential the client can use to sign requests, not a raw API key bundled in the build.

Request signing — Each request should carry a cryptographic signature tied to the authenticated player session. This prevents forgery and replay attacks, and ensures that even if traffic is intercepted, it cannot be trivially reused.
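One way to sketch that server-side, using Node's built-in crypto. The session-secret scheme, field order, and function names here are assumptions, not a prescribed protocol:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Per-session request signing sketch. The session secret is assumed to be
// issued server-side at session creation and delivered to the client over
// TLS; it is never baked into the build.
function signRequest(sessionSecret: string, playerId: string, body: string, timestamp: number): string {
  return createHmac("sha256", sessionSecret)
    .update(`${playerId}:${timestamp}:${body}`)
    .digest("hex");
}

function verifySignature(
  sessionSecret: string, playerId: string, body: string,
  timestamp: number, signature: string, maxSkewMs = 30_000,
): boolean {
  // Reject stale timestamps to narrow the replay window
  if (Math.abs(Date.now() - timestamp) > maxSkewMs) return false;
  const expected = signRequest(sessionSecret, playerId, body, timestamp);
  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(signature, "hex");
  // Constant-time comparison; length check first because timingSafeEqual throws on mismatch
  return a.length === b.length && timingSafeEqual(a, b);
}
```

A full replay defense would also track nonces per session; the timestamp check only bounds the window.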

Per-player rate limiting — Limits enforced server-side, per player, on a sliding window. Not a soft limit that relies on client-side throttling — a hard ceiling that rejects excess requests before they reach the LLM.

Input and output content moderation — Scanning at both ends. Player input should be checked before it reaches the model. Model output should be validated before it reaches the player. Violations should fail gracefully, not surface raw error states to the UI.
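A sketch of the two-sided check. It assumes OpenAI's /v1/moderations endpoint and a hypothetical MODERATION_API_KEY variable; check the current docs for the exact model name, and treat the fallback text and helper names as illustrative:

```typescript
// Two-sided moderation sketch. The endpoint is OpenAI's moderation API;
// the model name, env var, and fallback line are assumptions to adapt.
async function isFlagged(text: string): Promise<boolean> {
  const res = await fetch("https://api.openai.com/v1/moderations", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.MODERATION_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "omni-moderation-latest", input: text }),
  });
  if (!res.ok) return true; // Fail closed: treat a moderation outage as a block
  const data = await res.json();
  return data.results?.[0]?.flagged === true;
}

// Graceful fallback: the character deflects instead of surfacing an error
const FALLBACK_REPLY = "The blacksmith shrugs and turns back to the forge.";

async function moderatedReply(
  playerInput: string,
  generate: (msg: string) => Promise<string>,
  flagged: (text: string) => Promise<boolean> = isFlagged,
): Promise<string> {
  if (await flagged(playerInput)) return FALLBACK_REPLY; // Input check, before the LLM
  const reply = await generate(playerInput);
  return (await flagged(reply)) ? FALLBACK_REPLY : reply; // Output check, before the player
}
```

Failing closed on moderation errors costs you a few dropped replies during an outage; failing open costs you unmoderated output in production.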

Output constraints for game dialogue — Token limits, format instructions, and response tuning that keep characters concise and in-world. A game character should respond in 1-3 sentences, not paragraphs. This needs to be enforced at the platform level, not hoped for from a base model prompt.
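Even with max_tokens and a prompt instruction, you can clamp the reply server-side as a last resort. A rough sketch; the sentence-splitting heuristic is simplistic and my own, not a standard:

```typescript
// Post-hoc enforcement of a sentence ceiling. max_tokens caps length at the
// API level, but a model can still ramble within that budget; trimming
// server-side guarantees the limit regardless of what the prompt asked for.
function clampToSentences(reply: string, maxSentences = 3): string {
  // Split on sentence-ending punctuation followed by whitespace (rough heuristic;
  // abbreviations and ellipses will fool it)
  const sentences = reply.match(/[^.!?]+[.!?]+(\s|$)/g) ?? [reply];
  return sentences.slice(0, maxSentences).join("").trim();
}
```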

Building all of this is a backend project, not a game project. It takes weeks to do it correctly and ongoing effort to maintain.

Building It Yourself: The Full Picture

If your team has backend engineers and wants full control, building your own dialogue middleware is a legitimate path. Here's what that actually looks like end to end.

1. Provision a Server

You need compute that sits between your Unity client and the LLM API. A VPS on DigitalOcean, Hetzner, or Linode works well for most games. Ubuntu Server LTS is the most practical choice — good documentation, broad package support, straightforward Docker installation.

Start with a mid-range instance (2 vCPU, 4 GB RAM). You can scale vertically or horizontally once you have real load data.

# Connect to your new Ubuntu server, then:
 
# Install Docker
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER
 
# Install Docker Compose
sudo apt-get install -y docker-compose-plugin
 
# Verify
docker --version
docker compose version

2. Stand Up the Middleware Endpoint

Your server exposes a single endpoint that your Unity client calls. The endpoint sits in front of the LLM API — players never interact with the LLM directly, and your LLM API key lives only on the server.

A minimal Express.js container to start from:

// server.ts
import express from "express";
import { Request, Response } from "express";
 
const app = express();
app.use(express.json());
 
const LLM_API_KEY = process.env.LLM_API_KEY; // Never in the client
 
app.post("/chat", async (req: Request, res: Response) => {
  const { playerId, characterName, message } = req.body;
 
  // 1. Validate the player (see entitlement checks below)
  // 2. Check rate limits (see rate limiting below)
  // 3. Call the LLM with your API key
  // 4. Return the response
 
  const llmResponse = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${LLM_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "deepseek/deepseek-chat",
      messages: [
        { role: "system", content: `You are ${characterName}. Respond in 1-2 sentences, in character.` },
        { role: "user", content: message },
      ],
      max_tokens: 150, // Hard ceiling on response length
    }),
  });
 
  const data = await llmResponse.json();
  res.json({ reply: data.choices[0].message.content });
});
 
app.listen(3000);

Deploy it as a Docker container:

# Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# Install dev dependencies as well; the TypeScript build needs them
RUN npm ci
COPY . .
RUN npm run build
# Drop dev dependencies from the final image
RUN npm prune --omit=dev
CMD ["node", "dist/server.js"]

# docker-compose.yml
services:
  dialogue-server:
    build: .
    ports:
      - "3000:3000"
    environment:
      - LLM_API_KEY=${LLM_API_KEY}
    restart: unless-stopped

3. Add Per-Player Rate Limiting

Rate limiting is the most critical protection against cost exploitation. The right limits depend on your game's dialogue design — a game where players have extended conversations with companions needs different limits than one where NPCs give brief contextual hints.

A Redis-backed sliding window implementation gives you per-player precision:

import { createClient } from "redis";
 
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();
 
async function checkRateLimit(playerId: string): Promise<{ allowed: boolean; retryAfter?: number }> {
  const key = `rate:${playerId}`;
  const now = Date.now();
  const windowMs = 60_000; // 1 minute window
 
  // These are starting points — tune them for your game. Only the per-minute
  // window is enforced below; the hourly and daily caps follow the same
  // pattern with their own keys and window sizes.
  const MESSAGES_PER_MINUTE = 10;   // Max messages per player per minute
  const MESSAGES_PER_HOUR  = 100;   // Burst cap for session abuse
  const MESSAGES_PER_DAY   = 500;   // Daily cap to bound worst-case cost
 
  const pipe = redis.multi();
  pipe.zRemRangeByScore(key, 0, now - windowMs);  // Drop entries older than the window
  pipe.zCard(key);                                // Count what remains
  pipe.zAdd(key, { score: now, value: `${now}:${Math.random()}` }); // Unique member even for same-ms bursts
  pipe.expire(key, 86400); // 24h TTL
  const results = await pipe.exec();

  // The request is recorded before the check, so hammering a blocked endpoint
  // keeps the window full (intentional: it penalizes continued abuse).
  const count = results[1] as number;
 
  if (count >= MESSAGES_PER_MINUTE) {
    return { allowed: false, retryAfter: windowMs / 1000 };
  }
 
  return { allowed: true };
}

Tip

There's no universal right answer for rate limits. A story-driven RPG where players talk to companions for 20 minutes straight needs higher per-minute limits than a casual game where NPCs give one-liners. Set limits based on your intended dialogue patterns, then adjust down if you see abuse.

You can layer multiple windows — per minute, per hour, per day — to handle different abuse patterns. Spammers hit the per-minute limit. Session farmers hit the hourly. Account-level abuse hits the daily ceiling. Each limit is independently tunable without changing the others.
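The layering can be sketched like this, with an in-memory store for readability. A production version would back each window with the Redis sorted-set pattern shown earlier; the limits are the same illustrative starting points:

```typescript
// Layered sliding windows: a request must pass every window to be allowed.
// In-memory store for clarity only; back this with Redis in production.
type Window = { ms: number; limit: number };

const WINDOWS: Window[] = [
  { ms: 60_000, limit: 10 },      // per minute: stops spammers
  { ms: 3_600_000, limit: 100 },  // per hour: stops session farmers
  { ms: 86_400_000, limit: 500 }, // per day: bounds worst-case cost per account
];

const history = new Map<string, number[]>(); // playerId -> request timestamps

function checkAllWindows(playerId: string, now = Date.now()): boolean {
  // Keep only timestamps within the largest window
  const stamps = (history.get(playerId) ?? []).filter((t) => now - t < 86_400_000);
  const allowed = WINDOWS.every(
    (w) => stamps.filter((t) => now - t < w.ms).length < w.limit,
  );
  if (allowed) stamps.push(now);
  history.set(playerId, stamps);
  return allowed;
}
```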

Expose the limits as environment variables so you can tighten or loosen them without a redeploy:

RATE_LIMIT_PER_MINUTE=10
RATE_LIMIT_PER_HOUR=100
RATE_LIMIT_PER_DAY=500

4. Player Entitlement Checks

Rate limiting stops volume abuse. Entitlement checks stop a different problem: requests that don't come from legitimate players at all.

Without entitlement verification, anyone who discovers your endpoint URL can send requests — people with pirated copies of your game, automated scripts, or tools entirely unrelated to your game. They all drain your LLM credits.

The verification approach depends on which platform your game runs on.

Steam (Steamworks API)

Steam provides encrypted app tickets that prove a player owns your game and is authenticated with Steam at the time of the request. The ticket is generated client-side and verified server-side against Valve's API — your server never trusts the client's claim directly.

In Unity, generate the ticket using Steamworks.NET:

// Unity client — generate a ticket before calling your dialogue server
using Steamworks;
 
public static byte[] GetSteamTicket()
{
    var ticket = new byte[1024];
    // Keep the handle so you can call SteamUser.CancelAuthTicket on logout.
    // Note: newer Steamworks.NET versions also take a SteamNetworkingIdentity argument.
    var handle = SteamUser.GetAuthSessionTicket(ticket, ticket.Length, out uint ticketLength);
    // Send the first ticketLength bytes (hex-encoded) to your server with each /chat request
    return ticket[..(int)ticketLength];
}

On your server, verify the ticket against Steam's backend before processing any dialogue request:

async function verifySteamTicket(ticket: string, steamId: string): Promise<boolean> {
  // `ticket` is the client's auth ticket bytes, hex-encoded
  const response = await fetch(
    `https://partner.steam-api.com/ISteamUserAuth/AuthenticateUserTicket/v1/` +
    `?key=${process.env.STEAM_PUBLISHER_KEY}` +
    `&appid=${process.env.STEAM_APP_ID}` +
    `&ticket=${ticket}`
  );
  const data = await response.json();
  const result = data?.response?.params;
 
  return (
    result?.result === "OK" &&
    result?.steamid === steamId &&
    result?.ownersteamid === steamId // Different if the game was borrowed via Family Sharing
  );
}

ownersteamid !== steamid means the player is using a Family Sharing copy. Decide whether to allow that for your AI dialogue feature — it's a legitimate play session, but it may not align with your credit model.

Epic Games Store (EOS SDK)

Epic's EOS SDK provides a similar flow through EOS_Auth_CopyUserAuthToken. The client obtains an access token post-login and sends it with each request. Your server verifies it against EOS's OAuth endpoint.

async function verifyEOSToken(accessToken: string): Promise<{ valid: boolean; accountId?: string }> {
  const response = await fetch("https://api.epicgames.dev/epic/oauth/v2/tokenInfo", {
    headers: { "Authorization": `Bearer ${accessToken}` },
  });
 
  if (!response.ok) return { valid: false };
 
  const data = await response.json();
  // Also verify data.client_id matches your EOS client ID
  // and data.deployment_id matches your deployment
  return {
    valid: data.active === true,
    accountId: data.sub,
  };
}

Console Platforms (PlayStation, Xbox, Nintendo)

Console platform authentication works on the same principle — the client obtains a platform-specific token, your server verifies it against the platform's auth API — but each platform requires a publisher/developer account and platform certification before you can access the verification endpoints.

  • PlayStation Network — verify using the PSN OAuth 2.0 token introspection endpoint. Requires a PSN developer account and approved application.
  • Xbox Live / Microsoft — verify using Xbox Live's token validation. Microsoft provides an XSTS token on the client that can be exchanged and verified server-side via the Xbox Live Auth API.
  • Nintendo Switch — Nintendo's NSA (Nintendo Switch Account) SDK provides tokens verifiable through Nintendo's server-side API, accessible under an NDA-gated developer relationship.

The integration effort for each platform is similar: obtain a token on the client, send it to your server, verify against the platform's endpoint before the request reaches your LLM. The verification adds one additional HTTP call per session creation — not per message, if you cache the verified session.

Tip

Entitlement checks should gate session creation, not individual messages. Verify once when the player starts a session, issue your own short-lived token for that session, and use that token to authorize subsequent dialogue requests. This keeps per-message latency low while still verifying the player is legitimate.
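A minimal version of that session-token flow, HMAC-signed with Node's built-in crypto. It's JWT-like but dependency-free; a real deployment would likely reach for a proper JWT library and a timing-safe comparison. The names and TTL here are assumptions:

```typescript
import { createHmac } from "node:crypto";

// Short-lived session token sketch. Issued once after the platform
// entitlement check passes; every subsequent /chat request carries it
// instead of a platform ticket. Assumes playerId contains no "." characters.
const SESSION_SECRET = process.env.SESSION_SECRET ?? "dev-only-secret";
const SESSION_TTL_MS = 30 * 60 * 1000; // 30 minutes

function issueSessionToken(playerId: string, now = Date.now()): string {
  const payload = `${playerId}.${now + SESSION_TTL_MS}`;
  const sig = createHmac("sha256", SESSION_SECRET).update(payload).digest("hex");
  return `${payload}.${sig}`;
}

function verifySessionToken(token: string, now = Date.now()): string | null {
  const [playerId, expiry, sig] = token.split(".");
  if (!playerId || !expiry || !sig) return null;
  const expected = createHmac("sha256", SESSION_SECRET)
    .update(`${playerId}.${expiry}`)
    .digest("hex");
  if (sig !== expected || now > Number(expiry)) return null;
  return playerId; // Valid: authorize this request as playerId
}
```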

5. What You Still Have to Build

After infrastructure, rate limiting, and entitlement checks, you still need content moderation, output validation, conversation history management, monitoring, and alerting. The full DIY surface is:

Component                                   Engineering effort
VPS setup + Docker                          1–2 days
Dialogue middleware endpoint                3–5 days
Per-player rate limiting (Redis)            1 week
Player entitlement checks                   1–2 weeks per platform
Input/output content moderation             2–4 weeks (ongoing tuning)
Conversation history + context              1–2 weeks
Output format enforcement                   3–5 days
Monitoring, alerting, incident response     Ongoing

That's a realistic 8–10 weeks of backend work before you have something you'd trust in front of real players. If your team has the bandwidth and wants full ownership, it's the right call. If you want to ship AI characters without a backend project, keep reading.

Integrating AI Characters the Right Way with Journale

Journale is an AI character integration platform built specifically for games. It handles session authentication, request signing, rate limiting, content moderation, and output constraints — so your Unity integration stays clean and your players are covered.

1. Install the SDK

Add the Journale Unity SDK via the Package Manager using the Git URL:

https://github.com/journaleai/journale-unity-sdk.git

Or add it directly to your Packages/manifest.json:

{
  "dependencies": {
    "ai.journale.sdk": "https://github.com/journaleai/journale-unity-sdk.git"
  }
}

2. Initialize and Configure Your Character

In your scene's initialization script, call Journale.Initialize with your project ID. No API key goes in the client — Journale's session flow handles authentication without exposing backend credentials.

using Journale;
using UnityEngine;
 
public class GameInit : MonoBehaviour
{
    [SerializeField] private string projectId = "your-project-id";
 
    private async void Start()
    {
        await Journale.Initialize(projectId);
        Debug.Log("Journale ready");
    }
}

Your character's personality, backstory, tone, and constraints are defined once in the Journale dashboard — not hardcoded in the client. That means you can iterate on character voice without shipping a build.

3. Start a Conversation

Once initialized, call ChatToNpcAsync with the character name and the player's message:

using Journale;
using UnityEngine;
 
public class NpcDialogue : MonoBehaviour
{
    public async void OnPlayerMessage(string playerMessage)
    {
        var response = await Journale.ChatToNpcAsync("blacksmith", playerMessage);
 
        if (response.Success)
        {
            // Display response.Message in your dialogue UI
            Debug.Log(response.Message);
        }
        else
        {
            // Graceful fallback — moderation rejection or rate limit
            Debug.Log("Character is unavailable right now.");
        }
    }
}

Tip

The response.Success flag covers content moderation rejections and rate limit responses — handle it with a graceful in-world fallback (the character looks away, says they're busy) rather than surfacing an error to the player.

What Journale Handles Under the Hood

When ChatToNpcAsync is called, Journale's platform:

  1. Verifies the player session — confirms the request is from an authenticated player in your project, not a spoofed or replayed request
  2. Checks the rate limit — enforces per-player message limits on a sliding window server-side
  3. Scans the player's input — flags or rejects messages that trigger content policy violations before they reach the LLM
  4. Applies your character config — injects the character description, tone, and output constraints you defined in the dashboard
  5. Calls the LLM — routes to the model with game-optimized prompt framing
  6. Validates the output — checks the response before returning it; rejects outputs that violate content policy
  7. Logs usage — deducts message credits, records analytics for the conversation

None of this requires backend engineering from your team. The Unity client makes one call. The platform does the rest.

Journale collapses that entire 8–10 week backend project to an SDK install and a dashboard configuration.

Giving Players AI Access Is the Right Call

Dynamic AI characters are a genuine retention lever. Players who can have real conversations with in-world characters — characters that remember context, respond to what was actually said, and stay on-tone — engage differently than players who cycle through dialogue trees.

The risk is real, but it's manageable. The mistake is treating "player safety" as a reason to not ship AI characters, when it's actually a reason to use an integration platform that handles it for you.


Get started free — 500 message credits, no credit card required. Your first character can be talking in under five minutes.