How I Built a SaaS That Deploys AI Chatbots to Telegram in 2 Minutes
I recently launched ClawBotCloud, a managed platform for deploying AI chatbots to Telegram. Users configure a bot through a web dashboard, connect their Telegram token, and deploy. Each bot runs in its own isolated container on Fly.io, powered by Claude.
In this post, I want to walk through the architecture decisions, trade-offs, and lessons learned building it as a solo developer.
The Problem
Setting up an AI chatbot on Telegram requires a surprising amount of infrastructure:
- A server (VPS, cloud instance, or a machine at home)
- Docker or a process manager to keep the bot running
- API key management and secure storage
- Telegram Bot API configuration
- Monitoring and crash recovery
- SSL certificates if using webhooks
For a developer, this is a weekend project. For a small business owner who just wants an AI assistant in their Telegram group, it's a brick wall.
I wanted to reduce that to: configure → connect → deploy.
Architecture Overview
Here's the high-level architecture:
┌─────────────┐ ┌──────────────┐ ┌─────────────────┐
│ Next.js │────▶│ PostgreSQL │ │ Fly.io │
│ Dashboard │ │ (Neon) │ │ Machines │
│ │────▶│ Redis │ │ │
│ (Vercel) │ │ (Upstash) │ │ ┌───────────┐ │
│ │─────┼───────────────┼────▶│ │ Bot #1 │ │
│ │ │ Stripe │ │ │ (OpenClaw)│ │
└─────────────┘ └──────────────┘ │ └───────────┘ │
│ ┌───────────┐ │
│ │ Bot #2 │ │
│ │ (OpenClaw)│ │
│ └───────────┘ │
│ ┌───────────┐ │
│ │ Bot #N │ │
│ │ (OpenClaw)│ │
│ └───────────┘ │
└─────────────────┘
The key insight: each bot is its own Fly.io machine. Not a shared process, not a worker in a queue — a fully isolated container.
Tech Stack Decisions
Next.js 15 (App Router) — The Dashboard
The dashboard handles user auth, bot configuration, billing, and bot management. I went with Next.js 15 and the App Router because:
- Server components reduce client-side JS (the dashboard is mostly forms and tables)
- Server actions simplify the API layer — no separate REST endpoints for CRUD operations
- Vercel deployment is zero-config
// Example: Server action for creating a bot
'use server'

import { auth } from '@/lib/auth'
import { db } from '@/lib/db'
import { createFlyMachine } from '@/lib/fly'

export async function createBot(formData: FormData) {
  const session = await auth()
  if (!session?.user) throw new Error('Unauthorized')

  const bot = await db.bot.create({
    data: {
      name: formData.get('name') as string,
      systemPrompt: formData.get('systemPrompt') as string,
      userId: session.user.id,
      status: 'PROVISIONING', // must match the BotStatus enum casing
    },
  })

  // Provision the Fly.io machine
  await createFlyMachine(bot.id, {
    telegramToken: formData.get('telegramToken') as string,
    anthropicKey: formData.get('anthropicKey') as string,
    systemPrompt: formData.get('systemPrompt') as string,
  })

  return bot
}
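A server action like this still trusts raw FormData, so in practice the fields get validated before anything touches the database. A minimal sketch of that step (the field names match the action above; the validator itself and the token regex are my own, hypothetical additions):

```typescript
// Hypothetical input validator for the createBot action.
// Returns the parsed fields or throws with a readable message.
interface BotInput {
  name: string
  systemPrompt: string
  telegramToken: string
}

export function parseBotInput(formData: FormData): BotInput {
  const name = formData.get('name')
  const systemPrompt = formData.get('systemPrompt')
  const telegramToken = formData.get('telegramToken')

  if (typeof name !== 'string' || name.trim().length === 0) {
    throw new Error('Bot name is required')
  }
  // Telegram tokens look like "<numeric id>:<long alphanumeric secret>"
  if (typeof telegramToken !== 'string' || !/^\d+:[\w-]{30,}$/.test(telegramToken)) {
    throw new Error('Invalid Telegram bot token')
  }
  if (typeof systemPrompt !== 'string') {
    throw new Error('System prompt must be a string')
  }
  return { name: name.trim(), systemPrompt, telegramToken }
}
```

Failing fast here means a typo'd token never gets as far as provisioning a machine.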
PostgreSQL (Neon) — The Database
I chose Neon's serverless PostgreSQL for a few reasons:
- Scale-to-zero means I'm not paying for a database that sits idle 90% of the time at launch
- Branching is useful for testing schema changes
- Prisma works great with it
model Bot {
  id            String    @id @default(cuid())
  name          String
  systemPrompt  String    @db.Text
  status        BotStatus @default(STOPPED)
  telegramToken String    @db.Text // encrypted at application level
  userId        String
  user          User      @relation(fields: [userId], references: [id])
  flyMachineId  String?
  createdAt     DateTime  @default(now())
  updatedAt     DateTime  @updatedAt

  @@index([userId])
}

enum BotStatus {
  PROVISIONING
  RUNNING
  STOPPED
  ERROR
}
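The status enum doubles as a tiny state machine. To keep the dashboard from showing impossible jumps (say, STOPPED straight to ERROR without ever running), status updates can be gated with a helper along these lines. This is a sketch: the allowed-transition map is my reading of the lifecycle, not something generated from the schema:

```typescript
type BotStatus = 'PROVISIONING' | 'RUNNING' | 'STOPPED' | 'ERROR'

// Allowed lifecycle transitions, matching the enum in the Prisma schema.
const TRANSITIONS: Record<BotStatus, BotStatus[]> = {
  PROVISIONING: ['RUNNING', 'ERROR'],
  RUNNING: ['STOPPED', 'ERROR'],
  STOPPED: ['PROVISIONING', 'RUNNING'],
  ERROR: ['PROVISIONING', 'STOPPED'],
}

export function canTransition(from: BotStatus, to: BotStatus): boolean {
  return TRANSITIONS[from].includes(to)
}
```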
Fly.io Machines API — Bot Infrastructure
This is the most interesting part. The Fly.io Machines API lets you create, start, stop, and destroy individual containers programmatically.
// Simplified bot provisioning
async function createFlyMachine(
  botId: string,
  config: BotConfig
): Promise<string> {
  const response = await fetch(
    `https://api.machines.dev/v1/apps/${FLY_APP}/machines`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${FLY_API_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        name: `bot-${botId}`,
        config: {
          image: 'registry.fly.io/clawbotcloud-bot:latest',
          env: {
            BOT_ID: botId,
            // Injected in plain text so the bot can use them —
            // they're only stored encrypted at rest in PostgreSQL
            TELEGRAM_TOKEN: config.telegramToken,
            ANTHROPIC_API_KEY: config.anthropicKey,
            SYSTEM_PROMPT: config.systemPrompt,
          },
          guest: {
            cpu_kind: 'shared',
            cpus: 1,
            memory_mb: 256,
          },
          services: [], // No public ports — bot uses outbound only
          checks: {
            alive: {
              type: 'http',
              port: 3000,
              path: '/health',
              interval: '30s',
              timeout: '5s',
            },
          },
        },
      }),
    }
  )
  if (!response.ok) {
    throw new Error(`Machine creation failed: ${response.status}`)
  }
  const machine = await response.json()
  return machine.id
}
Why Fly.io over alternatives?
| Option | Pros | Cons |
|---|---|---|
| Fly.io Machines | Per-machine isolation, instant start/stop, pay-per-second | API can be quirky, documentation gaps |
| AWS ECS/Fargate | Battle-tested, scalable | Complex, expensive at low scale, slow cold starts |
| Railway | Great DX | Less control over individual containers |
| Kubernetes | Ultimate flexibility | Massive overkill for this use case |
| Shared process | Cheapest | No isolation, one bot crash kills all |
Fly machines can be stopped and started in under 2 seconds. This is critical for economics — when a user cancels, I stop their machine instantly and stop paying for it. No zombie containers.
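Stopping a machine on cancellation is a single API call against the same Machines API. A sketch, reading the same FLY_APP / FLY_API_TOKEN settings from the environment (the error handling here is mine):

```typescript
// Stop a bot's Fly machine so per-second billing for it ends immediately.
export async function stopFlyMachine(machineId: string): Promise<void> {
  const response = await fetch(
    `https://api.machines.dev/v1/apps/${process.env.FLY_APP}/machines/${machineId}/stop`,
    {
      method: 'POST',
      headers: { Authorization: `Bearer ${process.env.FLY_API_TOKEN}` },
    }
  )
  if (!response.ok) {
    throw new Error(`Failed to stop machine ${machineId}: ${response.status}`)
  }
}
```

Destroying the machine entirely (on account deletion, say) is the same shape with a DELETE against the machine's URL.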
Encryption and Security
API keys are sensitive. Here's the approach:
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto'

const ALGORITHM = 'aes-256-gcm'
const KEY = Buffer.from(process.env.ENCRYPTION_KEY!, 'hex')

export function encrypt(text: string): string {
  const iv = randomBytes(16)
  const cipher = createCipheriv(ALGORITHM, KEY, iv)
  let encrypted = cipher.update(text, 'utf8', 'hex')
  encrypted += cipher.final('hex')
  const authTag = cipher.getAuthTag()
  // Store IV + auth tag + ciphertext together
  return `${iv.toString('hex')}:${authTag.toString('hex')}:${encrypted}`
}

export function decrypt(data: string): string {
  const [ivHex, authTagHex, encrypted] = data.split(':')
  const decipher = createDecipheriv(ALGORITHM, KEY, Buffer.from(ivHex, 'hex'))
  decipher.setAuthTag(Buffer.from(authTagHex, 'hex'))
  let decrypted = decipher.update(encrypted, 'hex', 'utf8')
  decrypted += decipher.final('utf8')
  return decrypted
}
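A quick round trip shows the stored format: IV, auth tag, and ciphertext, hex-encoded and colon-joined. To keep the example self-contained the key is passed in explicitly and generated on the fly; in the app it comes from ENCRYPTION_KEY:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto'

// Same scheme as above, with the key as a parameter for demonstration.
const ALGORITHM = 'aes-256-gcm'

function encryptWith(key: Buffer, text: string): string {
  const iv = randomBytes(16)
  const cipher = createCipheriv(ALGORITHM, key, iv)
  const encrypted = cipher.update(text, 'utf8', 'hex') + cipher.final('hex')
  return `${iv.toString('hex')}:${cipher.getAuthTag().toString('hex')}:${encrypted}`
}

function decryptWith(key: Buffer, data: string): string {
  const [ivHex, authTagHex, encrypted] = data.split(':')
  const decipher = createDecipheriv(ALGORITHM, key, Buffer.from(ivHex, 'hex'))
  decipher.setAuthTag(Buffer.from(authTagHex, 'hex'))
  return decipher.update(encrypted, 'hex', 'utf8') + decipher.final('utf8')
}

const key = randomBytes(32) // 256-bit key, as ENCRYPTION_KEY would be
const token = '123456:AAHdqTcvCH1vGWJxfSeofSAs0K5PALDsaw' // example token, not real
const stored = encryptWith(key, token)
// stored looks like "a1b2…:c3d4…:e5f6…" — safe to persist in PostgreSQL
```

Because GCM authenticates as well as encrypts, tampering with any of the three segments makes decryption throw instead of silently returning garbage.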
Keys are encrypted in PostgreSQL and only decrypted when injecting into a Fly machine's environment. They're never logged, never sent to the frontend, and never stored in plain text.
Telegram Integration: Polling vs. Webhooks
There are two ways to receive Telegram messages:
Long polling: Your bot asks Telegram "any new messages?" in a loop.
Webhooks: Telegram sends messages to a URL you provide.
I went with long polling for a counterintuitive reason: it's simpler and more reliable for this use case.
With webhooks, each bot would need:
- A public URL
- An SSL certificate
- A port exposed on the container
- Fly.io proxy configuration
With long polling, the bot just makes outbound HTTP requests. No public ports, no SSL certs, no proxy config. The container doesn't even need to be publicly accessible, which is actually a security win.
// Inside the bot container — simplified polling loop
const sleep = (ms: number) => new Promise<void>(res => setTimeout(res, ms))

async function startPolling(token: string) {
  let offset = 0
  while (true) {
    try {
      const updates = await fetch(
        `https://api.telegram.org/bot${token}/getUpdates`,
        {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            offset,
            timeout: 30, // long-poll for 30 seconds
          }),
        }
      ).then(r => r.json())

      for (const update of updates.result || []) {
        offset = update.update_id + 1
        await handleMessage(update)
      }
    } catch (error) {
      console.error('Polling error:', error)
      await sleep(5000) // back off on errors
    }
  }
}
The trade-off: long polling means each bot maintains a persistent connection. At scale (thousands of bots), this could be a problem. But at launch scale (tens of bots), it's the right choice. I can migrate to webhooks later if needed.
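One cheap upgrade before that migration: the fixed 5-second back-off in the loop above is fine at launch scale, but if Telegram starts rate limiting, exponential back-off with a ceiling is a few lines. A sketch, not what currently ships:

```typescript
// Exponential back-off with a ceiling: 1s, 2s, 4s, … capped at 60s.
// `attempt` is the number of consecutive failures so far.
export function backoffDelay(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs)
}
```

The loop would reset `attempt` to zero on the first successful getUpdates call.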
Billing with Stripe
Stripe handles subscriptions. Each bot maps to a Stripe subscription item:
// When user deploys a new bot
async function handleBotDeployment(userId: string, botId: string) {
  const user = await db.user.findUnique({
    where: { id: userId },
    include: { subscription: true, bots: true }, // bots are needed for the quantity below
  })
  if (!user) throw new Error('User not found')

  if (!user.subscription?.stripeSubscriptionId) {
    // Create new subscription
    const subscription = await stripe.subscriptions.create({
      customer: user.stripeCustomerId,
      items: [{ price: BOT_PRICE_ID, quantity: 1 }],
    })
    // Store subscription...
  } else {
    // Update quantity on existing subscription
    await stripe.subscriptions.update(user.subscription.stripeSubscriptionId, {
      items: [{
        id: user.subscription.stripeItemId,
        quantity: user.bots.filter(b => b.status === 'RUNNING').length + 1,
      }],
      proration_behavior: 'create_prorations',
    })
  }
}
Users pay per bot. Add a bot → subscription quantity increases. Remove a bot → it decreases. Stripe handles prorations automatically.
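The quantity logic is worth pulling into one place: the new quantity is just the count of bots that will be running after the change. A hypothetical helper matching the update call above, with a floor at zero so a stray double-removal can't go negative:

```typescript
interface BotLike {
  status: 'PROVISIONING' | 'RUNNING' | 'STOPPED' | 'ERROR'
}

// Subscription quantity after adding `delta` bots
// (delta = +1 on deploy, -1 on removal).
export function nextQuantity(bots: BotLike[], delta: number): number {
  const running = bots.filter(b => b.status === 'RUNNING').length
  return Math.max(0, running + delta)
}
```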
Health Monitoring
Each bot container exposes a /health endpoint. Fly.io checks it every 30 seconds. If a bot goes unhealthy:
- Fly restarts the container automatically (first line of defense)
- If it fails 3 times, the dashboard shows the bot as "Error"
- I get notified (webhook to my monitoring)
On top of that, the dashboard polls bot status from Fly's API periodically so users see real-time status.
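That status poll boils down to mapping Fly's machine states onto the BotStatus enum. Roughly like this — treat the exact set of state strings as an assumption on my part rather than an exhaustive list from Fly's docs:

```typescript
type BotStatus = 'PROVISIONING' | 'RUNNING' | 'STOPPED' | 'ERROR'

// Map a Fly machine state string onto the dashboard's BotStatus.
// Unknown states surface as ERROR so problems are visible, not hidden.
export function flyStateToBotStatus(state: string): BotStatus {
  switch (state) {
    case 'created':
    case 'starting':
      return 'PROVISIONING'
    case 'started':
      return 'RUNNING'
    case 'stopping':
    case 'stopped':
      return 'STOPPED'
    default:
      return 'ERROR'
  }
}
```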
Lessons Learned
1. Container-per-bot is expensive but worth it
At ~€5-7/month infrastructure cost per bot with €20/month pricing, margins are okay but not amazing. A shared-process architecture would be 10x cheaper. But the isolation guarantee is worth it — one bot with a bad system prompt can't crash another customer's bot.
2. Fly.io Machines API has quirks
The API is powerful but the documentation doesn't cover every edge case. Machine state transitions can be surprising — a machine can be in "starting" state for longer than expected, and you need to poll for the actual running state.
3. Start with fewer features
I almost built a full analytics dashboard, conversation logs, A/B testing for prompts, and multi-model support before launch. Glad I didn't. The MVP is: create bot, deploy bot, bot works. Everything else can come later.
4. Encryption is not optional
Even for an MVP. Users are trusting you with their API keys. If you store them in plain text "just for now," you'll forget to encrypt them later. Do it from day one.
What's Next
- WhatsApp support — high demand, but Meta's Business API is a different beast
- Free trial — figuring out limits that prevent abuse
- Multiple AI models — GPT-4o, Gemini, local models
- Conversation analytics — token usage, message volume, popular topics
- Custom knowledge bases — upload documents for RAG
Try It
If you want to deploy a Claude-powered Telegram bot without touching a server, that's exactly what ClawBotCloud does.
The bot runtime is OpenClaw (open source) — you can always self-host if you prefer full control.
I'm a solo developer and would genuinely appreciate feedback. What would you build with this? What's missing?
If you enjoyed this post, I write about building SaaS products and AI infrastructure. Follow for more.