Personal AI Agents: The Evolution of Human-AI Collaboration

March 31, 2025

Research articles are raw form dumps of explorations I've taken using AI research products. They are not thoroughly read through and checked. I use them to learn and write other content. I share them here in case others are interested.

Awesome, thank you! I’ll put together a comprehensive plan on how to build a personal productivity assistant using Next.js, Tailwind, MongoDB Atlas, and OpenAI’s Agents SDK. It will cover integrating with Google Calendar and Gmail, composing and sending Slack messages, retrieving Slack updates, designing the text interface, handling authentication, and suggestions for MongoDB schema structure. I’ll also look into how to optionally add voice interaction and deploy the app on Heroku.

I’ll circle back soon with everything laid out for you.

Building a Personal Productivity Assistant with Next.js, Tailwind, MongoDB, and OpenAI Agents

Introduction

Building your own AI-powered productivity assistant is an exciting way to streamline daily tasks. In this guide, we’ll create a web application that acts as a personal assistant – checking your Gmail for emails, managing Google Calendar events, and posting Slack updates – all through a conversational interface. We’ll use Next.js (React framework) for the web app, Tailwind CSS for styling, MongoDB Atlas for data storage, and OpenAI’s Agents SDK to give our assistant AI capabilities like tool usage and memory. The app will be for personal use (single-user), so we can keep the authentication and infrastructure simple yet secure.

Key features we’ll implement:

  • Conversational UI: A chat-style interface where you can type queries (and optionally use voice input/output) to interact with the assistant.
  • Gmail Integration: The assistant can read your emails (e.g. check inbox or search by sender) and compose/send replies on your behalf via the Gmail API.
  • Google Calendar Integration: It can fetch your schedule for today or any range (like the next two weeks) and create events with invites (scheduling meetings and emailing attendees via Calendar API).
  • Slack Integration: It can retrieve recent Slack messages or updates (e.g. unread messages or channel highlights) and send Slack messages to coworkers or channels as instructed.
  • OpenAI Agent Brain: Using OpenAI’s Agents SDK (or function-calling API), the assistant will decide when to use each tool (Gmail, Calendar, Slack) based on your request. It maintains context or memory of the conversation so follow-up questions are handled intelligently.
  • MongoDB Atlas Storage: A cloud MongoDB database will store persistent data – for example, cached email summaries, scheduled tasks or to-dos, event details, chat history for memory, etc. We will design simple schemas for emails, events, messages, and tasks.
  • Deployment on Heroku: We’ll outline how to deploy this Next.js app to Heroku, including setting up environment variables (for API keys and credentials) and ensuring all services run smoothly in production.

By the end, you’ll have a roadmap for a comprehensive personal assistant app that you can customize further. Let’s dive into the tech stack and system architecture first.

Tech Stack and Architecture

Tech Stack Overview: We are combining a modern web stack with AI and cloud services:

  • Next.js 13+ (Node.js & React): Handles the frontend UI (and server-side API routes). It provides an interactive React interface for chat, and server functions to connect to external APIs (Google, Slack, OpenAI).
  • Tailwind CSS: Enables rapid styling of the frontend with utility classes, so we can make a clean chat interface without writing lots of custom CSS.
  • OpenAI Agents SDK / API: Powers the AI assistant capabilities. The assistant is essentially a GPT-4 (or GPT-3.5) model with the ability to call custom functions (tools) for Gmail, Calendar, Slack, etc. OpenAI’s Agents SDK provides a framework for multi-step tool use and memory – allowing the AI to “decide which tool to use and when” autonomously (Mastering OpenAI’s new Agents SDK & Responses API [Part 1] - DEV Community).
  • MongoDB Atlas: A cloud-hosted MongoDB database for persisting data. We’ll use it to store things like event details, email/message logs, or conversation history that the agent might need to recall.
  • Heroku: Platform for deploying the app. Heroku will run our Node.js server (Next.js app) and we’ll configure it with the necessary build steps and environment variables (API keys, DB URI, etc.).

System Architecture: The app follows a client–server model with integrations to third-party APIs. The diagram below illustrates the high-level architecture:

(image) Architecture of the personal assistant app. The Next.js server hosts the UI and the OpenAI-powered agent. User requests go to the agent, which can call out to Gmail, Google Calendar, Slack APIs, and the MongoDB database as needed. The agent then responds back to the user.

As shown above, the Next.js app serves both the frontend and backend:

  • The browser UI (Next.js React front-end) presents a chat interface. The user can type a query or command (and even use voice input, which we'll discuss later). When the user submits a query, it’s sent via an API call (e.g. to a Next.js API route).

  • The Next.js backend (Node.js) receives the query and passes it to the OpenAI Agent. We define the agent with certain tools (functions) it can use – like check_email, send_email, get_events, add_event, get_slack_messages, send_slack_message, etc. The agent (powered by an OpenAI model via API) will analyze the user’s request and decide if a tool function needs to be called. OpenAI’s function calling workflow works by you providing a list of function schemas and the model deciding when to invoke them (Create an Agent with OpenAI Function Calling Capabilities | by Tianyi Li | TDS Archive | Medium). The Agents SDK automates this loop – it will call the function, get the result, and feed it back to the model until the task is complete (OpenAI Agents SDK). For example, if you ask "Do I have any emails from my boss today?", the agent might invoke the Gmail-check function to find those emails, then include the results in its answer to you.

  • When the agent calls a tool, the Next.js API route for that function will in turn invoke the corresponding third-party API. For instance, if the agent calls get_today_events, our code will call the Google Calendar API to fetch today’s events. Similarly, a send_slack_message tool would call Slack’s Web API to post a message. We handle these API calls using official client libraries or REST calls.

  • MongoDB Atlas is used to store any data we want to persist. The agent (or our API routes) can query MongoDB for context or store new information. For example, when the assistant fetches events or emails, we might cache them in the DB for quick lookup. We might also store a running chat history or summary in a “memory” collection so that the agent can maintain long-term context beyond the prompt limit. (The OpenAI Agents SDK also supports in-memory context, but for persistence across sessions, a database is useful.)

  • Finally, the agent sends a response back to the Next.js API, which returns it to the frontend. The browser then updates the chat UI with the assistant’s answer. If voice output is enabled, the browser could also speak the answer aloud.

Throughout this process, secure credentials are required for the external services (Google API keys/tokens, Slack tokens, OpenAI API key, MongoDB URI). We’ll keep these in environment variables (on Heroku and in a local .env file during development) and not expose them in client-side code.

Now, let’s go step by step through integrating each of the third-party services (Google Gmail/Calendar and Slack), setting up our AI agent, designing the UI, and handling data storage.

Integrating Gmail and Google Calendar (Google APIs)

Google’s APIs will let our assistant read and send emails (Gmail) and manage calendar events. Both Gmail and Calendar are part of Google’s Cloud APIs, so the setup steps are similar. We’ll need to create a Google Cloud project for our app, enable the APIs, and handle authentication (using OAuth 2.0, since we need access to a private Google account).

Google API Setup (OAuth): Start by creating a project in the Google API Console and enabling the Gmail and Calendar APIs (Node.js Gmail API Integration: Accessing Mailbox Data). You’ll configure an OAuth consent screen (since this is for personal use, you can mark it as Internal if using a Google Workspace account, or External for a regular Gmail account). Next, create OAuth 2.0 credentials (Client ID and Client Secret) for a Web Application. For a personal app, you can use localhost (for dev) and your Heroku URL (for production) as authorized redirect URIs. This OAuth setup will allow your app to request permission to access your Gmail and Calendar.

  • Tip: Since this is a single-user personal app, you can simplify by doing the OAuth consent flow once manually (e.g. using Google’s OAuth playground or a quick script) to obtain a refresh token for your account. You can then store this token securely in your app (e.g. in the database or as an env variable) and use it to refresh access tokens as needed. This way, you don’t need a full user login flow every time – the app will have offline access to your Gmail/Calendar. However, ensure the credentials are kept private.

Gmail Integration (Email)

To interact with Gmail, we use Google’s Gmail API. The Gmail API allows reading messages, searching, and sending email through REST endpoints. Google provides a Node.js client library (googleapis) that makes it easier to call these endpoints. For example, the Gmail API has methods to list messages, get message content, and send messages (Gmail API Overview - Google for Developers).

Authentication: Using the OAuth credentials from above, we’ll create an OAuth2 client in our Node.js code and set its credentials to the token we obtained. The googleapis library can then generate an authorized Gmail client. For example:

const {google} = require('googleapis');
const oauth2Client = new google.auth.OAuth2(clientId, clientSecret, redirectUri);
// ... set oauth2Client.credentials with saved tokens ...
const gmail = google.gmail({version: 'v1', auth: oauth2Client});
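
A minimal sketch of the elided credentials step above, assuming you stored your refresh token in a GOOGLE_REFRESH_TOKEN env variable (the name is illustrative) – the client library will exchange it for fresh access tokens automatically:

// Authorize the client with the saved refresh token (obtained once, as in the tip above)
oauth2Client.setCredentials({ refresh_token: process.env.GOOGLE_REFRESH_TOKEN });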

Reading Emails: We can now call Gmail API methods. To check emails, one useful method is gmail.users.messages.list which lists message IDs matching a query. We might use query parameters to filter emails (e.g. label:INBOX for inbox, or from:boss@example.com for sender). For example:

const res = await gmail.users.messages.list({
  userId: 'me',
  q: 'label:inbox newer_than:1d'  // sample query: inbox emails from last 1 day
});
const messages = res.data.messages || [];

This returns message metadata. We would likely then call gmail.users.messages.get({ userId:'me', id: <MessageID> }) for each ID to get details like subject, snippet, and body. The Gmail API returns the email body as a MIME message or parts, which we may need to parse (perhaps retrieving the text/plain or text/html part). For our assistant, a summary (subject, sender, snippet) might be enough to tell the user. We can always fetch full body if needed for reading out or analyzing content.
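
For example, a small helper (an illustrative sketch, not from the Gmail docs) could pull just the headers and snippet for each ID:

// Fetch sender, subject, and snippet for each message ID returned by messages.list
async function getEmailSummaries(gmail, messageIds) {
  const summaries = [];
  for (const id of messageIds) {
    const { data } = await gmail.users.messages.get({
      userId: 'me',
      id,
      format: 'metadata',
      metadataHeaders: ['Subject', 'From', 'Date']
    });
    const headers = Object.fromEntries(
      (data.payload.headers || []).map((h) => [h.name, h.value])
    );
    summaries.push({ id, from: headers.From, subject: headers.Subject, snippet: data.snippet });
  }
  return summaries;
}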

Example: To list inbox messages via the API using Node.js:

gmail.users.messages.list(
  { userId: 'me', q: 'label:inbox subject:urgent' },
  (err, res) => { /* handle results */ }
);

According to a Node integration example, this will retrieve messages matching the query (e.g. subject containing “urgent”) (Node.js Gmail API Integration: Accessing Mailbox Data). The result gives an array of message IDs that match. You can then fetch each message’s details similarly.

Sending Emails: To have the assistant compose or send a reply, use gmail.users.messages.send. The catch is that this method expects a raw email in RFC 2822 format (headers plus body), base64url-encoded. We can create a MIME message string with To, Subject, etc., and the reply text. Google’s docs provide examples of constructing and sending an email message (Sending Email | Gmail - Google for Developers). Alternatively, we could use a package like nodemailer to generate the MIME content and then submit it via the Gmail API.

For a simpler approach, if the assistant is replying to an existing email, we can use Gmail’s threading: include the References and In-Reply-To headers from the original email and set the threadId when sending, so it appears as a reply in Gmail. The API supports a threadId field in the message resource for this purpose.
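
As a rough sketch (assuming a plain-text reply and that you kept the original message’s threadId and Message-ID on hand), the raw message can be built, base64url-encoded, and sent like this:

// Build a minimal RFC 2822 message, base64url-encode it, and send it as a threaded reply
async function sendReply(gmail, { to, subject, body, threadId, inReplyTo }) {
  const raw = Buffer.from(
    `To: ${to}\r\n` +
    `Subject: ${subject}\r\n` +
    (inReplyTo ? `In-Reply-To: ${inReplyTo}\r\nReferences: ${inReplyTo}\r\n` : '') +
    `Content-Type: text/plain; charset="UTF-8"\r\n\r\n` +
    body
  ).toString('base64url');

  return gmail.users.messages.send({
    userId: 'me',
    requestBody: { raw, threadId }
  });
}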

Gmail API Scopes: Make sure your OAuth token has scopes for what you need. For read-only access, use https://www.googleapis.com/auth/gmail.readonly; for sending emails, you need https://www.googleapis.com/auth/gmail.send. You can also use the full gmail.modify scope if you plan to mark messages or move them. (In a personal app, read/send should suffice.)

Google Calendar Integration (Scheduling)

For the Calendar integration, we similarly use Google’s Calendar API via the googleapis library. We’ll need Calendar OAuth scope (e.g. https://www.googleapis.com/auth/calendar.events for managing events). Using the same OAuth2 client (we can add multiple scopes to the token), we create a Calendar client:

const calendar = google.calendar({version: 'v3', auth: oauth2Client});

Fetching Events (Schedule): The assistant should provide your schedule for today or upcoming days. We can use calendar.events.list to get events from the primary calendar within a date range. For example, to get today’s events: set timeMin to today 00:00 and timeMax to today 23:59, and singleEvents: true (to expand recurring events) and perhaps orderBy: 'startTime'. For the next two weeks, adjust timeMax accordingly (e.g. now + 14 days).

Example call:

const now = new Date();
const twoWeeks = new Date();
twoWeeks.setDate(now.getDate() + 14);
const res = await calendar.events.list({
  calendarId: 'primary',
  timeMin: now.toISOString(),
  timeMax: twoWeeks.toISOString(),
  singleEvents: true,
  orderBy: 'startTime'
});
const events = res.data.items;

This gives us an array of event objects with details like summary (title), start/end times, attendees, etc. The assistant could format these into a reply (e.g. “You have 3 events tomorrow: Meeting at 10am, Lunch at 12pm, …”). If the user asks for a specific event by name or date, we could search the events list by keyword or use events.get if we have an ID.
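
For instance, a small formatting helper (illustrative, not part of the Calendar API) could turn the events array into a readable summary for the reply:

// Turn Calendar events into a short human-readable list for the assistant's answer
function formatEvents(events) {
  if (!events || events.length === 0) return "You have no events in that range.";
  return events
    .map((ev) => {
      const start = ev.start.dateTime || ev.start.date;  // all-day events only have a date
      return `- ${ev.summary} at ${new Date(start).toLocaleString()}`;
    })
    .join('\n');
}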

Creating Events (with Invites): Let’s enable the assistant to schedule a meeting. Using calendar.events.insert we can create a new event on the calendar. We’ll construct an event JSON with fields like summary, start/end DateTime, and an attendees list of email addresses to invite. The Calendar API can even add a Google Meet link if we specify conferenceDataVersion: 1 and set conferenceData parameters. For example:

const event = {
  summary: 'Project Sync',
  start: { dateTime: '2025-04-01T10:00:00-05:00' },
  end:   { dateTime: '2025-04-01T11:00:00-05:00' },
  attendees: [{ email: 'alice@example.com' }, { email: 'bob@example.com' }],
  conferenceData: {
    createRequest: { requestId: 'some-unique-id' }  // any unique string; required to generate a Meet link
  }
};
const result = await calendar.events.insert({
  calendarId: 'primary',
  resource: event,
  conferenceDataVersion: 1,  // enables conferenceData handling (Meet link)
  sendUpdates: 'all'
});

The sendUpdates: 'all' flag is important – it tells Google to automatically email the invited guests about the event (I Built an Event Scheduler in NodeJs using Google Calendar API - DEV Community). This way, when our assistant schedules something, the attendees get a calendar invite in their inbox. The API response will include details of the created event (including the Meet link in hangoutLink if created).

If needed, we can also set event.reminders or other fields. But at minimum, the above is enough to create and notify. The Calendar API and Gmail API together handle the heavy lifting of invite emails and confirmations.

Google API Best Practices: In our Next.js app, we should create API route endpoints for these interactions, e.g. GET /api/emails to list emails, POST /api/emails/reply to send a reply, GET /api/events for schedule, POST /api/events to create events, etc. However, since the OpenAI agent will be driving these, we might not call them directly from the frontend; instead, the agent’s function call will invoke internal helper functions. Still, structuring the logic in standalone functions (which can be tested independently) is wise. Also, be mindful of API quotas – both Gmail and Calendar have usage limits. For personal use you’ll likely be fine, but avoid fetching too frequently. For example, instead of polling Gmail every few seconds, have the agent fetch on demand (when asked), or poll at a reasonable interval if needed (a few times an hour for new mail, etc.). For real-time push from Google, you’d have to set up webhooks which is more complex, so polling is acceptable for this use case.
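
If you do expose such routes, each can stay a thin wrapper around a standalone helper – for example (getUpcomingEvents and the lib/google path are assumed names for the Calendar logic above):

// pages/api/events.js – sketch of a thin API route around a Calendar helper
import { getUpcomingEvents } from '../../lib/google';  // assumed helper module

export default async function handler(req, res) {
  try {
    const events = await getUpcomingEvents({ days: 14 });
    res.status(200).json({ events });
  } catch (err) {
    console.error('Calendar fetch failed:', err);
    res.status(500).json({ error: 'Failed to fetch events' });
  }
}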

Integrating Slack (Messaging)

Now, let’s integrate Slack so our assistant can read and send Slack messages. Slack has a rich API and a developer platform for creating Slack apps. For personal use, we can create a Slack app that lives in our own workspace and use it to access messages.

Setting Up a Slack App: Go to Slack’s API site and create a new app (choose "From scratch"). You’ll install this app to your workspace. In the app configuration, under OAuth & Permissions, add scopes that your app needs. Specifically: to read messages from channels, add channels:history (for public channels) or groups:history (for private channels) and channels:read (to list channels) (Retrieving messages | Slack). If you want to read direct messages (IMs), add im:history. To send messages, add chat:write (and chat:write.public if posting to public channels) (Using the Slack API to Send Messages (User DM) (with Javascript examples) | Endgrate). For sending DMs specifically, Slack treats them as IMs that also require im:write scope (Using the Slack API to Send Messages (User DM) (with Javascript examples) | Endgrate). Once scopes are set, reinstall/authorize the app in your workspace to issue the token. You’ll get an OAuth access token (typically starting with xoxb- for bot tokens). Use a bot token unless you specifically need user context; bot tokens are fine for posting as the bot and reading channels the bot is a member of. Invite the bot user to any channels you want it to access.

Using Slack Web API (Node): Slack provides a Node SDK (@slack/web-api) that makes calling their Web API methods straightforward. We can initialize a WebClient with our token:

const { WebClient } = require('@slack/web-api');
const slackClient = new WebClient(process.env.SLACK_BOT_TOKEN);

Now we can call Slack API methods.

Reading Messages (Updates): If the assistant is asked “What’s new on Slack?” it might fetch the latest messages from a specific channel or from your mentions. We can use conversations.history for this. For example, to get the last 5 messages from a channel:

const history = await slackClient.conversations.history({
  channel: channelId,
  limit: 5
});
const messages = history.messages;

This returns an array of message objects (with text, user, timestamp, etc.) (Retrieving messages | Slack). We could filter for unread messages if we track the last seen timestamp. Slack doesn’t directly provide “unread for user” via the Web API (that typically comes via their RTM or events), but as a simplification, we could just fetch recent messages from important channels or DMs. If the workspace is just you or a small team, this is manageable.

To get a specific channel’s ID, you might call conversations.list (with channels:read scope) and find by name. For DMs, Slack uses a special channel ID (the conversation ID starting with D). Alternatively, if the assistant knows which channel or user to check (maybe you encode that in the prompt or function parameter), you can store those IDs in config.
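
For example, a small helper (an illustrative sketch, assuming the channels:read scope) can resolve a channel name like "general" to its ID:

// Resolve a channel name to its ID via conversations.list, paging through results
async function getChannelId(slackClient, name) {
  let cursor;
  do {
    const res = await slackClient.conversations.list({ limit: 200, cursor });
    const match = res.channels.find((c) => c.name === name);
    if (match) return match.id;
    cursor = res.response_metadata && res.response_metadata.next_cursor;
  } while (cursor);
  return null;  // not found
}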

Sending Messages: To have the assistant send a Slack message (e.g. “Send a message to Bob: I’ll be late to the meeting”), use chat.postMessage. Provide the channel (this can be a user’s DM channel ID or a channel ID) and the text:

await slackClient.chat.postMessage({
  channel: targetChannel,
  text: "Hello from my assistant bot!"
});

This will post as the bot user associated with your token. If you use a user token (xoxp, with the user’s permission), you could post as the user, but using the bot identity is fine since this is your personal assistant. The Slack API method chat.postMessage is quite flexible (you can add attachments, blocks, etc., but plain text is enough here). The official docs confirm it posts a message to a channel, private group, or DM given the channel ID (chat.postMessage method | Slack). In code, for example:

const result = await slackClient.chat.postMessage({
  channel: userId,  // if sending DM to a user
  text: messageText
});

Where userId is the Slack ID of the person (Slack will route that to the DM channel automatically when using a user’s ID). In a blog example, the code looks like: await web.chat.postMessage({ channel: userId, text: message }) to send a DM (Using the Slack API to Send Messages (User DM) (with Javascript examples) | Endgrate).

Real-time vs On-Demand: We have two approaches for Slack updates: polling or event-driven. Polling (on-demand) is simpler – the agent just queries Slack when asked. Real-time would involve Slack Events API or RTM API to push messages to our app as they come. For completeness, here’s a brief on real-time: Slack’s Events API can send an HTTP POST to your server for every new message, but you’d need to set up a public endpoint and verify requests (doable on Next.js API routes). Alternatively, Slack’s RTM or Socket Mode allows a WebSocket connection from your server to Slack to receive events. These are more complex to set up and overkill for a single-user assistant. Since “polling for updates is acceptable” per requirements, we can stick to checking Slack when prompted or maybe periodic checks (e.g. the agent can have a “check slack every hour” task if you implement it).

Thus, our Slack integration in the agent might consist of tools like get_slack_updates(channel) which uses conversations.history, and send_slack_message(target, text) which uses chat.postMessage. With the Slack SDK and a valid token, this is straightforward.

Slack App Security: Keep the token safe (don’t expose it client-side). Also be aware that the token grants a lot of access (depending on scopes). In a personal app, that’s fine, but if you ever expanded this, you’d implement proper OAuth for Slack too. For now, configuring the token in Heroku config or an .env file is sufficient.

Implementing the AI Assistant with OpenAI’s Agents SDK

With our integration functions for Gmail, Calendar, and Slack ready, the next step is to wire up the AI brain. We want an AI agent that can understand natural language commands and decide which actions to take (call email API, get calendar info, etc.), then respond in natural language with the results. OpenAI’s Agents SDK (and the underlying function calling feature of GPT-4/3.5) is perfect for this.

Choosing a Model: GPT-4 with function calling ability (gpt-4-0613 or later) will provide the best reasoning and reliability, though GPT-3.5 (gpt-3.5-turbo-0613) can also work for simpler tasks. The Agents SDK abstracts some details, but fundamentally it uses these OpenAI models under the hood.

Defining Tools (Functions): Each capability of our assistant is exposed to the model as a function with a name, description, and parameter schema. For example: a function check_email(query) might take a query string and return a list of emails; send_email(to, subject, body, threadId) to send an email; get_events(range) to get calendar events; create_event(details) to add a calendar event; get_slack_messages(channel); send_slack_message(channel, message). We define what each function does in code (as we outlined in the integration sections). The Agents SDK (in Python) or the OpenAI Node SDK will handle calling these when the model decides to use them.

OpenAI’s function calling flow works like this:

  1. We send the user’s prompt and the list of function specifications to the model.
  2. The model can return either a direct answer or a special message indicating it wants to call a function (with arguments).
  3. If a function call is requested, our code executes that function (e.g. actually fetch emails) and gets the result.
  4. We then send the result back to the model (as input) so it can use that information and produce a final answer (Create an Agent with OpenAI Function Calling Capabilities | by Tianyi Li | TDS Archive | Medium).

The OpenAI Agents SDK in Python simplifies this into an agent loop – it will loop through calling tools until the task is done (OpenAI Agents SDK). For instance, an agent query “Send Alice an invite for lunch tomorrow” might trigger a sequence: call create_event (with title “Lunch” and Alice as attendee for tomorrow noon), get result (confirmation and event link), then reply to user “I’ve scheduled lunch with Alice tomorrow at 12:00 PM and sent her an invite.” All that happens in one agent session without the user needing to say each step.

Memory and Context: By default, the conversation history is passed in each request so the model remembers what was said. For deeper memory (like remembering tasks or information beyond the current chat), we have a few options. One is to expose a memory tool – essentially a function the agent can call to store or retrieve information, backed by our database. In our app, we could implement tools like remember_note(note) and retrieve_notes() that store and fetch notes from MongoDB. This could power a “tasks” feature – e.g., when you say “remind me to buy milk”, the agent calls remember_note("buy milk"), which stores it in a tasks collection; later, when you ask “what did I need to buy?”, the agent calls retrieve_notes() to get the list and responds. Additionally, we can simply keep the last N interactions in the prompt to maintain short-term context. A sketch of these memory tools is shown below.
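
A minimal sketch of such memory tools, using the MongoDB Node driver and the tasks collection described later (function and collection names are illustrative):

// Illustrative memory tools backed by MongoDB
async function remember_note(db, note) {
  await db.collection('tasks').insertOne({
    description: note,
    status: 'pending',
    createdAt: new Date()
  });
  return { saved: true };
}

async function retrieve_notes(db) {
  // Return pending notes, newest first
  return db.collection('tasks')
    .find({ status: 'pending' })
    .sort({ createdAt: -1 })
    .toArray();
}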

The Agents SDK’s design is to keep things simple but effective. It provides: “Agents, which are LLMs with tools; Handoffs, to delegate to sub-agents; Guardrails, to validate inputs/outputs; and an agent loop to handle calling tools and looping until completion.” (OpenAI Agents SDK). For our case, we likely need just one agent with a set of tools. We might not need multiple agents or complex handoffs (those are more for multi-agent systems). Guardrails could be used to, say, ensure the input is not asking for something disallowed (like deleting all emails – we could forbid destructive actions unless confirmed).

Implementing the Agent (Node vs Python): The official OpenAI Agents SDK is currently Python-based. If our Next.js app is all JavaScript/TypeScript, we have two approaches:

  • Use the OpenAI API with function calling directly in Node: The OpenAI Node.js SDK supports function definitions. We can manually implement the loop: call the Chat Completions API with the functions list, check response.function_call, execute the function, then call the API again with the function result. This is a bit of boilerplate but doable. There are also community libraries like openai-agent for Node.js that mimic the Agents SDK patterns (GitHub - amzsaint/openai-agent: Connect NodeJS/Typescript Functions With OpenAI Function Call APIs!!), which allow you to decorate JS functions as tools and handle the loop automatically.

  • Use a Python service: Run a small Python service (using openai-agents Python SDK) alongside the Next.js app. The Node frontend could send queries to this service which runs the agent. This might be unnecessary complexity for personal use, but it’s an option if you want to leverage the Python SDK fully. Heroku can support multiple languages via buildpacks, but let’s keep it simple and stick to Node if possible.

Given simplicity, we’ll assume using the Node OpenAI SDK with function calling. Here’s a pseudo-code sketch of how a query would be handled:

import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Define functions schema
const functions = [
  {
    name: "get_latest_email",
    description: "Get the latest email matching a query.",
    parameters: {
      type: "object",
      properties: { query: { type: "string", description: "Search query for email" } },
      required: ["query"]
    }
  },
  // ... other function definitions for send_email, get_events, etc.
];

async function askAgent(userMessage) {
  const messages = [
    {role: "system", content: "You are a helpful assistant with access to Gmail, Calendar, Slack."},
    {role: "user", content: userMessage}
  ];
  const res = await openai.chat.completions.create({
    model: "gpt-4-0613",
    messages: messages,
    functions: functions
  });
  const reply = res.choices[0].message;
  if (reply.function_call) {
    // The model wants to call a function; its arguments arrive as a JSON string
    const { name, arguments: argsJson } = reply.function_call;
    let funcResult;
    try {
      funcResult = await executeFunctionByName(name, JSON.parse(argsJson));
    } catch (err) {
      funcResult = { error: err.message };
    }
    messages.push(reply);  // add the function call message
    messages.push({ role: "function", name, content: JSON.stringify(funcResult) });
    // Call model again with function result
    const secondRes = await openai.chat.completions.create({
      model: "gpt-4-0613",
      messages: messages
    });
    return secondRes.choices[0].message.content;
  } else {
    // Model responded directly without function call
    return reply.content;
  }
}

In the above pseudo-code, executeFunctionByName would be our dispatcher that calls the appropriate integration code (e.g., if name is "get_latest_email", call our Gmail helper with the provided arguments). The logic handles one function call; GPT could request multiple in a loop, but often one call is enough to get info then answer. The Agents SDK automates the looping (call, get result, call again if needed) which you could extend in Node by a loop until reply.content is final.
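
For illustration, the dispatcher could be a simple switch; the helper names here (getLatestEmail, getUpcomingEvents, createCalendarEvent, sendSlackMessage) are assumed stand-ins for the integration code from earlier sections:

// Illustrative dispatcher mapping tool names to the integration helpers
async function executeFunctionByName(name, args) {
  switch (name) {
    case "get_latest_email":
      return getLatestEmail(args.query);
    case "get_events":
      return getUpcomingEvents(args.range);
    case "create_event":
      return createCalendarEvent(args);
    case "send_slack_message":
      return sendSlackMessage(args.channel, args.message);
    default:
      throw new Error(`Unknown function: ${name}`);
  }
}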

Agent Instructions & Prompting: The “system” message can be used to give the assistant a persona and to explain available tools. E.g.: “You are a personal productivity assistant. You have the following tools to help you: 1) Gmail tool – for reading and sending emails; 2) Calendar tool – for managing Google Calendar events; 3) Slack tool – for Slack messages. When needed, you will use these tools to get information or perform actions. Only use them if relevant to the query. Respond with a concise and helpful answer.” Giving clear instructions will guide the model to use the functions appropriately and not stray off course. If the model ever responds with something like it doesn’t have access, refine the instructions to remind it of the tools.

Testing the Agent: Before hooking up the UI, test the agent logic on the server side. For example, simulate a user request “Do I have any meetings today?” and see if it calls the calendar function and returns a reasonable answer. You might iterate on the function definitions or prompt to get the best results. Once it’s working, integrate it with the Next.js API route that receives user messages.

Building the User Interface (Text & Voice)

With back-end capabilities in place, we need a clean front-end for the user to interact with the assistant. Next.js will serve the UI. We can create a page (say, /assistant) that shows a chat-like interface:

Chat UI Design: Use Tailwind CSS to create a simple, responsive chat layout. For example: a container with a scrollable message list and an input box at the bottom. Each message (user or assistant) can be styled as a chat bubble (Tailwind can help with background colors, rounded corners, etc.). We can distinguish user vs assistant messages with alignment or color (e.g. user messages on right, assistant on left).

For the input form, you might have a textarea or input field for the message and a submit button. When the user submits, you optimistically add the user’s message to the chat list and call an API route (e.g. POST /api/ask) with the message text. Show a loading indicator while waiting for the assistant’s response. Once the response comes back, append it to the chat.

If using the Next.js App Router (React server components), you might use a form and handle actions, or simply use client-side fetch in a React component on form submit (since it’s a single-user app, a straightforward approach is fine).

Optional Voice Input/Output: To make the assistant hands-free, we can add microphone input and speaker output in the browser:

  • Voice Input: The Web Speech API provides SpeechRecognition in supported browsers. You can create a SpeechRecognition instance in JavaScript to start listening to the microphone and convert speech to text (How to Build an AI Voice Translator in Next.js with Web Speech API & OpenAI - Space Jelly). For example:

    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    const recognition = new SpeechRecognition();
    recognition.lang = 'en-US';
    recognition.onresult = (event) => {
      const transcript = event.results[0][0].transcript;
      // use the transcribed text as the query
    };
    recognition.start();
    

    You might have a mic button in the UI; when clicked, trigger recognition.start() and perhaps visually indicate recording. When the user stops speaking, the onresult fires with the recognized text (How to Build an AI Voice Translator in Next.js with Web Speech API & OpenAI - Space Jelly). You can then populate the input field with that text or send it directly to the assistant. Keep in mind, browser speech recognition might require HTTPS (which Heroku provides) and user gesture to start (the mic button click suffices).

  • Voice Output: The speechSynthesis API can speak text using the device’s text-to-speech voices. After receiving the assistant’s response text, you can do:

    const utterance = new SpeechSynthesisUtterance(responseText);
    speechSynthesis.speak(utterance);
    

    This will read the response aloud. You can choose a voice or adjust rate/pitch if desired. The Space Jelly example project demonstrates using SpeechRecognition for speech-to-text and SpeechSynthesis for text-to-speech in a Next.js app (How to Build an AI Voice Translator in Next.js with Web Speech API & OpenAI - Space Jelly).

Adding voice is optional, but it’s a great usability boost for a personal assistant (you could talk to it like a smart speaker). Just ensure you also preserve regular text input/output for reliability and for environments where voice isn’t suitable.

Frontend Code Example (Next.js): Without going into full detail, here’s an outline of a React component for the chat interface (using functional components and hooks):

import { useState } from 'react';  // add "use client" at the top of this file if using the App Router

export default function AssistantPage() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState("");
  const [loading, setLoading] = useState(false);

  const sendMessage = async (text) => {
    const userMsg = { sender: 'user', text };
    setMessages(prev => [...prev, userMsg]);
    setLoading(true);
    const res = await fetch('/api/ask', {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({ message: text })
    });
    const data = await res.json();
    setLoading(false);
    if (data.reply) {
      const botMsg = { sender: 'assistant', text: data.reply };
      setMessages(prev => [...prev, botMsg]);
      // optional: voice output
      const utterance = new SpeechSynthesisUtterance(data.reply);
      speechSynthesis.speak(utterance);
    }
  };

  const handleSubmit = (e) => {
    e.preventDefault();
    if (!input.trim()) return;
    sendMessage(input.trim());
    setInput("");
  };

  // optional: function to start voice recognition
  const startListening = () => { /* use SpeechRecognition as above */ };

  return (
    <div className="flex flex-col h-screen">
      <div className="flex-1 overflow-y-auto p-4">
        {messages.map((msg, idx) => (
          <div key={idx} className={`my-2 ${msg.sender==='user' ? 'text-right' : 'text-left'}`}>
            <span className={`inline-block px-3 py-2 rounded-lg ${msg.sender==='user' ? 'bg-blue-500 text-white' : 'bg-gray-200 text-black'}`}>
              {msg.text}
            </span>
          </div>
        ))}
        {loading && <div className="text-gray-500">Assistant is typing...</div>}
      </div>
      <form onSubmit={handleSubmit} className="p-4 border-t flex">
        <button type="button" onClick={startListening} className="mr-2">🎤</button>
        <input 
          value={input} 
          onChange={e => setInput(e.target.value)} 
          className="flex-1 border rounded px-3 py-2" 
          placeholder="Ask me something..." 
        />
        <button type="submit" className="ml-2 bg-blue-600 text-white px-4 py-2 rounded">Send</button>
      </form>
    </div>
  );
}

This simplistic example shows the general idea: manage an array of messages, display them, and handle form submission to send new messages to an API. The api/ask route would be implemented to call our askAgent logic (from earlier) and return the assistant’s reply in JSON. Tailwind classes are used to style the messages and layout. You can customize the styling to your taste (e.g., different colors or avatars for user vs assistant).
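
For completeness, here’s a minimal sketch of that route (Pages Router style; the lib/agent path is an assumed location for the askAgent code from earlier):

// pages/api/ask.js – wires the chat UI to the askAgent() function
import { askAgent } from '../../lib/agent';

export default async function handler(req, res) {
  if (req.method !== 'POST') {
    return res.status(405).json({ error: 'Method not allowed' });
  }
  try {
    const { message } = req.body;
    const reply = await askAgent(message);
    res.status(200).json({ reply });
  } catch (err) {
    console.error('Agent error:', err);
    res.status(500).json({ error: 'Assistant failed to respond' });
  }
}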

Securing the UI: Since this is a personal app, you might not need a full auth system, but you should ensure that not just anyone on the internet can access it (especially since it has access to your private data!). At minimum, protect the app with a password. For example, you could implement HTTP Basic Auth on the server – one quick method is to use Next.js middleware to require a password stored in an env var (not super secure, but acceptable for personal use on Heroku) – or use a single-user login with NextAuth’s Google provider, which conveniently ensures only your Google account can log in. The key is to ensure the site and its APIs are not publicly accessible without your permission. Since no one else should use this app, a simple env-based check (like an ADMIN_PASSWORD) is a quick solution; a sketch follows below.
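
As a rough sketch of that env-var approach, a Next.js middleware file can gate every request behind HTTP Basic Auth (ADMIN_PASSWORD is an assumed config var; harden as you see fit):

// middleware.js (project root) – gate all routes behind Basic Auth for personal use
import { NextResponse } from 'next/server';

export function middleware(request) {
  const auth = request.headers.get('authorization') || '';
  const expected = 'Basic ' + btoa(`admin:${process.env.ADMIN_PASSWORD}`);
  if (auth !== expected) {
    return new NextResponse('Authentication required', {
      status: 401,
      headers: { 'WWW-Authenticate': 'Basic realm="Assistant"' }
    });
  }
  return NextResponse.next();
}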

Now that the interface is done, let’s cover how we store data and then deploy everything.

Data Storage and Schema Design (MongoDB Atlas)

Using MongoDB Atlas, we can store various pieces of data that our assistant might need to persist. Given our features, we anticipate these main types of data: emails, calendar events, Slack messages, and tasks/notes. We’ll design collections for each. Additionally, we might have a collection for user info or credentials (though often we keep credentials in env vars, not DB). If we implement conversation memory persistence, we could have a conversations collection to log past interactions and summaries.

Let’s propose a simple schema for each relevant collection (using a JSON-like notation for fields):

  • Emails Collection (emails): Stores email metadata (and maybe content) for emails the assistant has fetched. This could cache recent emails so the agent can quickly access them without calling Gmail API every time (useful if you want to, say, allow searching through an index). Fields: _id (Mongo ID), gmailId (ID of the email in Gmail), threadId, from, to (array of recipients), subject, snippet (short preview), body (full text or HTML, if we decide to store it), date (received date), isRead. Example document:

    {
      "_id": ObjectId("..."),
      "gmailId": "17c7d92fe...",
      "threadId": "17c7d92fe...",
      "from": "Boss <[email protected]>",
      "to": ["Me <[email protected]>"],
      "subject": "Project Update",
      "snippet": "Hi, here is the update on ...",
      "date": ISODate("2025-03-31T14:30:00Z"),
      "body": "<p>Hi, here is the update...</p>",
      "isRead": false
    }
    

    We might not pre-populate this collection initially; instead, when the agent fetches emails, we can upsert documents here. This way, the assistant could even answer follow-ups like “show me that email again” without calling Gmail API twice. It’s optional caching.

  • Events Collection (events): Stores calendar events (particularly those created by the assistant, or upcoming events for quick reference). Fields: _id, eventId (Google Calendar event ID), summary (title), start (Date), end (Date), attendees (list of emails), meetLink (if any), createdAt. Example:

    {
      "_id": ObjectId("..."),
      "eventId": "qwertyuiop",
      "summary": "Lunch with Alice",
      "start": ISODate("2025-04-01T12:00:00Z"),
      "end": ISODate("2025-04-01T13:00:00Z"),
      "attendees": ["[email protected]"],
      "meetLink": "https://meet.google.com/xyz-abc",
      "createdAt": ISODate("2025-03-31T10:00:00Z")
    }
    

    The assistant could use this to confirm what it scheduled or to avoid duplicating events. We could also store all events for a day when fetched, but that might be redundant with the Google Calendar API.

  • Messages Collection (messages): Could serve two purposes: storing Slack messages and storing chat conversations. We might separate these: e.g. a slack_messages collection and a chats collection. For Slack messages (if we choose to store them), fields might include Slack ts (timestamp ID), channel, user (sender), text, and time. Example:

    {
      "_id": ObjectId("..."),
      "ts": "1680297210.324800",
      "channel": "C024BE91L", 
      "user": "U023BECGF", 
      "text": "Hey, did you see my email?",
      "time": ISODate("2025-03-31T15:20:10Z")
    }
    

    We could upsert messages when the assistant fetches Slack updates, to keep a local record. If the assistant is asked later about Slack, it might reference this instead of calling Slack again for the same data.
    For conversation history (between user and assistant), a chats collection with each message, or a sessions collection with logs could be useful. For instance, we store each Q&A with a timestamp. This could help if we want to review what the agent did or debug its actions. It could also feed a long-term memory (e.g., use past conversation data as fine-tuning or context later). This is not strictly required, but good for record-keeping.

  • Tasks/Notes Collection (tasks or notes): If we implement the idea of the assistant managing to-do items or notes, we use this. Fields: description, status (pending/done), createdAt, completedAt (if done), maybe due (for tasks with deadlines). Example:

    {
      "_id": ObjectId("..."),
      "description": "Buy milk",
      "status": "pending",
      "createdAt": ISODate("2025-03-31T09:00:00Z"),
      "due": null
    }
    

    The assistant can add to this when you say reminders, and mark them done when you complete them. This essentially becomes a personal to-do list the assistant can read/write.

MongoDB Setup in Next.js: We will connect to MongoDB Atlas by using the connection string (Mongo URI). In a Next.js API route (or an external utility file), use the MongoDB driver or an ODM like Mongoose. For example, with Mongoose:

// lib/mongo.js
import mongoose from 'mongoose';
const MONGO_URI = process.env.MONGO_URI;
let conn = null;
export async function connectToDB() {
  if (conn == null) {
    conn = await mongoose.connect(MONGO_URI);
  }
  return conn;
}

We can define Mongoose schemas for the collections using the fields above. However, given the simple nature of the data, using the official MongoDB Node driver with direct CRUD operations is also fine. Just make sure you connect only once – Next.js may re-run modules on hot reload, so guard against duplicate connections as the snippet above does.
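
For example, a Mongoose model for the tasks collection might look like this (a sketch – adjust fields to your needs):

// models/Task.js – sketch of the tasks/notes schema described above
import mongoose from 'mongoose';

const TaskSchema = new mongoose.Schema({
  description: { type: String, required: true },
  status: { type: String, enum: ['pending', 'done'], default: 'pending' },
  createdAt: { type: Date, default: Date.now },
  completedAt: { type: Date, default: null },
  due: { type: Date, default: null }
});

// Reuse the compiled model across hot reloads in Next.js
export default mongoose.models.Task || mongoose.model('Task', TaskSchema);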

Security & Storage Notes: Since it’s your data, ensure the MongoDB Atlas cluster has proper network rules (e.g., IP whitelist or at least good password). You’ll put the Mongo URI (with username/password) in Heroku config. Also, consider the volume of data – likely minimal (some emails, events, messages). A free tier Atlas should handle this without issues. If you store email bodies or file attachments (not in our plan, but if you did), watch out for size.

One nice thing about storing data is that the assistant could answer questions like “Show me that email from last week about the project” even if Gmail API is slow or unreachable, because it’s in the DB. However, that means data duplication and potential staleness. For our use, we might lean on live API calls for real-time info and use the DB more for the assistant’s own memory (notes, chat history, tasks).

Deployment on Heroku

Deploying a Next.js app with a Node backend to Heroku is straightforward. Here are the steps and considerations for a smooth deployment:

  1. Heroku App Setup: Create a new Heroku app via the Heroku Dashboard or CLI (heroku create your-app-name). Make sure the app is on a recent stack (heroku-20 or newer) that supports Node 18+; Heroku selects a current stack by default.

  2. Environment Variables: In your Heroku app’s settings, set the config vars for all the secrets and configs:

    • OPENAI_API_KEY: Your OpenAI API key.
    • GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET: From Google Cloud (if your app code needs them for token refresh). Also GOOGLE_REFRESH_TOKEN if you obtained one for offline access.
    • SLACK_BOT_TOKEN: The Slack app token.
    • MONGO_URI: Your MongoDB Atlas connection string (make sure to include username, password, and database name).
    • Any other custom settings (e.g., ADMIN_PASSWORD if you implement basic auth, etc.).

    Heroku will expose these to your Node app as process.env.VAR_NAME. Ensure in Next.js you use environment variables correctly – if using Next’s built-in features, prefix client-side needed vars with NEXT_PUBLIC_, but in our case most secrets are for server-side use only (API keys), so keep them private.

  3. Build and Start Script: In your package.json, the default Next.js setup has "build": "next build" and "start": "next start". Heroku’s Node buildpack will run npm install (or yarn) and then run the build script. For the runtime, Heroku by default will run npm start (unless a specific Procfile is present). Ensure "start" script is set to next start which starts the Next.js server. Alternatively, you can use a Procfile with web: npm run start to be explicit.

  4. MongoDB Connection on Heroku: By default, MongoDB Atlas is external to Heroku. Make sure your Atlas cluster is set to allow connections from anywhere (0.0.0.0/0) or at least from Heroku’s IP range (which is not fixed, so 0.0.0.0/0 might be needed for personal use, though that is less secure). Use a strong password since the IP rule is open. Alternatively, if you want to avoid open network access, you could set up Atlas network peering or use a MongoDB add-on – but plain Atlas with a strong password is fine for our needs. Test the connection string locally first to verify it works.

  5. Testing on Heroku: Once you push the code (git push heroku main if using Heroku Git deploy, or use GitHub integration), watch the logs (heroku logs -t) for any errors on startup. Common issues might be:

    • Missing environment variables (the app might crash if, say, MONGO_URI isn’t set – so double-check).
    • Build errors (if any dependency issues). Next.js might need a newer Node version; you can specify an engine in package.json like "node": "18.x" to ensure Heroku uses Node 18.
    • If using any binary libraries for speech or otherwise (not likely in our stack), ensure they are installed properly.
  6. Scaling Considerations: Heroku no longer offers a free tier; its low-cost Eco dynos sleep when idle, which is fine for personal use but means the first request after a while will be slow while the dyno wakes. Basic (formerly Hobby) dynos don’t sleep. The app should not require multiple dynos unless you expect heavy use – one web dyno is enough. Memory usage is trivial since the model runs on OpenAI’s servers (we’re just waiting on network calls).

  7. Domains and HTTPS: Heroku provides a default domain (your-app.herokuapp.com) with HTTPS. You can use that for accessing your assistant. If you want a custom domain, you can configure it in Heroku and add SSL (Heroku handles Let’s Encrypt certs easily). For using the microphone via Web Speech, ensure you use HTTPS (which Heroku’s domain does).

  8. Logging and Monitoring: For debugging, you might want to log certain things on the server (like when a function is called by the agent, what it did, etc.). Use console.log for simplicity. Heroku aggregates logs, which you can view with the CLI. Just be careful not to log sensitive info (like the content of emails) in a way that persists, unless it’s just for your knowledge. Remove or secure logs in production as needed.

  9. Handling Secrets: Never commit your API keys in the repository. We rely on Heroku config vars for those. In Next.js, don’t expose them to client-side code. Our architecture keeps all secrets usage in API routes (server side), which is correct.

Post-Deployment: Once deployed, test the live app. Try asking the assistant to check an email or schedule an event and see if it works end-to-end. You might need to adjust the OAuth consent (Google might warn “unverified app” if not published – for personal use, that’s okay, just grant permission). Slack might also require the app to be running at least once to complete the OAuth flow (if you haven’t installed it properly). After using, check your Gmail, Calendar, Slack to confirm actions (like did the email send? was the event created and invite sent? did the Slack message post?). Tweak as necessary.

Finally, as the app is for your personal productivity, you can continue extending it – perhaps add more tools (maybe Google Drive summaries, or Notion integration for notes, etc.). The combination of Next.js + OpenAI function-calling agent + cloud APIs is powerful, allowing the assistant to act on your behalf in many domains.

Conclusion

We’ve assembled a comprehensive plan for a personal productivity assistant web app. The solution brings together a modern web frontend with powerful AI and integrates deeply with everyday tools (email, calendar, chat). By using OpenAI’s agent capabilities, the assistant can autonomously decide how to fulfill your requests – performing multi-step actions like looking up information and taking action (sending messages or scheduling events) all in one go.

This guide covered the tech stack and architecture, how to connect with Gmail, Google Calendar, and Slack APIs, how to harness the OpenAI Agents SDK for tool usage and memory, how to build a user-friendly (and voice-enabled) interface, and how to deploy and maintain the app on Heroku. Along the way, we outlined important code snippets and best practices (like OAuth setup, API scopes, and database schema designs for storing the assistant’s data).

With these building blocks, you can implement your own AI assistant and tailor it to your workflow. The result is a Next.js application that feels like having your very own ChatGPT-powered executive assistant: one who can read your emails, manage your schedule, send messages, and keep track of notes – all through a simple chat interface. Enjoy your new productivity boost, and feel free to expand the assistant’s skillset as you see fit!

Sources: