# Building a Custom Connector
You can connect any data source to Trove by building a local connector. Your code fetches data from the source and pushes it to Trove via the Sync API.
## When to Build a Custom Connector

Build a custom connector when you have a data source Trove doesn’t support yet (e.g., Notion, Slack, email, local files, a proprietary database).
## Choose Your Execution Mode

- **Local** (recommended for custom connectors). Your code fetches data and pushes it to Trove. You control when and how data is fetched.
- **Cloud.** Trove runs the connector on a schedule. Only available for built-in connector types.
## Step-by-Step Walkthrough

### 1. Create the Connector

Register your connector with Trove. This tells Trove to expect data from a new source.

```bash
curl -X POST https://api.ontrove.sh/graphql \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "query": "mutation { createConnector(input: { connectorType: \"notion\", name: \"My Notion\", execution: LOCAL, config: {} }) { id name status } }" }'
```

Save the returned connector `id`. You need it for every sync request.
### 2. Fetch Data from Your Source

Write code that reads from your data source. Use the source’s API, read files, scrape pages, whatever works. Trove doesn’t care how you get the data, only how you send it.
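As one concrete example of “read files”, here is a minimal sketch that treats a local folder of Markdown files as the source. The helper names (`fileToDocument`, `readMarkdownFolder`) and the `file-` prefix for IDs are illustrative choices, not part of Trove; the output shape matches the Sync API document fields described in step 3.

```typescript
import { readdirSync, readFileSync, statSync } from 'node:fs';
import { join } from 'node:path';

// Shape mirrors the Sync API document fields (see step 3).
interface SourceDocument {
  external_id: string;
  title: string;
  text: string;
  date: string; // ISO 8601
  content_type: 'text';
}

// Pure mapping: file name + contents + mtime -> Sync API document.
// The file name doubles as a stable external_id (see step 5 on dedup).
function fileToDocument(name: string, content: string, mtime: Date): SourceDocument {
  const firstLine = content.split('\n')[0] ?? '';
  return {
    external_id: `file-${name}`,
    // Use the first heading as the title, falling back to the file name.
    title: firstLine.replace(/^#+\s*/, '') || name,
    text: content,
    date: mtime.toISOString(),
    content_type: 'text',
  };
}

// Walk a directory and collect a document for every .md file.
function readMarkdownFolder(dir: string): SourceDocument[] {
  return readdirSync(dir)
    .filter(name => name.endsWith('.md'))
    .map(name => {
      const path = join(dir, name);
      return fileToDocument(name, readFileSync(path, 'utf8'), statSync(path).mtime);
    });
}
```

The same pattern applies to any source: one function that maps a native item to the document shape, and one that enumerates items.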
### 3. Push Documents to Trove

Send documents to the Sync API in batches of up to 50:

```bash
curl -X POST https://api.ontrove.sh/api/sync/documents \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "connector_id": "YOUR_CONNECTOR_ID",
    "documents": [
      {
        "external_id": "notion-page-abc123",
        "title": "Meeting Notes - Q1 Planning",
        "text": "Full text of the Notion page...",
        "author": "Jane Smith",
        "date": "2026-03-20T10:00:00Z",
        "content_type": "text",
        "tags": ["meetings", "planning"]
      }
    ]
  }'
```

Each document in the batch accepts these fields:
| Field | Required | Description |
|---|---|---|
| `external_id` | Yes | Stable identifier from the source system (used for dedup) |
| `title` | No | Document title |
| `text` | No* | Full text content |
| `url` | No | Original URL |
| `author` | No | Content author |
| `date` | No | Original creation date (ISO 8601) |
| `content_type` | No | `text`, `transcript`, `highlight`, or `bookmark` (default: `text`) |
| `tags` | No | Array of string tags |
| `metadata` | No | Arbitrary JSON for connector-specific data |

\*At least one of `text` or `audio_url` must be provided. If you provide `audio_url`, Trove queues a transcription workflow.
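If you are writing your connector in TypeScript, it can help to encode the table above as a type, plus a local pre-flight check for the “at least `text` or `audio_url`” rule so empty documents fail before the API call rather than in the response's `errors` array. This is an illustrative sketch based on the field list above, not an official SDK type:

```typescript
// Field names mirror the Sync API table; audio_url comes from the footnote.
interface SyncDocument {
  external_id: string; // required, stable per source item
  title?: string;
  text?: string;
  url?: string;
  author?: string;
  date?: string; // ISO 8601
  content_type?: 'text' | 'transcript' | 'highlight' | 'bookmark';
  tags?: string[];
  metadata?: Record<string, unknown>;
  audio_url?: string; // triggers a transcription workflow if provided
}

// True if the document satisfies the text-or-audio requirement.
// Whitespace-only text is treated as empty here (an assumption).
function hasContent(doc: SyncDocument): boolean {
  return Boolean(doc.text?.trim() || doc.audio_url);
}
```

Filtering your batch with `hasContent` before sending keeps “Text content is empty” errors out of your sync runs.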
### 4. Handle Pagination

For large datasets, send documents in batches of 50. Use the `cursor` field to track your sync position:

```json
{
  "connector_id": "YOUR_CONNECTOR_ID",
  "cursor": "page-3-of-10",
  "documents": [...]
}
```

The cursor is stored on the connector and returned in the response. On the next run, read the cursor to know where you left off.
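The batching and cursor bookkeeping can be sketched as two small helpers. The `batch-N-of-M` cursor format is just one possible convention (the cursor is an opaque string you choose); `chunk` and `buildRequests` are hypothetical names:

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build one Sync API request body per batch of up to 50 documents,
// attaching a positional cursor so a restarted sync knows its place.
function buildRequests(connectorId: string, documents: object[]) {
  const batches = chunk(documents, 50);
  return batches.map((batch, i) => ({
    connector_id: connectorId,
    cursor: `batch-${i + 1}-of-${batches.length}`,
    documents: batch,
  }));
}
```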
### 5. Handle Deduplication

Use stable `external_id` values (e.g., the Notion page ID, the Slack message timestamp, the email message ID). If you sync the same `external_id` from the same connector again, Trove skips it automatically. Re-running your sync is safe and idempotent.
## Example: Notion Connector (TypeScript)

```typescript
async function syncNotion(connectorId: string, token: string) {
  const pages = await fetchNotionPages(); // Your Notion API calls

  const documents = pages.map(page => ({
    external_id: page.id,
    title: page.title,
    text: page.content,
    author: page.author,
    date: page.lastEdited,
    content_type: 'text' as const,
    tags: page.tags,
  }));

  // Send in batches of 50
  for (let i = 0; i < documents.length; i += 50) {
    const batch = documents.slice(i, i + 50);
    const response = await fetch('https://api.ontrove.sh/api/sync/documents', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        connector_id: connectorId,
        documents: batch,
      }),
    });

    const result = await response.json();
    console.log(`Indexed: ${result.documentsIndexed}, Skipped: ${result.documentsSkipped}`);

    if (result.errors.length > 0) {
      console.error('Errors:', result.errors);
    }
  }
}
```

## Response Format
The Sync API returns:

```json
{
  "documentsIndexed": 12,
  "documentsSkipped": 3,
  "errors": [
    { "external_id": "page-xyz", "error": "Text content is empty" }
  ],
  "cursor": "page-3-of-10"
}
```

- `documentsIndexed`. Number of new documents successfully ingested.
- `documentsSkipped`. Number of documents skipped due to deduplication.
- `errors`. Array of per-document errors with the `external_id` that failed and the reason.
- `cursor`. Echoed back if you provided one.
## Incremental Sync

For efficiency, only sync new or updated content on subsequent runs.
- Store the cursor value between runs (in a file, database, or environment variable).
- On the next run, use the cursor to fetch only content created or modified since the last sync.
- How you implement “fetch since cursor” depends on the source API. Some support `updated_after` parameters; others require you to track timestamps yourself.
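The "store the cursor between runs" step can be sketched with a local JSON state file. The file name and shape here are assumptions; any durable store (database row, environment variable, etc.) works the same way:

```typescript
import { readFileSync, writeFileSync, existsSync } from 'node:fs';

// Hypothetical default location for the sync state file.
const STATE_FILE = '.trove-sync-state.json';

// Returns the cursor from the previous run, or null on the first run
// (a null cursor means: do a full sync from the beginning).
function loadCursor(file: string = STATE_FILE): string | null {
  if (!existsSync(file)) return null;
  return JSON.parse(readFileSync(file, 'utf8')).cursor ?? null;
}

// Persist the cursor returned by the Sync API for the next run.
function saveCursor(cursor: string, file: string = STATE_FILE): void {
  writeFileSync(file, JSON.stringify({ cursor }));
}
```

A typical run then becomes: `loadCursor()`, fetch content changed since that position, push it, and `saveCursor()` with the value echoed back in the response.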