Documents
A document is a single unit of content in Trove. Articles, tweets, highlights, transcripts, bookmarks, and notes are all documents. Connectors create them automatically during sync; you can also save them manually.
Document Fields
| Field | Type | Description |
|---|---|---|
| id | ID | Unique identifier (16-character hex string) |
| externalId | String | Source system ID, used for deduplication |
| title | String? | Document title |
| url | String? | Original URL |
| author | String? | Content author |
| contentDate | DateTime? | When the content was originally created |
| indexedAt | DateTime | When Trove indexed this document |
| previewText | String? | First ~300 words (stored in D1 for fast access) |
| fullText | String? | Complete text (lazy-loaded from R2) |
| wordCount | Int? | Total word count |
| contentType | String | One of: text, transcript, highlight, bookmark |
| tags | [String] | User-defined tags |
| metadata | JSON? | Connector-specific extra data |
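Client code may find it convenient to mirror these fields in a typed record. The following is a minimal sketch, not part of Trove: the Document class and the is_valid_id helper are hypothetical names, and DateTime values are assumed to arrive as ISO 8601 strings.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumption: IDs are exactly 16 hex characters, in either letter case.
_ID_PATTERN = re.compile(r"[0-9a-fA-F]{16}")


def is_valid_id(value: str) -> bool:
    """Return True if value looks like a 16-character hex string."""
    return _ID_PATTERN.fullmatch(value) is not None


@dataclass
class Document:
    """Hypothetical client-side mirror of the document fields table."""

    # Non-nullable fields come first.
    id: str
    externalId: str
    indexedAt: str  # assumed ISO 8601 string
    contentType: str  # text, transcript, highlight, or bookmark
    # Nullable fields (marked with ? in the table) default to None.
    title: Optional[str] = None
    url: Optional[str] = None
    author: Optional[str] = None
    contentDate: Optional[str] = None
    previewText: Optional[str] = None
    fullText: Optional[str] = None
    wordCount: Optional[int] = None
    tags: list[str] = field(default_factory=list)
    metadata: Optional[dict[str, Any]] = None
```

Keeping fullText optional in the record reflects that it is only present when explicitly requested.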
Content Types
| Type | Description | Examples |
|---|---|---|
| text | Articles, blog posts, essays | RSS feed items, Hacker News stories (default) |
| transcript | Audio or video transcriptions | Podcast transcripts, meeting recordings |
| highlight | Excerpts and annotations | Readwise highlights, Kindle notes |
| bookmark | Saved URLs with extracted content | Browser bookmarks, saved links |
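When writing a connector or client, the four content types can be modeled as an enumeration so that unknown values fail early. This is an illustrative sketch: ContentType and parse_content_type are hypothetical names, and treating a missing value as text is an assumption based on text being marked as the default above.

```python
from enum import Enum
from typing import Optional


class ContentType(str, Enum):
    """The four content types listed in the table above."""

    TEXT = "text"
    TRANSCRIPT = "transcript"
    HIGHLIGHT = "highlight"
    BOOKMARK = "bookmark"


def parse_content_type(raw: Optional[str]) -> ContentType:
    """Map a raw string to a ContentType.

    Assumption: a missing or empty value falls back to TEXT.
    Unknown values raise ValueError.
    """
    if raw is None or raw == "":
        return ContentType.TEXT
    return ContentType(raw)
```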
Deduplication
Documents are uniquely identified by the pair (connector_id, external_id). If you sync a document with the same external_id from the same connector again, the pipeline skips it.
- Re-running a connector sync is safe. Duplicates are ignored.
- The same content from different connectors is stored separately (different connector IDs).
- Use stable, source-system identifiers as your external_id (e.g., the RSS item GUID, the Notion page ID, the tweet ID).
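The uniqueness rule can be illustrated with an in-memory stand-in. This is only a sketch of the (connector_id, external_id) check described above, not Trove's actual pipeline code; the DedupIndex class and its should_ingest method are hypothetical.

```python
class DedupIndex:
    """In-memory stand-in for the pipeline's uniqueness check (illustrative only)."""

    def __init__(self) -> None:
        # Each entry is a (connector_id, external_id) pair that was already ingested.
        self._seen: set[tuple[str, str]] = set()

    def should_ingest(self, connector_id: str, external_id: str) -> bool:
        """Return True the first time a pair is seen, False on every repeat."""
        key = (connector_id, external_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

With this model, re-syncing ("rss", "guid-1") is skipped, while ("readwise", "guid-1") is ingested as a separate document because the connector differs.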
Preview vs Full Text
Documents are stored across two layers for performance.
- previewText is always available. It contains the first ~300 words and lives in D1 alongside other metadata. Use this in list views and search results.
- fullText requires a read from R2. Only request it when you need the complete document, for example, when a user opens a document detail view.
In GraphQL, only include fullText in your selection set when you need it. In MCP, trove_search returns snippets; use trove_get_document to read the full text.
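One way to apply this on the client is to build the GraphQL selection set conditionally. The sketch below uses field names from the table in Document Fields; the build_document_selection helper and the choice of list fields are hypothetical, and the surrounding query or root field depends on the actual schema, which is not shown here.

```python
# Fields that are cheap to fetch because they live alongside other metadata.
LIST_FIELDS = ["id", "title", "url", "previewText", "contentType", "tags"]


def build_document_selection(include_full_text: bool = False) -> str:
    """Return a GraphQL selection set, adding fullText only on request."""
    fields = list(LIST_FIELDS)  # copy so the module-level list is never mutated
    if include_full_text:
        fields.append("fullText")
    return "{ " + " ".join(fields) + " }"
```

A list view would call build_document_selection() and a detail view would call build_document_selection(include_full_text=True), so the R2 read happens only when a document is opened.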
Tags
Tags are arbitrary strings attached to documents as an array. They can be:
- Set during ingestion (via the Sync API or trove_save)
- Updated after the fact via the updateDocumentTags mutation
- Used as filters in search, discover, and document listings
There is no predefined tag vocabulary. Use whatever taxonomy fits your knowledge base.
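To make tag filtering concrete, here is a small client-side sketch. It assumes match-all semantics (a document must carry every requested tag) and exact, case-sensitive comparison; both are assumptions for illustration, and the has_all_tags and filter_by_tags helpers are hypothetical rather than part of Trove.

```python
def has_all_tags(document_tags: list[str], required: list[str]) -> bool:
    """Return True if every required tag is present (exact, case-sensitive match)."""
    return set(required).issubset(document_tags)


def filter_by_tags(documents: list[dict], required: list[str]) -> list[dict]:
    """Keep only documents whose 'tags' list contains all required tags."""
    return [doc for doc in documents if has_all_tags(doc.get("tags", []), required)]
```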