Documents

A document is a single unit of content in Trove. Articles, tweets, highlights, transcripts, bookmarks, and notes are all documents. Connectors create them automatically during sync; you can also save them manually.

Document Fields

Field	Type	Description
`id`	ID	Unique identifier (16-character hex string)
`externalId`	String	Source system ID, used for deduplication
`title`	String?	Document title
`url`	String?	Original URL
`author`	String?	Content author
`contentDate`	DateTime?	When the content was originally created
`indexedAt`	DateTime	When Trove indexed this document
`previewText`	String?	First ~300 words (stored in D1 for fast access)
`fullText`	String?	Complete text (lazy-loaded from R2)
`wordCount`	Int?	Total word count
`contentType`	String	One of: `text`, `transcript`, `highlight`, `bookmark`
`tags`	[String]	User-defined tags
`metadata`	JSON?	Connector-specific extra data
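
As a rough illustration, the table above could be mirrored client-side with a small Python dataclass. This is a sketch under assumptions, not Trove's own type definition: the class name `Document`, the snake_case attribute names, and the validation in `__post_init__` are hypothetical. The `text` default follows the "(default)" note in the Content Types table.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional

# Allowed values for contentType, taken from the field table.
CONTENT_TYPES = {"text", "transcript", "highlight", "bookmark"}


@dataclass
class Document:
    """Hypothetical client-side mirror of a Trove document."""

    # Non-nullable fields (no "?" in the table).
    id: str
    external_id: str
    indexed_at: datetime
    content_type: str = "text"
    tags: List[str] = field(default_factory=list)

    # Nullable fields (marked "?" in the table).
    title: Optional[str] = None
    url: Optional[str] = None
    author: Optional[str] = None
    content_date: Optional[datetime] = None
    preview_text: Optional[str] = None
    full_text: Optional[str] = None
    word_count: Optional[int] = None
    metadata: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Reject content types that are not listed in the table.
        if self.content_type not in CONTENT_TYPES:
            raise ValueError(f"unknown content type: {self.content_type!r}")
```

Only `id`, `external_id`, and `indexed_at` are required here; every nullable field defaults to `None`.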

Content Types

Type	Description	Examples
`text`	Articles, blog posts, essays (default type)	RSS feed items, Hacker News stories
`transcript`	Audio or video transcriptions	Podcast transcripts, meeting recordings
`highlight`	Excerpts and annotations	Readwise highlights, Kindle notes
`bookmark`	Saved URLs with extracted content	Browser bookmarks, saved links

Deduplication

Documents are uniquely identified by the pair (connector_id, external_id). If a sync encounters a document whose external_id already exists for the same connector, the pipeline skips it.

Re-running a connector sync is safe. Duplicates are ignored.
The same content from different connectors is stored separately (different connector IDs).
Use stable, source-system identifiers as your external_id (e.g., the RSS item GUID, the Notion page ID, the tweet ID).
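
The deduplication rule can be sketched in a few lines of Python. This is an in-memory illustration, not the actual pipeline: the `DocumentStore` class and its `ingest` method are hypothetical names, and the only behavior taken from the text above is that the (connector_id, external_id) pair is the uniqueness key and that repeats are skipped.

```python
from typing import Any, Dict, Tuple


class DocumentStore:
    """Hypothetical in-memory store keyed by (connector_id, external_id)."""

    def __init__(self) -> None:
        self._docs: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def ingest(self, connector_id: str, external_id: str, payload: Dict[str, Any]) -> bool:
        """Store the document and return True, or return False if it was skipped."""
        key = (connector_id, external_id)
        if key in self._docs:
            # Same connector and same external ID: treat as a duplicate.
            return False
        self._docs[key] = payload
        return True

    def __len__(self) -> int:
        return len(self._docs)


store = DocumentStore()
first = store.ingest("rss", "guid-123", {"title": "Post"})
repeat = store.ingest("rss", "guid-123", {"title": "Post"})
other_connector = store.ingest("readwise", "guid-123", {"title": "Post"})
```

In this sketch `first` and `other_connector` are stored while `repeat` is skipped, so the store ends up with two documents: the same external ID under two different connectors counts as two separate documents.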

Preview vs Full Text

For performance, each document is stored across two layers: D1 for metadata and the preview, and R2 for the complete text.

previewText is always available. It contains the first ~300 words and lives in D1 alongside other metadata. Use this in list views and search results.
fullText requires a read from R2. Only request it when you need the complete document, for example, when a user opens a document detail view.

In GraphQL, only include fullText in your selection set when you need it. In MCP, trove_search returns snippets; use trove_get_document to read the full text.
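
One way to keep `fullText` out of list queries is to build the selection set conditionally. The helper below is a sketch only: the function name `build_document_query`, the operation name `GetDocument`, and the root field `document(id: ...)` are assumptions for illustration and are not confirmed by this page. Only the field names (`id`, `title`, `url`, `previewText`, `fullText`) come from the field table.

```python
def build_document_query(include_full_text: bool = False) -> str:
    """Return a GraphQL query string, adding fullText only on request."""
    # Fields that live alongside the metadata and are cheap to read.
    fields = ["id", "title", "url", "previewText"]
    if include_full_text:
        # fullText needs an extra read from R2, so it is opt-in.
        fields.append("fullText")
    selection = "\n    ".join(fields)
    # NOTE: the `document(id: ...)` root field is a hypothetical name.
    return (
        "query GetDocument($id: ID!) {\n"
        "  document(id: $id) {\n"
        f"    {selection}\n"
        "  }\n"
        "}"
    )


list_view_query = build_document_query()
detail_view_query = build_document_query(include_full_text=True)
```

A list view or search result would use `list_view_query`; a document detail view would use `detail_view_query`.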
