
Delta Sharing

Delta Sharing is an open protocol that lets Voyado share live datasets securely with external recipients without copying data. Recipients connect with standard tools (Spark, pandas, Power BI, Tableau, etc.) using a small credential file or OIDC federation and can query data in real time. It’s read‑only by design.

How Delta Sharing works

Delta Sharing is an open, REST‑based protocol for secure, real‑time data exchange on top of cloud storage (S3/ADLS/GCS). Providers (Voyado) expose shares that contain tables; recipients connect using a credential profile (or OIDC) with Delta‑Sharing‑compatible tools. No replication is required; the consumer simply reads data from the provider’s storage through signed URLs.

Delta Sharing has two common modes, but Voyado provides only one:

Open sharing (most common for non‑Databricks recipients). Here you send an activation link so the recipient can download a credential file (.share) or use OIDC. They then connect from Spark/pandas/BI tools.
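
As a quick illustration of the consumer side, here is a minimal sketch using the open‑source delta-sharing Python connector to discover what a credential profile exposes (the profile path is a placeholder):

import delta_sharing as ds

# The credential profile downloaded from the activation link.
client = ds.SharingClient("/path/to/profile.share")

# Discover what the provider exposes: shares -> schemas -> tables.
for table in client.list_all_tables():
    print(f"{table.share}.{table.schema}.{table.name}")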

Delta Sharing is read‑only: recipients can read or copy data out but cannot modify the source.

Shares and Recipients

Share: A named container of assets to expose, typically one or more Delta tables in a UC metastore, organized into schemas. You grant one or more recipients access to a share (one data product per share).

Recipient: The external party (company, workspace, service principal, or app) that is allowed to read from the share. For open sharing, you generate an activation link and credentials (or OIDC config). For Databricks‑to‑Databricks, you identify the recipient by metastore ID.

Example:

  • Share: Engage Data
  • Schemas/Tables: contactslatest
  • Recipient: Marketing

Data Products in Delta Sharing

A data product is a logically grouped set of datasets (tables) delivered together to solve a business use case. We publish each data product via one share to keep ownership, lineage, policies, and SLAs clear. This improves discoverability (“what’s in the share?”) and governance.

Naming suggestions:

  • Share name: The name of the available data product.
  • Recipient name: A descriptive name for who or what the share is used by, e.g. helena or marketing.

Architecture

One share per data product is recommended. It keeps lifecycle/versioning simple (deprecate v1, publish v2) and keeps permissions isolated.

One recipient per share is also strongly recommended. It offers easier auditing and revocation, and avoids role sprawl. You can create multiple recipients when the same data product goes to different partners or environments.

Under the hood, connectors read through the Delta Sharing server with short‑lived links to storage; that’s how we achieve real‑time access without copying data.

Common recipient use cases

Real‑time analytics in BI: Power BI Desktop connects with the Delta Sharing connector; analysts build reports on live data and schedule refreshes in the service.

Data science / notebooks: Data scientists load tables to pandas or Spark for feature engineering or model training.

Cross‑cloud collaboration: Partners on any cloud access shared tables without us copying data across clouds.

Incremental ingestion: Consumers pull Change Data Feed (CDF) from a shared table for efficient downstream loads. (Provider must enable “history sharing”.)

Delta Sharing supports reading table changes via CDF (incremental history). Full time‑travel to a past snapshot may not be available in every connector. Where you need stable point‑in‑time datasets, copy data locally or use provider‑published snapshots, as in the sketch below. Also remember Delta history retention defaults (see Retention and history below).
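
For example, a consumer that needs a stable point‑in‑time dataset could land today’s snapshot locally. A minimal sketch with placeholder paths and file names (writing Parquet requires pyarrow or fastparquet):

import delta_sharing as ds
from datetime import date

profile = "/path/to/profile.share"
table_url = f"{profile}#<share>.<schema>.<table>"

# Land today's full snapshot locally so it survives the retention window.
snapshot = ds.load_as_pandas(table_url)
snapshot.to_parquet(f"snapshot_{date.today()}.parquet")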

Creating Share and Recipient in Config Hub

Follow these steps to create a share and recipient:

1. Start the Share creation

Navigate to the Delta Share section of Config Hub and select "Create share" to begin.

2. Select Data Product

Choose the specific Data Product you want to share from the available options.

3. Configure Recipient access

Enter the name of the recipient who will have access to this Share.

4. Complete Share Creation

Click "Create" to finalize the setup then wait approximately 20 minutes.

5. Activate Recipient Access

Copy the generated URL from the interface then visit the link to activate the recipient's access.

It takes about 20 minutes for the data to become available after creation.
Recipients must visit the URL provided to activate their Delta Share access.

Best practices

Here are some best practices to keep in mind:

  • Keep shares small and purposeful: one share per product; avoid “kitchen sink” shares.
  • Schema contracts: version breaking changes (e.g., dp_orders_v2).
  • Select only the columns you need; apply filters for predicate pushdown.
  • For Power BI, mind the row‑limit behavior in Power Query.
  • Incremental loads: prioritize CDF over re‑reading entire tables.
  • Security: favor OIDC federation over long‑lived bearer tokens where possible (better rotation, MFA).
  • Auditing: one recipient per share simplifies revocation and tracking.

Connecting to a share

Activation (open sharing)

  1. Provider creates recipient (open sharing) and gets activation link.
  2. Recipient opens the link to download the credential file (.share) or completes OIDC setup.
  3. Use that credential to connect from Spark, pandas, Power BI, or Tableau.

The credential file contains the endpoint URL and token; keep it secure. Many orgs now prefer OIDC to avoid managing long‑lived tokens.
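
For reference, an open‑sharing credential file is a small JSON document along these lines (all values are placeholders):

{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<token>",
  "expirationTime": "2026-01-01T00:00:00Z"
}

Here are some consumer‑side examples: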

Example: Apache Spark (PySpark)

Load a table (batch read):

table_url = "/path/to/profile.share#<share>.<schema>.<table>"

df = (spark.read
      .format("deltasharing")
      .load(table_url))  # batch read

(df.filter("event_date = '2025-01-01'")
   .select("customer_id", "event_type", "event_date")
   .show())

Change Data Feed (incremental):

cdf = (spark.read
       .format("deltasharing")
       .option("readChangeFeed", "true")
       .option("startingTimestamp", "2025-08-01 00:00:00")
       .load(table_url)
)
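
Where the provider has enabled history sharing, the Spark connector can also stream from a shared table. A minimal sketch (sink and checkpoint paths are placeholders):

stream = (spark.readStream
          .format("deltasharing")
          .load(table_url))

# Continuously land incoming rows as Parquet files.
(stream.writeStream
       .format("parquet")
       .option("checkpointLocation", "/tmp/checkpoints/engage")
       .start("/tmp/engage_stream"))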

Example: pandas (Python)

A pandas example:

import delta_sharing as ds

profile = "/path/to/profile.share"
table_url = f"{profile}#<share>.<schema>.<table>"

sample = ds.load_as_pandas(table_url, limit=100)  # quick peek
pdf = ds.load_as_pandas(table_url)                # full table
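
The Python connector can also read the Change Data Feed directly, assuming history sharing is enabled for the table (the timestamp below is a placeholder):

# Incremental changes since a given point in time.
changes = ds.load_table_changes_as_pandas(
    table_url,
    starting_timestamp="2025-08-01 00:00:00",
)
print(changes["_change_type"].value_counts())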

Example: Power BI / Power Query (Desktop)

Import data using Power BI or Power Query:

  1. Get Data → Delta Sharing
  2. Paste in Delta Sharing Server URL and Bearer token from the credential file (or use OIDC if configured).
  3. Choose your table or tables and load (Import).

Example: Tableau

Import data using Tableau:

  • Install “Delta Sharing by Databricks” from Tableau Exchange.
  • In Tableau: Connect → Delta Sharing by Databricks → either upload the .share file or enter Endpoint URL and Bearer Token.

Example: Java

For JVM pipelines, you can use the Delta Sharing Java connector (community/labs) via Maven/SBT and point it to the .share file. This is handy for embedding inside services.

Retention and history

Delta table history (metadata) is typically retained for 30 days. Older versions may be removed, and time‑travel reads may no longer be possible after retention/VACUUM. Plan to copy or snapshot locally if you need long‑term historical access.

Delta Sharing is read‑only; recipients can always land extracts locally if they need to keep records beyond retention windows.

Connector quick links

  • Delta Sharing overview (Databricks docs)
  • Open‑source Delta Sharing repo (protocol, Python & Spark connectors, examples)
  • Read shared data with Spark / pandas / Power BI using credential files
  • Spark format("deltasharing") examples (read, CDF, streaming)
  • Power BI / Power Query – Delta Sharing connector
  • Tableau – “Delta Sharing by Databricks” connector (Tableau Exchange)
  • Delta Sharing Java connector (labs/community)

If a partner cannot use Delta Sharing, they can sometimes read the Delta format directly (e.g., Trino/Presto, DuckDB) with proper storage access controls, as in the sketch below.
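
As a rough illustration of that fallback, here is a sketch using DuckDB’s delta extension, assuming the partner has direct storage credentials configured (the table path is a placeholder):

import duckdb

con = duckdb.connect()
con.execute("INSTALL delta")
con.execute("LOAD delta")

# Reads the Delta table straight from storage; this bypasses Delta Sharing,
# so storage-level access controls apply instead of share credentials.
df = con.sql("SELECT * FROM delta_scan('s3://bucket/path/to/table')").df()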

Troubleshooting

  • 401/403 unauthorized: credential expired/revoked, token missing, or OIDC not configured. Regenerate activation link or confirm OIDC.
  • CDF is not enabled: ask the provider to enable history sharing, or fall back to full reads.
  • Power BI shows limited rows: adjust Power Query row limits and apply filters.
  • Historical versions unavailable: likely vacuumed or beyond retention; snapshot/copy locally for long‑term needs.

Code snippets

Table URL format:

table_url = "/path/to/profile.share#<share>.<schema>.<table>"

Spark CDF window (PySpark):

changes = (spark.read
  .format("deltasharing")
  .option("readChangeFeed", "true")
  .option("startingTimestamp", "2025-07-01 00:00:00")
  .option("endingTimestamp",   "2025-07-31 23:59:59")
  .load(table_url)
)

Pandas quick peek:

import delta_sharing as ds
ds.load_as_pandas(table_url, limit=50)

New delete pattern

To clarify which data is stored and made available through Delta Share, a new delete pattern has been introduced (as of January 2026). Depending on how consumers of Delta Share have implemented their integrations, this change may require adjustments on their side.

Delta Share allows customers to incrementally retrieve changes to their data and store those changes in their own data warehouse or similar storage solution. This is achieved by storing every batch of new or modified records with a commit version tag. Consumers can then download all data associated with a specific commit version. When new or updated data is added to Delta Share, the record receives the value “insert” in the _change_type column.

The column _commit_version indicates which version the record belongs to, and _commit_timestamp shows when the record was added.

The new delete pattern introduces a _change_type value named “delete”. A delete record will be generated for any row where the import_date_time is older than 30 days.

This record does not modify the original data. All fields remain unchanged. Instead, it serves as a technical log entry indicating that the corresponding row will be removed from Delta Share.

Consumers of Delta Share may choose to ignore these delete records or apply them to their own stored copies, for example by removing the corresponding rows, as in the sketch below.
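
A minimal sketch of applying the delete pattern to a local pandas copy; the key column contact_id, the starting version, and the file names are all placeholder assumptions:

import delta_sharing as ds
import pandas as pd

profile = "/path/to/profile.share"
table_url = f"{profile}#<share>.<schema>.<table>"

# Fetch changes since the last version this consumer processed.
changes = ds.load_table_changes_as_pandas(table_url, starting_version=42)

inserts = changes[changes["_change_type"] == "insert"]
deletes = changes[changes["_change_type"] == "delete"]

# Apply to a local copy: add/overwrite inserted rows, then drop deleted keys.
local = pd.read_parquet("contacts_local.parquet")
local = pd.concat([local, inserts[local.columns]])
local = local.drop_duplicates(subset="contact_id", keep="last")
local = local[~local["contact_id"].isin(deletes["contact_id"])]
local.to_parquet("contacts_local.parquet")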

FAQ

When using “query” to “fetch everything,” what does that mean?

Using the "query" option returns all data currently available in the Delta Share. In other words, the latest full snapshot of that table. It does not include all historical data, only what’s active right now.

Some Delta setups allow “time travel” to older versions of the data, but this only works if history sharing is enabled for that table.

Old versions are automatically cleaned up according to the retention policy (for Voyado, that’s 30 days).
This means you only have access to what’s currently part of the share and not every piece of data stored in Voyado.

Is there a time window to consider for “query”?

Yes. Delta Share tables follow a retention policy. In Voyado’s case, data older than 30 days is removed automatically. So, when you use a "query" you’re getting the most recent snapshot. Data outside of the retention window isn’t available.

In “changes” only certain versions are available. Why?

The "changes" option provides incremental updates, meaning only the rows that were added, modified, or deleted between table versions.

It works through Delta’s Change Data Feed (CDF), so the feature must be enabled for that table.

Each update includes metadata like:

  • _change_type

  • _commit_version

  • _commit_timestamp

The versions you can access depend on the retention window, not the number of versions or files.
Once versions are older than 30 days, they’re no longer accessible.

If you try to query a starting or ending version outside that 30-day window, no data will be returned.

What determines the versions available in the change feed?

Availability is based solely on time, not on how many versions or files exist. Once data versions pass the 30-day retention threshold, they’re cleaned up automatically. There’s no way to retrieve them afterward.
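
In practice, that means computing your CDF start point inside the 30‑day window. A small sketch reusing spark and table_url from the earlier examples:

from datetime import datetime, timedelta, timezone

# Stay inside the 30-day retention window, with a one-day safety margin.
start = datetime.now(timezone.utc) - timedelta(days=29)

cdf = (spark.read
       .format("deltasharing")
       .option("readChangeFeed", "true")
       .option("startingTimestamp", start.strftime("%Y-%m-%d %H:%M:%S"))
       .load(table_url))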

What does it mean when a table has no version history?

If a table shows no version history, it typically means:

  • Change Data Feed (CDF) or version tracking is not enabled for that share, or

  • All historical versions have been cleaned up by the retention policy.

In such cases, you’ll only have access to the latest snapshot, with no historical data or changes available.

Why am I getting “Bad request — Cannot load table version 0…”

This occurs when a full export or reset is performed in Engage. During this process, all previous table versions and transaction logs are removed as part of the retention and cleanup policy.

After the reset, a new baseline (latest/max) version becomes the valid starting point for queries. If a customer or integration tries to access version 0 or any version older than the current baseline, the system returns this error because those versions have been deleted.

The data version in the Delta Share protocol is not affected by the export itself — data continues to propagate to the latest version, ensuring ongoing updates are reflected from that new baseline.

Versions in Delta Share are subject to the default retention period of 30 days. Older versions are automatically cleaned up and are no longer available after that window.

In short, after a full export, historical versions are removed, and the data is accessible only from the latest (max) version onward. All queries must start from that version, as older ones are cleared by retention.
 
