Delta Sharing is an open protocol that lets Voyado share live datasets securely with external recipients without copying data. Recipients connect with standard tools (Spark, pandas, Power BI, Tableau, etc.) using a small credential file or OIDC federation and can query data in real time. It’s read‑only by design.
How Delta Sharing works
Delta Sharing is an open, REST‑based protocol for secure, real‑time data exchange on top of cloud storage (S3/ADLS/GCS). Providers (Voyado) expose shares that contain tables; recipients connect using a credential profile (or OIDC) with Delta‑Sharing‑compatible tools. No replication is required; the consumer reads data directly from the provider's storage through signed URLs.
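For example, a recipient using the open‑source Python connector can enumerate what a credential profile exposes. A minimal sketch, assuming the delta-sharing package is installed and a profile file has been downloaded:

import delta_sharing

# Point the client at the downloaded credential profile (path is an example)
client = delta_sharing.SharingClient("/path/to/profile.share")

# Discover what the provider exposes
print(client.list_shares())      # shares visible to this recipient
print(client.list_all_tables())  # every table across those shares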
There are two common modes, but Voyado provides only one:
Open sharing (most common for non‑Databricks recipients). Here you send an activation link so the recipient can download a credential file (.share) or use OIDC. They then connect from Spark/pandas/BI tools.
Shares and Recipients
Share: A named container of assets to expose, typically one or more Delta tables in a UC metastore, organized into schemas. You grant one or more recipients access to a share (one data product per share).
Recipient: The external party (company, workspace, service principal, or app) that is allowed to read from the share. For open sharing, you generate an activation link and credentials (or OIDC config). For Databricks‑to‑Databricks, you identify the recipient by metastore ID.
Example:
- Share: Engage Data
- Schemas/Tables: contactslatest
- Recipient: Marketing
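Given that example, a recipient addresses the table by appending share, schema, and table names to the credential profile path. A sketch; the identifiers engage_data and engage below are assumptions, not the actual names:

# Hypothetical coordinates for the example above
table_url = "/path/to/profile.share#engage_data.engage.contactslatest"
df = spark.read.format("deltasharing").load(table_url)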
Data Products in Delta Sharing
A data product is a logically grouped set of datasets (tables) delivered together to solve a business use case. We publish each data product via one share to keep ownership, lineage, policies, and SLAs clear. This improves discoverability (“what’s in the share?”) and governance.
Naming suggestions:
- Share name: The data product being shared.
- Recipient name: A descriptive name for who or what the share is used by, e.g. helena or marketing.
Architecture
One share per data product is recommended. It keeps lifecycle/versioning simple (deprecate v1, publish v2) and keeps permissions isolated.
One recipient per share is also strongly recommended. It offers easier auditing and revocation, and avoids role sprawl. You can create multiple recipients when the same data product goes to different partners or environments.
Under the hood, connectors read through the Delta Sharing server with short‑lived links to storage; that’s how we achieve real‑time access without copying data.
Common recipient use cases
Real‑time analytics in BI: Power BI Desktop connects with the Delta Sharing connector; analysts build reports on live data and schedule refreshes in the service.
Data science / notebooks: Data scientists load tables to pandas or Spark for feature engineering or model training.
Cross‑cloud collaboration: Partners on any cloud access shared tables without us copying data across clouds.
Incremental ingestion: Consumers pull Change Data Feed (CDF) from a shared table for efficient downstream loads. (Provider must enable “history sharing”.)
Creating Share and Recipient in Config Hub
Here are the steps to create a share and recipient in Delta Share:
1. Start the Share creation
Navigate to the Delta Share section of Config Hub and select "Create share" to begin.
2. Select Data Product
Choose the specific Data Product you want to share from the available options.
3. Configure Recipient access
Enter the name of the recipient who will have access to this Share.
4. Complete Share Creation
Click "Create" to finalize the setup then wait approximately 20 minutes.
5. Activate Recipient Access
Copy the generated URL from the interface, then visit the link to activate the recipient's access. A quick connection check is sketched below.
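Once the recipient is activated, a quick way to confirm access is to list the share's tables with the Python connector. A minimal sketch, assuming the credential file from the activation link has been saved locally:

import delta_sharing

client = delta_sharing.SharingClient("/path/to/profile.share")
for table in client.list_all_tables():
    print(table.share, table.schema, table.name)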
Best practices
Here are some best practices to keep in mind:
- Keep shares small and purposeful: one share per product; avoid “kitchen sink” shares.
- Schema contracts: version breaking changes (e.g., dp_orders_v2).
- Select only columns you need; apply filters for predicate pushdown.
- For Power BI, mind the row limit behavior in Power Query.
- Incremental loads: prioritize CDF over re‑reading entire tables (see the sketch after this list).
- Security: favor OIDC federation over long‑lived bearer tokens where possible (better rotation, MFA).
- Auditing: one recipient per share simplifies revocation and tracking.
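To illustrate the incremental‑load practice above, a consumer can persist the highest _commit_version it has processed and resume the change feed from there. A sketch, assuming a Spark session, a table_url, and a consumer‑tracked last_processed_version:

# Resume the change feed just after the last version already processed
changes = (spark.read
    .format("deltasharing")
    .option("readChangeFeed", "true")
    .option("startingVersion", last_processed_version + 1)
    .load(table_url))

# Read only the columns you need; simple filters can be pushed down
changes.select("customer_id", "event_type", "_commit_version") \
       .filter("_change_type = 'insert'")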
Connecting to a share
Activation (open sharing)
- Provider creates recipient (open sharing) and gets activation link.
- Recipient opens the link to download the credential file (.share) or completes OIDC setup.
- Use that credential to connect from Spark, pandas, Power BI, or Tableau.
The credential file contains the endpoint URL and token; keep it secure. Many orgs now prefer OIDC to avoid managing long‑lived tokens. Here are some consumer‑side examples:
Example: Apache Spark (PySpark)
Load a table (batch read):
table_url = "/path/to/profile.share#<share>.<schema>.<table>"
df = spark.read.format("deltasharing").load(table_url)  # batch read
df.filter("event_date = '2025-01-01'") \
  .select("customer_id", "event_type", "event_date") \
  .show()
Change Data Feed (incremental):
cdf = (spark.read
    .format("deltasharing")
    .option("readChangeFeed", "true")
    .option("startingTimestamp", "2025-08-01 00:00:00")
    .load(table_url))
Example: pandas (Python)
A pandas example:
import delta_sharing as ds
profile = "/path/to/profile.share"
table_url = f"{profile}#<share>.<schema>.<table>"
sample = ds.load_as_pandas(table_url, limit=100)  # quick peek at 100 rows
pdf = ds.load_as_pandas(table_url)  # full table
Example: Power BI / Power Query (Desktop)
Import data using Power BI or Power Query:
- Get Data → Delta Sharing
- Paste in Delta Sharing Server URL and Bearer token from the credential file (or use OIDC if configured).
- Choose your table or tables and load (Import).
Example: Tableau
Import data using Tableau:
- Install “Delta Sharing by Databricks” from Tableau Exchange.
- In Tableau: Connect → Delta Sharing by Databricks → either upload the .share file or enter Endpoint URL and Bearer Token.
Example: Java
For JVM pipelines, you can use the Delta Sharing Java connector (community/labs) via Maven/SBT and point it to the .share file. This is handy for embedding inside services.
Retention and history
Delta table history (metadata) is typically retained for 30 days. Older versions may be removed, and time‑travel reads may no longer be possible after retention/VACUUM. Plan to copy or snapshot data locally if you need long‑term historical access.
Delta Sharing is read‑only; recipients can always land extracts locally if they need to keep records beyond retention windows.
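For example, a consumer that needs history beyond the 30‑day window can land periodic snapshots locally. A minimal sketch with pandas; paths and file names are assumptions:

import delta_sharing as ds

# Read the current snapshot and persist a local copy
pdf = ds.load_as_pandas("/path/to/profile.share#<share>.<schema>.<table>")
pdf.to_parquet("/local/archive/snapshot_2025-08-01.parquet")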
Connector quick links
- Delta Sharing overview (Databricks docs)
- Open‑source Delta Sharing repo (protocol, Python & Spark connectors, examples)
- Read shared data with Spark / pandas / Power BI using credential files
- Spark format("deltasharing") examples (read, CDF, streaming)
- Power BI / Power Query – Delta Sharing connector
- Tableau – “Delta Sharing by Databricks” connector (Tableau Exchange)
- Delta Sharing Java connector (labs/community)
Troubleshooting
- 401/403 unauthorized: credential expired/revoked, token missing, or OIDC not configured. Regenerate activation link or confirm OIDC.
- CDF is not enabled: request provider to enable history sharing or use full reads.
- Power BI shows limited rows: adjust Power Query row limits and apply filters.
- Historical versions unavailable: likely vacuumed or beyond retention; snapshot/copy locally for long‑term needs.
Code snippets
Table URL (used by the snippets below):
table_url = "/path/to/profile.share#<share>.<schema>.<table>"
Spark CDF window (PySpark):
changes = (spark.read
    .format("deltasharing")
    .option("readChangeFeed", "true")
    .option("startingTimestamp", "2025-07-01 00:00:00")
    .option("endingTimestamp", "2025-07-31 23:59:59")
    .load(table_url))
Pandas quick peek:
import delta_sharing as ds
ds.load_as_pandas(table_url, limit=50)
New delete pattern
To clarify which data is stored and made available through Delta Share, a new delete pattern has been introduced (as of January 2026). Depending on how consumers of Delta Share have implemented their integrations, this change may require adjustments on their side.
Delta Share allows customers to incrementally retrieve changes to their data and store those changes in their own data warehouse or similar storage solution. This is achieved by storing every batch of new or modified records with a commit version tag. Consumers can then download all data associated with a specific commit version. When new or updated data is added to Delta Share, the record receives the value “insert” in the _change_type column.
The column _commit_version indicates which version the record belongs to, and _commit_timestamp shows when the record was added.
The new delete pattern introduces a _change_type value named “delete”. A delete record will be generated for any row where the import_date_time is older than 30 days.
This record does not modify the original data. All fields remain unchanged. Instead, it serves as a technical log entry indicating that the corresponding row will be removed from Delta Share.
Consumers of Delta Share may choose to ignore these delete records or apply them like any other change, for example by removing the corresponding rows from their own storage.
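For consumers that mirror the table, one way to handle this is to split the feed by _change_type and apply the delete records downstream. A sketch, assuming a Spark session, a table_url, and a consumer‑tracked last_processed_version:

changes = (spark.read
    .format("deltasharing")
    .option("readChangeFeed", "true")
    .option("startingVersion", last_processed_version + 1)
    .load(table_url))

inserts = changes.filter("_change_type = 'insert'")
deletes = changes.filter("_change_type = 'delete'")
# Upsert `inserts` into local storage; then either remove the rows matching
# `deletes` (e.g., on a primary key) or ignore them to retain history locally.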
FAQ
When using “query” to “fetch everything,” what does that mean?
Using the "query" option returns all data currently available in the Delta Share. In other words, the latest full snapshot of that table. It does not include all historical data, only what’s active right now.
Some Delta setups allow “time travel” to older versions of the data, but this only works if history sharing is enabled for that table.
Old versions are automatically cleaned up according to the retention policy (for Voyado, that’s 30 days).
This means you only have access to what’s currently part of the share and not every piece of data stored in Voyado.
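In connector terms, a plain load returns the current snapshot, while a versioned read only succeeds if that version is still within retention. A sketch; the version number is an assumption:

import delta_sharing as ds

latest = ds.load_as_pandas(table_url)             # current snapshot
older = ds.load_as_pandas(table_url, version=42)  # fails once vacuumed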
Is there a time window to consider for “query”?
Yes. Delta Share tables follow a retention policy. In Voyado’s case, data older than 30 days is removed automatically. So, when you use a "query" you’re getting the most recent snapshot. Data outside of the retention window isn’t available.
In “changes” only certain versions are available. Why?
The "changes" option provides incremental updates, meaning only the rows that were added, modified, or deleted between table versions.
It works through Delta’s Change Data Feed (CDF), so the feature must be enabled for that table.
Each update includes metadata like:
- _change_type
- _commit_version
- _commit_timestamp
The versions you can access depend on the retention window, not the number of versions or files.
Once versions are older than 30 days, they’re no longer accessible.
If you try to query a starting or ending version outside that 30-day window, no data will be returned.
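With the Python connector, changes are requested by version or timestamp, and ranges outside the retention window return no data. A sketch; the version numbers are assumptions:

import delta_sharing as ds

changes = ds.load_table_changes_as_pandas(
    table_url, starting_version=5, ending_version=9)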
What determines the versions available in the change feed?
Availability is based solely on time, not on how many versions or files exist. Once data versions pass the 30-day retention threshold, they’re cleaned up automatically. There’s no way to retrieve them afterward.
What does it mean when a table has no version history?
If a table shows no version history, it typically means:
Change Data Feed (CDF) or version tracking is not enabled for that share, or
All historical versions have been cleaned up by the retention policy.
In such cases, you’ll only have access to the latest snapshot, with no historical data or changes available.
Why am I getting “Bad request — Cannot load table version 0…”?
This occurs when a full export or reset is performed in Engage. During this process, all previous table versions and transaction logs are removed as part of the retention and cleanup policy.
After the reset, a new baseline (latest/max) version becomes the valid starting point for queries. If a customer or integration tries to access version 0 or any version older than the current baseline, the system returns this error because those versions have been deleted.
The data version in the Delta Share protocol is not affected by the export itself — data continues to propagate to the latest version, ensuring ongoing updates are reflected from that new baseline.
In short, after a full export, historical versions are removed, and the data is accessible only from the latest (max) version onward. All queries must start from that version, as older ones are cleared by retention.
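Operationally, a consumer can re‑baseline after a full export by taking one full snapshot and then resuming incremental reads from the current point in time. A sketch; the timestamp is an assumption:

# One‑off full reload to establish the new baseline
baseline = spark.read.format("deltasharing").load(table_url)

# Resume incremental reads from this point forward
changes = (spark.read
    .format("deltasharing")
    .option("readChangeFeed", "true")
    .option("startingTimestamp", "2026-01-15 00:00:00")
    .load(table_url))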