2013 – 2026 · Private · HQ 160 Spear St, San Francisco
Databricks Products — the Data Intelligence Platform
Databricks's product line is often flattened to “Apache Spark in the cloud,” but the company sells one platform, branded the Data Intelligence Platform, organized as a stack of layers on a lakehouse engine. The engine runs on AWS, Azure, and Google Cloud and combines data-warehouse reliability with data-lake economics. On top of that engine sit an open lakehouse-storage layer (Delta Lake and Apache Iceberg, side by side); Unity Catalog as the unified governance and sharing layer; the data-engineering and SQL-analytics layer (Lakeflow, Databricks SQL, AI/BI) with the Mosaic AI stack alongside it; and, at the top, an apps and operational layer (Databricks Apps, Lakebase, Genie, Lakewatch) for the agentic-app era. Sourced from databricks.com/product, docs.databricks.com, and acquisition press releases.
Roster row: Databricks on /orgs/ · Closest competitor on the site: Snowflake Products.
How the layers stack on one engine
Read the diagram bottom-to-top. Data flows from the customer's existing systems into the engine — a multi-cloud lakehouse substrate whose defining choice is that one platform serves data-warehouse, data-lake, ML, and AI workloads against the same storage. The lakehouse-storage layer keeps the data in open table formats (Delta Lake and Apache Iceberg, side by side). The Unity Catalog layer unifies governance and sharing across all of it. The data-engineering & SQL layer and the Mosaic AI layer run side by side on that substrate, and an apps & operational layer at the top serves the agentic-app era (Databricks Apps for the UI, Lakebase Postgres for OLTP state, Genie for the conversational surface, Lakewatch for security).
Diagram caption: the engine at the base is a fully managed, multi-cloud lakehouse platform (AWS, Azure, Google Cloud) whose defining architectural choice is that warehouse, lake, ML, and AI workloads run against one set of governed data. On top of the engine: the lakehouse-storage layer (Delta Lake and Apache Iceberg open table formats, managed and foreign tables, Predictive Optimization, Liquid Clustering); Unity Catalog & sharing (unified governance, Delta Sharing, Marketplace, Clean Rooms, Lakehouse Federation); the data engineering & SQL layer (Lakeflow, Databricks SQL on Photon, AI/BI Dashboards) and the Mosaic AI layer (Agent Bricks, Vector Search, Unity AI Gateway, Agent Framework + Evaluation, Model Serving, Model Training, Managed MLflow) running side by side; and at the top, the apps & operational layer (Databricks Apps, Lakebase Postgres, Genie, Lakewatch) plus the conversational Databricks Assistant / Genie surface.
The engine · Multi-cloud · Lakehouse architecture
The core platform — the Data Intelligence Platform
What it is
The base of everything is a fully managed, cloud-native platform that runs on the three major public clouds — AWS, Microsoft Azure, and Google Cloud. Its defining architectural choice is the lakehouse: one platform that combines the reliability and performance of a data warehouse with the openness and economics of a data lake, so warehouse, lake, ML, and AI workloads run against the same governed data rather than across siloed systems. Databricks brands the current platform as the Data Intelligence Platform and describes its differentiator as a Data Intelligence Engine that uses generative AI to understand the semantics of each customer's data, automatically optimizing performance, generating documentation, and powering natural-language access. Databricks frames the platform on databricks.com/product/data-intelligence-platform as “a unified platform for data, analytics and AI” — “intelligent, simple, private.”
The “Spark in the cloud → Data Intelligence Platform” repositioning
Databricks began as a hosted Apache Spark service and is still widely described that way, but the platform has expanded to cover SQL data warehousing (Databricks SQL), governance (Unity Catalog), data engineering and orchestration (Lakeflow), AI and agents (Mosaic AI, the AI stack as rebranded after the 2023 MosaicML acquisition), operational data (Lakebase, the managed Postgres service that came from the 2025 Neon acquisition), business-productivity workspaces (Genie), and an agentic SIEM (Lakewatch), all on the same engine. The current corporate narrative centers on the Mosaic AI layer and the agentic-app surface above it; the lakehouse underneath is what makes the promise of AI grounded in the customer's governed enterprise data technically coherent.
When it shipped
Founded 2013 in Berkeley by the UC Berkeley AMPLab / Apache Spark team — Ali Ghodsi, Matei Zaharia, Reynold Xin, Patrick Wendell, Andy Konwinski, Ion Stoica, and Arsalan Tavakoli. Co-founder Ali Ghodsi has been CEO since 2016, succeeding co-founder Ion Stoica. Databricks remains a private company, with a $134B valuation announced at its >$4B Series L on December 16, 2025 and a reported >$4.8B revenue run-rate growing >55% year over year with positive trailing-12-month free cash flow.
How it relates to the others
Every other layer on this page runs on this engine. Lakehouse storage, Unity Catalog, the data-engineering / SQL layer, Mosaic AI, and the apps + operational layer are all capabilities of the one platform — not separate products a customer installs and integrates. That single-substrate design is what lets, for example, an Agent Bricks agent reason over a Lakeflow-curated table that another team just shared via Delta Sharing, all under one Unity Catalog policy.
Lakehouse storage & open formats
Lakehouse storage — Delta Lake & Apache Iceberg
What it is
This layer lets the same data live in an open, vendor-neutral table format — rather than only Databricks's own format — while keeping a single set of optimizations and governance over all of it. It is Databricks's answer to the lock-in concern: read and write the open format, let other engines touch the same tables, and still benefit from Predictive Optimization and Liquid Clustering. Databricks calls the current product Lakehouse Storage and frames it on databricks.com/product/lakehouse-storage as “built for open, intelligent data storage” with full data portability.
Key capabilities
Delta Lake (OSS)
The open-source ACID storage layer Databricks originated and donated to the Linux Foundation. Provides transactions, time travel, and schema enforcement on top of cloud object stores. Open-sourced April 2019; the Linux Foundation project lives at delta.io.
Apache Iceberg (OSS)
The competing open lakehouse table format, now first-class inside Databricks alongside Delta. The June 2024 acquisition of Tabular (the company founded by Iceberg co-creators Ryan Blue and Daniel Weeks) brought the Iceberg expertise in-house and underpins format-agnostic storage.
Managed & foreign tables
Managed tables — Databricks-optimized Delta or Iceberg tables with the platform's storage optimizations applied automatically. Foreign tables — tables managed by external catalogs (AWS Glue, Hive Metastore, Snowflake Horizon) governed through Unity Catalog. Both surface in the docs at docs.databricks.com/aws/en/tables/managed.
Predictive Optimization
AI-driven table optimizations — clustering, compaction, statistics — based on each table's data and query patterns. Keeps tables tuned without manual maintenance; ties back to the Data Intelligence Engine framing.
Liquid Clustering
Self-tuning data layout that replaces classic Hive-style partitioning. The layout adapts as data and access patterns change — no partition redesign required when query patterns shift.
Open Unity & Iceberg REST APIs
The Unity REST and Iceberg REST Catalog APIs let other engines — Spark, Trino, Snowflake, BigQuery, DuckDB — read and write the same managed tables. Databricks's framing is “your ecosystem, your choice” (see external access docs).
How it relates to the others
Storage sits directly on the engine. Unity Catalog governs every table here (managed or foreign), Databricks SQL and Lakeflow read and write it, and Mosaic AI grounds its agents in it. The Tabular acquisition is what unlocked symmetric first-class support for Iceberg next to Delta, which puts Databricks in the unusual position of championing both formats — in contrast to the older Spark-and-Delta-only posture.
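The transactions-and-time-travel idea described for Delta Lake above can be illustrated with a toy append-only version log. This is a conceptual sketch only: `ToyVersionedTable` is a hypothetical in-memory class invented for illustration, and it does not reflect Delta Lake's actual log format or API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Hypothetical in-memory table: every commit appends a new immutable snapshot."""
    _versions: list = field(default_factory=lambda: [()])  # version 0 is empty

    def commit(self, rows_to_add=(), predicate_to_delete=None):
        """Produce the next version from the latest one and return its number."""
        current = self._versions[-1]
        if predicate_to_delete is not None:
            current = tuple(r for r in current if not predicate_to_delete(r))
        self._versions.append(current + tuple(rows_to_add))
        return len(self._versions) - 1

    def read(self, version=None):
        """Read the latest snapshot, or an older one ("time travel")."""
        if version is None:
            version = len(self._versions) - 1
        return list(self._versions[version])


table = ToyVersionedTable()
v1 = table.commit(rows_to_add=[{"id": 1}, {"id": 2}])
v2 = table.commit(predicate_to_delete=lambda r: r["id"] == 1)
```

Reading with `table.read(version=v1)` still returns both rows even after the delete in `v2`, because older snapshots are never overwritten. A real table format would typically store only the changes per commit rather than full snapshots.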
Unity Catalog & sharing · The governance plane
Unity Catalog & sharing — one governance, one sharing fabric
What it is
The unified governance layer for every asset on the platform — structured tables, unstructured files, business metrics, ML models, and AI assets — with one policy model, one lineage graph, and one discovery surface. Above governance, the sharing fabric — Delta Sharing, Marketplace, and Clean Rooms — lets organizations share live data without copying it, across regions, clouds, and even across to non-Databricks platforms. Databricks frames Unity Catalog on databricks.com/product/unity-catalog as “unified and open governance for data and AI”; it was open-sourced under Apache 2.0 in June 2024 at Data + AI Summit and the OSS project lives at unitycatalog.io.
Key capabilities
Unified data & AI catalog (OSS)
One catalog covering structured tables, unstructured files, ML models, and business metrics — across Delta, Iceberg, Hudi, and Parquet. The catalog itself is the open-sourced piece.
Fine-grained access controls
Row- and column-level access policies driven by data attributes and user attributes (attribute-based access control, ABAC). Sensitive data is classified automatically (e.g. PII tagging), so policies are enforced by attribute rather than configured per asset.
Automated lineage
End-to-end, column-level lineage across pipelines, dashboards, and models — for impact analysis, debugging, and AI audits. No manual instrumentation required.
Lakehouse Federation
Query and govern data living in MySQL, PostgreSQL, Salesforce, SAP, Redshift, Snowflake, Azure SQL, BigQuery, AWS Glue, and Hive Metastore — in place, with no migration. Databricks acts as the query and governance plane over each.
Delta Sharing (OSS)
The open protocol for live data sharing — share governed tables across organizations, clouds, and platforms with no copies. Open-sourced May 2021; the protocol lives at delta.io/sharing.
Databricks Marketplace
An open marketplace for data products, AI models, notebooks, and solution accelerators — built on Delta Sharing so listings are platform-agnostic.
Clean Rooms
Privacy-preserving collaboration on shared data and AI assets across teams, regions, and external partners. Built on Delta Sharing under Unity Catalog governance.
Business semantics & data-quality monitoring
Centralized business semantics so the same metric resolves consistently across BI, agents, and pipelines; built-in monitoring tracks freshness, completeness, and anomalies as a governance signal, not a separate tool.
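The impact-analysis use of column-level lineage listed above amounts to a graph traversal: given one column, find everything derived from it. A minimal sketch, assuming a hypothetical hand-written `LINEAGE` mapping with made-up column names (in the product the graph is captured automatically; nothing here is a Unity Catalog API):

```python
from collections import deque

# Hypothetical column-level lineage edges: upstream column -> columns derived from it.
LINEAGE = {
    "raw.orders.amount": ["silver.orders.amount_usd"],
    "silver.orders.amount_usd": ["gold.revenue.daily_total", "ml.features.avg_order"],
    "gold.revenue.daily_total": ["dashboard.revenue.total"],
}


def downstream(column, lineage=LINEAGE):
    """Return every column reachable from `column`, in breadth-first order."""
    seen, queue, order = {column}, deque([column]), []
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream("raw.orders.amount")
```

Here `impacted` lists the silver column first, then the gold and feature columns, then the dashboard field, which is the order in which a change to the raw column would propagate.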
How it relates to the others
Unity Catalog is the policy layer for everything above it. A masking rule, a row-access rule, or a model-access policy set here applies whether the data is queried by Databricks SQL, transformed by a Lakeflow pipeline, shared via Delta Sharing, or reasoned about by an Agent Bricks agent. Open-sourcing Unity Catalog in 2024 was the structural answer to Apache Polaris (Snowflake's competing open Iceberg catalog) — both vendors now have an open governance story, with Databricks's emphasizing “data and AI” rather than data alone.
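To make the "set the policy once, apply it everywhere" idea concrete, the sketch below applies one row filter and one column mask from user attributes. The rules, the `apply_policies` function, and the sample users are all hypothetical; this is plain Python for illustration, not Unity Catalog syntax.

```python
def apply_policies(rows, user):
    """Apply a toy row filter and column mask based on user attributes.

    Hypothetical rules for illustration:
      - row filter: users see only rows for their own region, unless they are admins
      - column mask: `email` is redacted for users without the `pii_reader` attribute
    """
    visible = []
    for row in rows:
        if "admin" not in user["attributes"] and row["region"] != user["region"]:
            continue  # row-level policy hides this row
        masked = dict(row)  # copy so the source rows are never modified
        if "pii_reader" not in user["attributes"]:
            masked["email"] = "***"  # column-level mask
        visible.append(masked)
    return visible


ROWS = [
    {"id": 1, "region": "EU", "email": "a@example.com"},
    {"id": 2, "region": "US", "email": "b@example.com"},
]
analyst = {"region": "EU", "attributes": set()}
steward = {"region": "US", "attributes": {"admin", "pii_reader"}}
```

Because the policy lives in one function rather than in each consumer, a SQL query, a pipeline, and an agent that all read through `apply_policies` would see the same filtered, masked view for the same user.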
Data engineering & SQL analytics
Data engineering & SQL analytics — Lakeflow, Databricks SQL, AI/BI
What it is
The layer for getting data into the lakehouse, transforming it, querying it, and visualizing the results. Databricks unified its ETL and orchestration stack under the Lakeflow brand at Data + AI Summit 2024 (rolling up the prior Delta Live Tables, Workflows, and the Arcion-acquired ingestion engine). On the analytics side, Databricks SQL is the serverless data warehouse on the lakehouse, powered by the proprietary Photon engine, and AI/BI provides dashboards and the natural-language Genie surface for business users.
Key capabilities
Lakeflow Connect
Managed ingestion with built-in connectors for databases (SQL Server, Oracle, MySQL, Postgres) and SaaS apps (Salesforce, Workday, ServiceNow, Google Analytics). Roots in the Arcion acquisition (announced October 2023).
Lakeflow Pipelines
Declarative ETL: pipelines are defined in SQL or Python, and the platform handles orchestration, retries, and incremental refresh. The successor brand to Delta Live Tables, generally available since 2024.
Lakeflow Jobs
Multitask workflow orchestration across notebooks, pipelines, SQL, and JAR/Python tasks — with dependencies, scheduling, and monitoring. The successor brand to Databricks Workflows.
Lakeflow Designer
A visual, no-code ETL builder for analysts. Lets non-engineers compose pipelines that render down to the same Lakeflow primitives engineers use.
Databricks SQL + Photon
Serverless SQL warehouse on the lakehouse, powered by Databricks's proprietary Photon vectorized C++ query engine. Databricks reports queries are 5x faster than three years ago and ETL price/performance is 9x better.
AI/BI Dashboards & Genie
AI/BI Dashboards is the low-code visualization layer; AI/BI Genie is the natural-language conversational interface for non-SQL users to explore governed data. Both ride Unity Catalog's business semantics.
Apache Spark (OSS)
The open-source execution engine Databricks's founders originated at UC Berkeley's AMPLab. Still the underpinning for batch, streaming, and ML workloads on the platform; the Apache project lives at spark.apache.org.
Structured Streaming & Notebooks
Near-real-time streaming with end-to-end exactly-once guarantees, integrated with Lakeflow. Collaborative Notebooks are the interactive development surface for SQL, Python, R, and Scala against the same governed data.
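Exactly-once delivery in streaming systems generally rests on replayable input, recorded progress, and a sink that tolerates retries. The toy below illustrates only the last part, assuming a hypothetical `ToyExactlyOnceSink` that stores the last committed batch id next to its output; it is not Spark's implementation.

```python
class ToyExactlyOnceSink:
    """Hypothetical sink that records the last committed batch id with its output."""

    def __init__(self):
        self.rows = []
        self.last_committed_batch = -1

    def write_batch(self, batch_id, records):
        """Apply a batch at most once; replays of committed batches are ignored."""
        if batch_id <= self.last_committed_batch:
            return False  # already applied, e.g. a retry after a crash
        self.rows.extend(records)
        self.last_committed_batch = batch_id  # record progress together with output
        return True


sink = ToyExactlyOnceSink()
sink.write_batch(0, ["a", "b"])
sink.write_batch(1, ["c"])
replayed = sink.write_batch(1, ["c"])  # simulated retry of the same batch
```

The retried batch is skipped, so `sink.rows` holds each record once even though batch 1 was delivered twice.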
How it relates to the others
Lakeflow gets data into the lakehouse and curates it; Databricks SQL and AI/BI surface it to humans and BI tools (Power BI, Tableau, Looker, Sigma); Mosaic AI grounds agents in the curated tables. The same Unity Catalog policies apply throughout. Genie sits at this layer for SQL-style natural-language exploration, while Genie business productivity (in the apps layer above) is the broader workspace surface.
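The dependency-driven orchestration described for Lakeflow Jobs can be pictured as a topological sort over a task graph. The sketch below uses Python's standard-library `graphlib`; the task names and the `run_order` helper are hypothetical, and this is not the Jobs API.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASKS = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "build_silver": {"ingest_orders", "ingest_customers"},
    "refresh_dashboard": {"build_silver"},
    "retrain_model": {"build_silver"},
}


def run_order(tasks):
    """Return tasks grouped into waves; tasks within a wave could run in parallel."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = run_order(TASKS)
```

With this graph the two ingest tasks form the first wave, `build_silver` the second, and the dashboard refresh and model retrain the third.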
Mosaic AI · The AI layer · The current growth narrative
Mosaic AI — the AI & agent stack
What it is
The layer that brings model training, model serving, vector retrieval, agent orchestration, and AI governance directly onto governed lakehouse data. The pitch is the inverse of the “take your data to a model” default: the model comes to the data, inside the security perimeter, so an enterprise can build agents that reason over its private data without exporting it. This layer drives much of Databricks's recent product narrative and evolves fastest; the framing below tracks databricks.com/product/artificial-intelligence and the Mosaic AI documentation rather than any fixed snapshot.
The MosaicML origin
Much of this direction traces to Databricks's July 2023 acquisition of MosaicML — a generative-AI platform startup — for a reported ~$1.3B. The deal seeded the company's model-training and model-serving infrastructure and brought in the Mosaic Research team; the AI stack was rebranded Mosaic AI, and the research org now publishes as Databricks Mosaic Research. See the acquisition announcement at databricks.com/blog/databricks-mosaicml.
Key capabilities
Agent Bricks
The newest entry: build production-quality AI agents grounded in enterprise data, with automated synthetic-data generation, custom evaluation, and automated tuning. Databricks's current featured AI product (see Agent Bricks).
Unity AI Gateway
One place to apply governance to every LLM and MCP server an enterprise uses — commercial APIs (OpenAI, Anthropic, Google), self-hosted models, third-party agents. Guardrails, rate limits, audit trails, and lineage all flow through it.
Vector Search
A managed vector database with automatic real-time syncing to source Delta tables, so retrieval indexes stay current with the underlying data. The retrieval backbone for RAG and agentic patterns.
Agent Framework & Evaluation
The Python framework for building production agents, paired with Agent Evaluation — AI-judge grading plus human feedback, with traceable root-cause analysis when an agent regresses.
Model Serving
Unified serving for agents, GenAI models, and classical ML models — one endpoint shape, autoscaling, and Unity-Catalog-governed access. Supports both Databricks-hosted models and external APIs.
Model Training
Fine-tune open-source LLMs, pre-train custom models, or train classical ML — on the Mosaic AI training stack inherited from the MosaicML acquisition. Surfaces as Mosaic AI Training.
Managed MLflow (OSS)
Enterprise-grade managed version of the open-source MLflow ML-lifecycle project Databricks originated in 2018. Tracks experiments, registers models, and now also instruments GenAI evaluation and tracing.
Data quality monitoring for AI
The same Unity Catalog monitoring that watches data tables also watches AI assets — anomaly detection, drift, freshness — so a regression in upstream data surfaces as an agent-quality signal.
How it relates to the others
Mosaic AI runs on the lakehouse and honors Unity Catalog, which is Databricks's central pitch for enterprise AI: the model comes to the governed data inside the security perimeter, rather than the data being shipped to a hosted model. Agents reach back through Vector Search to retrieve grounded context from curated Lakeflow tables, write working state to Lakebase (the OLTP Postgres layer above), and are governed end-to-end by Unity Catalog policies.
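The retrieval step above, where an index stays in sync with a source table and returns the closest matches for a query, can be sketched with a toy in-memory index. `ToyVectorIndex`, the two-dimensional embeddings, and the document ids are hypothetical; a managed service would use approximate search at scale rather than this brute-force ranking.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyVectorIndex:
    """Hypothetical index keyed by the source table's primary key."""

    def __init__(self):
        self.vectors = {}

    def sync(self, source_rows):
        """Mirror the source table: keep current rows, drop deleted ones."""
        self.vectors = {row["id"]: row["embedding"] for row in source_rows}

    def search(self, query, k=2):
        """Return the ids of the k most similar vectors, best first."""
        ranked = sorted(
            self.vectors,
            key=lambda i: cosine(query, self.vectors[i]),
            reverse=True,
        )
        return ranked[:k]


index = ToyVectorIndex()
index.sync([
    {"id": "doc-a", "embedding": [1.0, 0.0]},
    {"id": "doc-b", "embedding": [0.0, 1.0]},
    {"id": "doc-c", "embedding": [0.7, 0.7]},
])
top = index.search([1.0, 0.1], k=2)
```

If a row is deleted from the source and `sync` runs again, that row stops appearing in results, which is the property that keeps retrieved context consistent with the governed table.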
Apps & operational · The agentic-app surface
Apps & operational — Databricks Apps, Lakebase, Genie, Lakewatch
What it is
The newest layer on the platform — the home for full applications, transactional state, the conversational surface, and security operations — reflecting Databricks's bet that AI agents and the apps that contain them need a transactional database, a hosting surface, and a security perimeter all sitting next to the lakehouse, not exported to a separate stack. The layer pulls together Databricks Apps (the hosting surface), Lakebase (managed Postgres for OLTP and agent state, from the May 2025 Neon acquisition), Genie (the conversational business-productivity workspace), and Lakewatch (an agentic SIEM).
Key capabilities
Databricks Apps
Serverless hosting for full applications written in popular Python web frameworks (Streamlit, Gradio, Dash, Flask) and Node frameworks, deployed next to the data with built-in Unity Catalog auth. The application surface for analyst-built and developer-built tools alike (see Databricks Apps).
Lakebase
Managed Postgres for AI agents and data apps — decoupled compute and storage with autoscaling-to-zero, instant database branching, and one-click sync with Delta tables. Originated in the May 2025 acquisition of Neon. The OLTP complement to the lakehouse's analytical engine.
Genie (business productivity)
The unified search, chat, dashboards, and apps surface for business users — what Databricks frames on databricks.com/product/genie as “business productivity”. Sits on top of the AI/BI Genie analyst surface in the data-engineering / SQL layer.
Lakewatch
An “open, agentic SIEM built for the AI era” (Databricks's framing on databricks.com/product/lakewatch) — security analytics, detection, and response on top of the lakehouse, with agents as first-class operators.
Databricks Assistant
The platform-wide AI coding assistant — generates SQL, explains code, fixes errors, and integrates with the SQL editor and notebooks. The developer-facing analog of the Genie conversational surface.
IDE integrations & Partner Connect
First-class connections to VS Code, JetBrains, and other IDEs so developers can build against the lakehouse from local environments; Partner Connect surfaces the ecosystem of ingestion / BI / ETL / observability integrations as one-click installs.
How it relates to the others
This layer is the visible surface for the rest of the platform. Databricks Apps hosts the UIs that users see; Lakebase holds the transactional state those apps and Mosaic AI agents need; Genie is the conversational entry point for business users; Lakewatch handles security operations. All four sit on Unity Catalog and the lakehouse storage layer, so an Apps-hosted UI that reads a Lakebase row and writes into an Agent Bricks-evaluated flow stays under one policy plane throughout.
Every major product, by layer
The platform's major products, grouped by layer in stack order (engine at the top of the table, apps & operational at the bottom). The “Origin” column flags whether a product grew organically or arrived through an acquisition; the “Key concept” column is the one-phrase mental model for each. The OSS marker means the product or its core protocol is open source.
| Product | Layer | Shipped / acquired | Origin | Current state | Key concept |
|---|---|---|---|---|---|
| Data Intelligence Platform | The engine | 2013–14 | Organic | GA | Lakehouse on AWS, Azure, GCP |
| Delta Lake (OSS) | Lakehouse storage | 2019 (OSS) | Organic (Linux Foundation) | GA | Open ACID storage on object stores |
| Apache Iceberg support | Lakehouse storage | 2024 | Acquisition (Tabular, 2024) | GA | Iceberg next to Delta, first-class |
| Predictive Optimization | Lakehouse storage | 2024 | Organic | GA | AI-driven table tuning |
| Unity Catalog (OSS) | Governance | 2022; OSS 2024 | Organic | GA | One governance for data + AI |
| Delta Sharing (OSS) | Sharing | 2021 (OSS) | Organic | GA | Open zero-copy live data sharing |
| Databricks Marketplace | Sharing | 2023 | Organic | GA | Open data & AI marketplace on Delta Sharing |
| Clean Rooms | Sharing | 2024 | Organic | GA | Privacy-preserving collaboration |
| Lakehouse Federation | Governance | 2023 | Organic | GA | Query & govern external warehouses in place |
| Apache Spark (OSS) | Data engineering | 2010 (OSS) | Organic (UC Berkeley AMPLab) | GA | Distributed compute & streaming engine |
| Lakeflow | Data engineering | 2024 brand | Organic + Arcion (2023) | GA | Unified ETL: Connect, Pipelines, Jobs, Designer |
| Databricks SQL + Photon | SQL analytics | 2020–21 | Organic | GA | Serverless warehouse on the lakehouse |
| AI/BI Dashboards & Genie | SQL analytics | 2024 | Organic | GA | Low-code dashboards + natural-language Genie |
| Mosaic AI Training & Serving | AI | 2023 | Acquisition (MosaicML, 2023) | GA | Train & serve models on governed data |
| Vector Search | AI | 2024 | Organic | GA | Managed vector DB with Delta sync |
| Agent Framework & Evaluation | AI | 2024 | Organic | GA | Build, evaluate & trace production agents |
| Unity AI Gateway | AI | 2024 | Organic | GA | Govern every LLM & MCP server |
| Agent Bricks | AI | 2025 | Organic | Available | Agents grounded in enterprise data |
| Managed MLflow (OSS) | AI | 2018 (OSS) | Organic | GA | ML & GenAI lifecycle management |
| Databricks Apps | Apps | 2024 | Organic | GA | Host full apps next to the data |
| Lakebase | Operational | 2025 | Acquisition (Neon, 2025) | GA | Managed Postgres for agents & apps |
| Genie (business productivity) | Apps | 2025 | Organic | Available | Unified search, chat, dashboards, apps |
| Lakewatch | Operational | 2025 | Organic | Available | Agentic SIEM on the lakehouse |
Years are sourced to Databricks's own product pages, documentation release notes, the Databricks blog and Data + AI Summit announcements, and acquisition press releases (MosaicML 2023, Arcion 2023, Tabular 2024, Neon 2025). “GA” means generally available; “Available” covers the newest launches (Agent Bricks, Genie business productivity, Lakewatch) that are shipping but where the precise GA framing is still moving release-over-release. The Mosaic AI family is renamed and expanded frequently — verify the current state against the linked primary sources. Databricks remains private; no SEC EDGAR row is shown.
Read these primary sources
Most of this page is paraphrased from the URLs below. They are the authoritative places to read what Databricks says about each product on its own pages, the canonical technical documentation, and the acquisition / open-source announcements that establish the origins of acquired and OSS-originated products.
Databricks's own product surfaces
Per-feature pages on databricks.com/product/ — the canonical source for each capability's company-side framing — plus the technical documentation and open-source index.
# Platform & product hub
https://www.databricks.com/product/data-intelligence-platform
https://www.databricks.com/product/data-lakehouse
# Lakehouse storage & governance
https://www.databricks.com/product/lakehouse-storage
https://www.databricks.com/product/delta-lake-on-databricks
https://www.databricks.com/product/unity-catalog
https://www.databricks.com/product/delta-sharing
https://www.databricks.com/product/marketplace
# Data engineering · SQL · BI
https://www.databricks.com/product/data-engineering
https://www.databricks.com/product/databricks-sql
https://www.databricks.com/product/photon
https://www.databricks.com/product/ai-bi
# Mosaic AI
https://www.databricks.com/product/artificial-intelligence
https://www.databricks.com/product/artificial-intelligence/agent-bricks
https://www.databricks.com/product/ai-gateway
https://www.databricks.com/product/machine-learning/vector-search
https://www.databricks.com/product/machine-learning/mosaic-ai-training
https://www.databricks.com/product/model-serving
# Apps & operational
https://www.databricks.com/product/databricks-apps
https://www.databricks.com/product/lakebase
https://www.databricks.com/product/genie
https://www.databricks.com/product/lakewatch
# Open source & technical docs
https://www.databricks.com/product/open-source
https://docs.databricks.com/
Open-source upstreams
The open-source projects Databricks originated or supports — canonical for the protocols, formats, and lifecycle tooling underneath the proprietary platform.
# Databricks-originated open-source projects
https://spark.apache.org/
https://delta.io/
https://delta.io/sharing/
https://mlflow.org/
https://www.unitycatalog.io/
# Other supported open formats
https://iceberg.apache.org/
Acquisition announcements & company surfaces
The origin of each acquired product, plus the company-side surfaces where product-trajectory commentary lands (Databricks is private — no SEC filings).
# Acquisitions that became products
https://www.databricks.com/blog/databricks-mosaicml
https://www.databricks.com/blog/databricks-acquires-arcion-help-customers-bring-real-time-data-every-data-and-ai-application
https://www.databricks.com/blog/databricks-tabular
https://www.databricks.com/blog/databricks-neon
# Newsroom, blog, and Data + AI Summit (the launch venue)
https://www.databricks.com/company/newsroom
https://www.databricks.com/blog
https://www.databricks.com/dataaisummit
# Mosaic Research
https://www.databricks.com/blog/category/generative-ai/mosaic-research
Sources: Databricks's own product pages and per-feature pages for each capability's company-side framing; docs.databricks.com release notes for capability detail and GA dates; the open-source page plus the Apache Spark, Delta Lake, MLflow, and Unity Catalog project sites; and acquisition / news posts on the Databricks blog for the acquired-product origins (MosaicML, Arcion, Tabular, Neon). Reporter coverage is cited under fair use (linked, not republished). Last updated May 2026.
Mungomash LLC