As demand for scalable, efficient data processing has grown, traditional data architectures have struggled to keep up. Databricks has emerged as a key player in the industry, unifying data lakes and data warehouses into a single architecture known as the data lakehouse. In this blog, learn more about how Databricks was founded, its history and milestones, and its products and funding.

Databricks’ History and Milestones

The origin of Databricks is intertwined with the origin of the data processing framework Spark. Spark was born in 2009 at UC Berkeley’s AMPLab, out of a research project to create a cluster computing framework that could handle big data workloads faster and more efficiently than the popular frameworks of the time. Matei Zaharia, then a Ph.D. student, created Spark to address the biggest pain points of the industry standards of the day, including poor performance on iterative algorithms and interactive data mining tasks. Spark aimed to tackle these challenges through in-memory processing, ease of use, fault tolerance, and support for general-purpose computing.

Databricks’ Founding Story

Zaharia founded Databricks in 2013 alongside Ali Ghodsi, Ion Stoica, Patrick Wendell, Reynold Xin, Scott Shenker, and Andy Konwinski. The team had worked together on Spark, which was moved to the Apache Software Foundation and became Apache Spark. Databricks was founded with the goal of filling the gaps left by Apache Spark’s community-driven model. While the open-source model enabled Apache Spark to evolve rapidly and achieve widespread adoption, it also had limitations, including a lack of commercial support and services, inconsistent code quality, and complex deployment and management.

Data Lakehouses

Databricks’ core innovation lies in the data lakehouse structure. Previously, data companies were limited to two options:

Data Lake – stores data in its raw, native format, offering flexibility and scalability for various data types and analysis
Data Warehouse – stores structured, processed data optimized for specific business intelligence and reporting needs

Databricks recognized the limitations of each structure and introduced a lakehouse architecture. Designed to combine the best features of data lakes and data warehouses into a single platform, data lakehouses allow for the storage and analysis of diverse types of data, including structured, semi-structured, and unstructured data. In 2025, some consider the data lakehouse model to be the industry standard for storing and processing large amounts of data.
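To make the distinction concrete, here is a toy sketch in Python that uses only the standard library. It is not Databricks code and does not use any Databricks API; the sample records, the `purchases` table, and all variable names are hypothetical. It simply contrasts reading raw, loosely shaped records at query time (the data lake approach) with loading rows into a fixed schema before querying (the data warehouse approach), which are the two access patterns a lakehouse is meant to offer on one platform.

```python
import json
import sqlite3

# Hypothetical raw events as they might land in a data lake:
# each record can have a different shape.
raw_events = [
    '{"customer": "ana", "action": "click", "meta": {"page": "/home"}}',
    '{"customer": "bo", "action": "purchase", "amount": 20.0}',
    '{"customer": "ana", "action": "purchase", "amount": 5.0}',
]

# Lake-style access: keep the raw records and interpret them when reading.
parsed = [json.loads(line) for line in raw_events]
lake_total = sum(event.get("amount", 0) for event in parsed)

# Warehouse-style access: only rows that fit a fixed schema are loaded,
# and they can then be queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO purchases (customer, amount) VALUES (?, ?)",
    [(e["customer"], e["amount"]) for e in parsed if e["action"] == "purchase"],
)
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM purchases GROUP BY customer ORDER BY customer"
).fetchall()

print("lake-style total:", lake_total)
print("warehouse-style rows:", rows)
```

In this sketch the click event is still available in the raw records but never reaches the structured table, which is the kind of trade-off between flexibility and structure described above.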

Databricks’ Products

Databricks offers a unified Lakehouse Platform engineered to combine data engineering, analytics, and AI/ML into a single cloud-based solution.

Data Intelligence Platform

Databricks’ main product offering is its Data Intelligence Platform. Built on lakehouse architecture, it is designed to provide an open, unified foundation for all data and governance. The Data Intelligence Platform leverages generative AI to help optimize performance and manage infrastructure. Enhanced by strong governance and security, the Data Intelligence Platform aims to make discovering new data as simple as asking a coworker.

Delta Lake

Delta Lake on Databricks is designed to offer a reliable, secure, and high-performance storage layer for a data lake. It functions as a centralized repository to store raw, structured, and unstructured data at any scale for analysis and processing.

Databricks SQL

Part of the Data Intelligence Platform, Databricks SQL is an intelligent data warehouse designed to make analytics accessible to both technical and business users. It is engineered to adapt to the unique aspects of an organization’s data and deliver top-tier performance and cost efficiency.

Mosaic AI

Databricks Mosaic AI is a tool that helps developers build, deploy, evaluate, and manage artificial intelligence (AI) and machine learning (ML) solutions such as predictive models and generative AI applications.

Databricks Milestones

Over the years, Databricks has reached numerous milestones in terms of traction, revenue, and awards. Its recent milestones include, but are not limited to, the following:

Announced in June 2025 that it expects annualized revenue to reach $3.7B by July 2025, with 50% year-over-year growth[1]
Closed a ~$15.3B Series J financing at a $62B valuation in January 2025. The company raised $10B in equity and added $5.25B in debt funding from JPMorgan Chase, Barclays, Citi, Goldman Sachs, and Morgan Stanley[2]
Achieved a $3B revenue run rate in the quarter ended January 31, 2025 and is on pace to generate positive free cash flow[3]
Announced its agreement to acquire Tabular in June 2024, for a reported $2B, in an effort to strengthen and expand its open lakehouse capabilities[4]
Over 500 Databricks customers generate an annual revenue run rate exceeding $1M each[5]
Databricks is considering a potential Initial Public Offering (IPO) which could occur in 2025 or 2026[6]

Databricks Funding

Databricks has raised ~$19B in its 12-year history and was valued at $62B as of its last funding round in January 2025. Its most recent funding round brought in ~$15.3B in new funding, with $10B in equity and $5.25B in debt financing. The new capital is planned to be used to invest in new AI products, bolster its global go-to-market operations, and fund new acquisitions.[7]
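As a quick check on the arithmetic, the equity and debt components quoted above can be added up in a few lines of Python. This is only an illustrative sketch: the variable names are hypothetical, the amounts are the ones stated in this blog, and the half-up rounding to one decimal place is an assumption about how the headline figure was derived.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures in billions of US dollars, as quoted in the text above.
equity_b = Decimal("10")
debt_b = Decimal("5.25")

# Exact decimal sum of the two components.
total_b = equity_b + debt_b

# Round half up to one decimal place (assumed rounding convention).
headline_b = total_b.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(f"total: ${total_b}B, headline: ~${headline_b}B")
```

Decimal is used rather than float because Python's built-in round() rounds exact halves to the nearest even digit, which would turn 15.25 into 15.2.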

Final Thoughts

Helping to pioneer a new standard in data processing and analytics, Databricks has become a key player in the industry. Combining the power of data lakes and data warehouses into a unified structure, Databricks has helped redefine how enterprises handle data. With its future plans for global expansion and a potential public debut on the horizon, Databricks is poised to help further shape the future of data-driven decision making.

[1] https://www.cnbc.com/2025/06/11/databricks-says-annualized-revenue-to-reach-3point7-billion-by-next-month.html

[2] https://techcrunch.com/2025/01/22/databricks-closes-15-3b-financing-at-62b-valuation-meta-joins-as-strategic-investor/

[3] https://www.cnbc.com/2025/06/10/databricks-cnbc-disruptor-50.html

[4] https://techcrunch.com/2024/08/14/databricks-reportedly-paid-2-billion-in-tabular-acquisition/

[5] https://www.prnewswire.com/news-releases/databricks-is-raising-10b-series-j-investment-at-62b-valuation-302333822.html

[6] https://techcrunch.com/2024/12/17/its-dumb-to-ipo-this-year-databricks-ceo-explains-why-hes-waiting-to-go-public/

[7] https://techcrunch.com/2025/01/22/databricks-closes-15-3b-financing-at-62b-valuation-meta-joins-as-strategic-investor/

*****

The information presented here is for general informational purposes only and is not intended to be, nor should it be construed or used as, comprehensive offering documentation for any security, investment, tax or legal advice, a recommendation, or an offer to sell, or a solicitation of an offer to buy, an interest, directly or indirectly, in any company. Investing in both early-stage and later-stage companies carries a high degree of risk. A loss of an investor’s entire investment is possible, and no profit may be realized. Investors should be aware that these types of investments are illiquid and should anticipate holding until an exit occurs.
