2023-12-04
Databricks - key concepts
```mermaid
mindmap
  Databricks
    Databricks Workspace
    Databricks Runtime
    Databricks File System (DBFS)
    Databricks Clusters
    Databricks Notebooks
    Databricks Jobs
    Databricks Tables
```
Here are some of the key features and components of Databricks:
- Databricks Workspace
- Databricks Runtime
- Databricks File System (DBFS)
- Databricks Clusters
- Databricks Notebooks
- Databricks Jobs
- Databricks Tables
Databricks Workspace
This is the collaborative environment where you can write code, create visualizations, and share your work with others. It supports several languages including Python, SQL, R, and Scala. Read more: Create and manage your Databricks workspaces | Databricks on AWS
Databricks Runtime
This is the set of core components that runs on the clusters in Databricks. It includes Apache Spark along with enhancements maintained by Databricks, such as performance optimizations, security features, and integrations with tools like Delta Lake and MLflow. Read more: What is Databricks Runtime?
Databricks File System (DBFS)
This is a distributed file system mounted into a Databricks workspace and available on its clusters. It lets you store files and share them across all nodes in a cluster. Read more: What is the Databricks File System (DBFS)?
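One practical detail worth noting: the same DBFS storage shows up under two path styles — a `dbfs:/` URI used by Spark APIs and a `/dbfs/` FUSE mount used by ordinary local file APIs. A minimal sketch of the mapping (the helper name is my own, not a Databricks API):

```python
# Sketch: DBFS exposes the same storage as the dbfs:/ URI scheme
# (Spark APIs) and the /dbfs/ local FUSE mount (regular file APIs).
# dbfs_to_local is a hypothetical helper illustrating the mapping.
def dbfs_to_local(path: str) -> str:
    """Convert a dbfs:/ URI to its /dbfs FUSE-mount path."""
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):]
    return path  # already a local-style path

print(dbfs_to_local("dbfs:/tmp/data.csv"))  # /dbfs/tmp/data.csv
```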
Databricks Clusters
These are the compute resources that run your code. You can create clusters of different sizes and types depending on your workload. Read more: Compute - Azure Databricks
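The "sizes and types" a cluster can have are spelled out when you define one, for example through the Clusters REST API. A minimal sketch of such a spec as a Python dict (field names follow the API; the concrete values are placeholder assumptions, not recommendations):

```python
import json

# Sketch of a cluster spec for the Databricks Clusters REST API.
# Values below are placeholder assumptions; node types vary by cloud.
cluster_spec = {
    "cluster_name": "etl-cluster",
    "spark_version": "13.3.x-scala2.12",  # a Databricks Runtime version string
    "node_type_id": "i3.xlarge",          # worker instance type (AWS example)
    "autoscale": {"min_workers": 2, "max_workers": 8},  # size range, not fixed
}

# The API takes this as a JSON request body.
payload = json.dumps(cluster_spec)
```

Using `autoscale` instead of a fixed `num_workers` lets Databricks resize the cluster between the two bounds as the workload changes.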
Databricks Notebooks
These are collaborative documents that contain code, visualizations, and text. They're great for exploratory data analysis, data science, and machine learning workflows. Read more: Introduction to Databricks notebooks
Databricks Jobs
These are the tasks or computations you run on Databricks. You can schedule jobs to run periodically, or run them on demand. Read more: Create and run Databricks Jobs
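A scheduled job is likewise defined declaratively, e.g. via the Jobs REST API. A hedged sketch of a request body for a nightly notebook run (field names follow the Jobs 2.1 API; the job name, notebook path, and cron expression are placeholder assumptions):

```python
import json

# Sketch of a Jobs API request body: one notebook task on a daily schedule.
# Names and paths below are placeholders, not real workspace objects.
job_spec = {
    "name": "nightly-report",
    "tasks": [
        {
            "task_key": "build_report",
            "notebook_task": {"notebook_path": "/Repos/team/report"},
        }
    ],
    # Quartz cron: seconds, minutes, hours, ... -> every day at 02:00.
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",
        "timezone_id": "UTC",
    },
}

body = json.dumps(job_spec)
```

Dropping the `schedule` block gives a job you trigger on demand instead.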
Databricks Tables
These are the structured data sources that you can query using SQL or the DataFrame APIs in Python, R, and Scala. Read more: Delta Live Tables
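The point of tables is that the same data answers to both query styles. A minimal runnable sketch, using Python's built-in sqlite3 as a stand-in SQL engine so it runs anywhere (on Databricks the equivalent calls would be `spark.sql(...)` for SQL and `spark.table(...)` for the DataFrame API):

```python
import sqlite3

# Stand-in sketch: on Databricks you would write, against the same table,
#   spark.sql("SELECT name FROM people WHERE age > 30")          # SQL
#   spark.table("people").filter("age > 30").select("name")      # DataFrame API
# Here sqlite3 (Python stdlib) plays the SQL engine so the example is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", 34), ("Bo", 28), ("Cy", 41)],
)

rows = conn.execute(
    "SELECT name FROM people WHERE age > 30 ORDER BY name"
).fetchall()
print(rows)  # [('Ana',), ('Cy',)]
```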