Databricks IO Cache
- The Databricks IO cache accelerates data reads by creating copies of remote files in the nodes' local storage, using a fast intermediate data format. The data is cached automatically whenever a file has to be fetched from a remote location. Subsequent reads of the same data are then served locally, which significantly improves read speed.
- The Databricks IO cache supports reading Parquet files from DBFS, Amazon S3, HDFS, Azure Blob Storage, Azure Data Lake Storage Gen1, and Azure Data Lake Storage Gen2 (on Databricks Runtime 5.1 and above). It does not support other storage formats such as CSV, JSON, and ORC.
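The cache is controlled through Spark configuration. A minimal sketch of enabling it on a running cluster, assuming a Databricks notebook where `spark` is the preexisting `SparkSession` (the configuration keys `spark.databricks.io.cache.enabled` and `spark.databricks.io.cache.maxDiskUsage` are Databricks settings; the path and table name are hypothetical):

```python
# Enable the Databricks IO cache for this session
# (on some cluster types it is enabled by default).
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# Optional: cap how much local disk each node may devote to cached data.
spark.conf.set("spark.databricks.io.cache.maxDiskUsage", "50g")

# The first read fetches Parquet files from remote storage and caches
# local copies; repeated reads of the same files are then served locally.
df = spark.read.parquet("dbfs:/mnt/example/events")  # hypothetical path
df.count()
```

Because caching happens transparently at the file-read layer, no changes to queries are needed; only Parquet sources benefit, per the list above.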