Connecting Python applications to Spark SQL using Pyodbc
Pyodbc is an open-source Python module for connecting Python applications to data sources over ODBC. Any local Python application can use it, together with an ODBC driver, to connect to Apache Spark and access the databases and tables defined with Apache Spark SQL. In this section, we will connect Python running on your local machine to a Databricks cluster using Pyodbc, in the following steps:
- Download and install the Simba ODBC driver provided by Databricks on your local machine from here: https://databricks.com/spark/odbc-drivers-download.
- Install Pyodbc on your local machine's Python using pip, as shown in the following command:
  sudo pip install pyodbc
- Create a new Python file using a text editor of your choice and paste the following code into it:
import pyodbc
odbc_conn = pyodbc.connect("Driver=/Library/simba/spark/lib/libsparkodbc_sbu.dylib;" ...
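Because the listing above is truncated, the overall shape of such a script can be sketched as follows. This is a minimal illustration, not the original code. The helper names (build_connection_string, fetch_rows, query_spark) are hypothetical. Every value in angle brackets is a placeholder. The keys other than Driver are assumptions based on typical Simba Spark ODBC settings and should be checked against the driver's own documentation.

```python
def build_connection_string(params):
    """Join key/value pairs into an ODBC connection string: 'K1=V1;K2=V2;'."""
    return "".join(f"{key}={value};" for key, value in params.items())


def fetch_rows(connection, sql):
    """Run a query on an open DB-API connection and return all rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


def query_spark(conn_str, sql):
    """Open a pyodbc connection, run one query, and close the connection."""
    import pyodbc  # imported here so the helpers above work without the driver

    # autocommit=True because Spark SQL over ODBC is typically used
    # without transactions (an assumption, not taken from the source).
    connection = pyodbc.connect(conn_str, autocommit=True)
    try:
        return fetch_rows(connection, sql)
    finally:
        connection.close()


# Hypothetical settings: only the Driver path comes from the text above.
# Every other key and value is an assumed placeholder to replace.
params = {
    "Driver": "/Library/simba/spark/lib/libsparkodbc_sbu.dylib",
    "Host": "<workspace-hostname>",
    "Port": "443",
    "HTTPPath": "<cluster-http-path>",
    "SSL": "1",
    "ThriftTransport": "2",
    "AuthMech": "3",
    "UID": "token",
    "PWD": "<personal-access-token>",
}
conn_str = build_connection_string(params)
```

With the driver installed and real values filled in, calling query_spark(conn_str, "SHOW TABLES") should return the tables visible to the cluster. Keeping the connection string in a dictionary makes it easy to load secrets such as the access token from the environment instead of hard-coding them.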