ODBC access via turbodbc/python
Azure Synapse SQL-on-Demand pools can be accessed though an odbc compatible client from python.
First you need to grant access to the sql endpoint for an external DB user:
CREATE LOGIN testuser WITH password='xxx';
SQL Analytics on-demand query reads files directly from Azure Storage. Since the storage account is an object that is external to SQL Analytics on-demand, appropriate credentials are required. A user needs the appropriate permissions granted to use the requisite credential.
Delegation of access to Azure blob storage accounts can be done with AAD pass-through or giving manual credentials
CREATE CREDENTIAL [https://blob.dfs.core.windows.net/benchmark]
WITH IDENTITY='SHARED ACCESS SIGNATURE'
, SECRET = 'sv=2018-03-28xxxx'
GO
GRANT REFERENCES ON CREDENTIAL::[https://blob.dfs.core.windows.net/benchmark] TO [testuser];
To connect to an Azure SQL-on-Demand endpoint you need to follow the installation of the ODBC driver for debian.
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/debian/9/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get install msodbcsql17
ACCEPT_EULA=Y apt-get install mssql-tools
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
Now you can connect with turbodbc to the SQL-on-demand pool to execute your queries:
import turbodbc
server ='mysynapse-ondemand.sql.azuresynapse.net'
port = 1433
database="master"
uid="testuser"
pwd="xxx"
con = turbodbc.connect(driver='ODBC Driver 17 for SQL Server',
server=server,
port=port,
database=database,
uid=uid,
pwd=pwd)
stm = '''
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://blob.dfs.core.windows.net/benchmark/*/01.parquet',
FORMAT='PARQUET'
) AS [r];
'''
cur = con.cursor()
cur.execute(stm)
print(cur.fetchall())
You can also use all the PyArrow/Pandas features in turbodbc to efficiently run workflows for data intensive machine learning applications.
cursor.execute(stm)
table = cursor.fetchallarrow()
df = table.to_pandas()
Clients
In addition to the odbc interface Azure offers a web and a desktop client to run ad hoc queries on the SQL service.
Azure Synapse Studio
Azure Synapse Studio is the integrated web client to interact with an Azure Synapse Workspace. It offers an online sql script editor and a browser for Azure Blob Storage Accounts. Based on parquet file inspection it can infer schemata and generate create external tables for parquet data in the storage accounts.
Access to the Workspace is based on the azure managed identities (AAD). Permissions can be granted to the SQL pools in the workspace. During creation of the workspace one can grant the managed identity CONTROL permissions on SQL pools.
Azure Synapse Studio offers keyword completion, syntax highlighting and some keyboard shortcuts. Run on-demand SQL queries, view and save results as CSV export.
Azure Data Studio
Azure Data Studio is a cross platform sql editor and database tool from Microsoft. It supports connecting to a Azure Synapse SQL on Demand server through the managed azure identities (AAD).
Azure Data Studio offers multiple tab windows, a rich SQL editor, IntelliSense, keyword completion, code snippets, code navigation, and source control integration (Git). It can run on-demand SQL queries, view and save results as CSV, JSON, or Excel.
from Planet Python
via read more
No comments:
Post a Comment