Is dynamic SQL Queries supported on Azur Databricks SQL Cluster?

Jayesh Potwade 0 Reputation points
2024-05-15T19:01:11.4766667+00:00

Hello,

I'm planning to implement Dynamic SQL function to query data on Databricks table. Tables and access for the users are governed by a custom access matrix using the Unity catalog.

The problem is that in a custom matrix, there are two types of users: individual users and aggregated users. Individual users are governed through UC but for Aggregated users, they should get result from the Databricks table only if the columns selected by users with metric has a value > 10.

As it depends on user selection, the fields that user will select is dynamic and accordingly aggregate value which will be populated should be dynamic

For example:

This is my dummy table:

User's image

User Can select at one instance Gender wise values, sometimes users can select Gender,Age-Group wise Values.

In whatever select query user fires, for aggregate user, if value >10 then it should show result set.

Actual table will have more than 50+ columns, so i have to design dynamic SQL query/functions.

In actual solution, these queries will be received to Databricks SQL Cluster from PowerBI

Can someone please let me know if dynamic sql queries are supported in Databricks SQL, if no, then any alternative solution please let me know.

Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
1,990 questions
{count} votes

1 answer

Sort by: Most helpful
  1. PRADEEPCHEEKATLA-MSFT 80,661 Reputation points Microsoft Employee
    2024-05-16T06:01:23.99+00:00

    @Jayesh Potwade - Thanks for the question and using MS Q&A platform.

    Yes, dynamic SQL queries are supported on Azure Databricks SQL cluster. You can use the spark-sql command to execute dynamic SQL queries on Azure Databricks SQL cluster. Here is an example:

    %sql SELECT * FROM my_table WHERE column_name = '${value}'
    

    In this example, ${value} is a parameter that can be passed dynamically to the query. You can also use the spark.sql() function to execute dynamic SQL queries in Python or Scala. Here is an example:

    from pyspark.sql.functions import lit
    
    value = "some_value"
    query = "SELECT * FROM my_table WHERE column_name = '{}'".format(value)
    
    df = spark.sql(query)
    
    

    In this example, value is a parameter that can be passed dynamically to the query. The spark.sql() function is used to execute the query and return the result as a DataFrame.

    You can use the EXECUTE IMMEDIATE statement to execute dynamic SQL queries in Databricks SQL. Here is an example:

    DECLARE @query NVARCHAR(MAX)
    SET @query = 'SELECT * FROM my_table WHERE column_name = ''' + @value + ''''
    
    EXECUTE IMMEDIATE @query
    
    

    In this example, @value is a parameter that can be passed dynamically to the query. The EXECUTE IMMEDIATE statement is used to execute the query.

    Regarding your requirement for aggregated users, you can use a CASE statement to check if the selected columns have a value greater than 10. Here is an example:

    SELECT
      Gender,
      Age_Group,
      SUM(CASE WHEN Value > 10 THEN Value ELSE 0 END) AS Aggregate_Value
    FROM
      my_table
    GROUP BY
      Gender,
      Age_Group
    
    

    In this example, the SUM function is used to calculate the aggregate value for each group. The CASE statement is used to check if the Value column is greater than 10. If it is, then the Value is included in the sum. Otherwise, 0 is included in the sum.

    You can modify this query to include only the selected columns by the user. You can also modify the CASE statement to check if the selected columns have a value greater than 10.

    Hope this helps. Do let us know if you any further queries.

    0 comments No comments