I’m trying to use sql query on azure-databricks with distinct sort and aliases
SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId
The problem is that if I add an alias then I can not use not aliased name in the order by clause.
ORDER BY album.ArtistId part produces an error. ORDER BY my_alias works.
If I remove distinct it also works.
Seems like after Project step original column name is lost. That behavior is unexpected for SQL compared to other SQL dialects. And I can not find any documentation about it.
Error in SQL statement: AnalysisException: cannot resolve '`album.ArtistId`' given input columns: [my_alias]; line 2 pos 22; 'Sort ['album.ArtistId ASC NULLS FIRST], true +- Distinct +- Project [ArtistId#2506 AS my_alias#2500] +- SubqueryAlias spark_catalog.chinook.album +- Relation[AlbumId#2504,Title#2505,ArtistId#2506] parquet
The where clause works fine
SELECT DISTINCT album.ArtistId AS my_alias FROM album WHERE album.ArtistId = 1
This looks like a bug.
Is there any way to make this query run as is?
The query is generated by sqlachemy and works on other databases fine, I prefer to not edit it for databricks.
The schema used for testing is Chinook database.