RyanAbbey-0701 asked · RyanAbbey-0701 commented

input_file_name

I am trying to read multiple parquet files and add the source file name to the dataframe, using a Synapse 2.4 cluster. However, when I add the column using "input_file_name", the column is empty:
spark.read.parquet(*sfile).withColumn("input_file_name", F.input_file_name())

Any known issues with this? Any alternative ways to get the filename added (short of a union loop)?

azure-synapse-analytics

Hello @RyanAbbey-0701,
Thanks for the question and for using the Microsoft Q&A platform.
Can you please share more detail about what "F" is here? Sharing more of the code (if possible) would help.
Thanks
Himanshu

RyanAbbey-0701 (replying to HimanshuSinha-MSFT):

"F" is the PySpark SQL functions module:
from pyspark.sql import functions as F

sfile is a list of files...

I've found that if just one file is passed in, "input_file_name" works, but when more than one file is passed in, it is null.
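
For anyone hitting the same behaviour, here is a minimal sketch of the fallback mentioned in the question (reading each file on its own and unioning the results). It does not explain why the multi-file read returns an empty column. `read_with_source` is a hypothetical helper name, not a PySpark API, and the sketch assumes `spark` is an active SparkSession and that every file shares compatible column names.

```python
from functools import reduce
from typing import Iterable


def read_with_source(spark, paths: Iterable[str], column: str = "input_file_name"):
    """Read each parquet path separately, tag its rows with that path, and union them.

    Hypothetical helper (not part of PySpark). The path is attached as a literal,
    so the result does not depend on F.input_file_name() being populated.
    """
    # Imported inside the function so this module loads even where PySpark is absent.
    from pyspark.sql import functions as F

    # One DataFrame per file, each carrying its own path as a constant column.
    frames = [spark.read.parquet(p).withColumn(column, F.lit(p)) for p in paths]
    if not frames:
        raise ValueError("paths must contain at least one file")

    # unionByName matches columns by name rather than by position.
    return reduce(lambda left, right: left.unionByName(right), frames)
```

With the names from the thread this would be called as `df = read_with_source(spark, sfile)`. It issues one read per file, so it is probably only reasonable for a modest number of files.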


0 Answers