dataflow1 has the following:
source1 --> aggregate --> sink1
source1 dataset: dsDatacompanies
sink1 dataset: dsDatacompanies
Note that source1 reads a .csv file.
The aggregate transformation then keeps only the distinct rows.
sink1 then writes back to the same file that source1 reads.
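To make the setup concrete, here is a rough local equivalent in Python. It is a sketch under stated assumptions, not how Data Factory executes the flow. The aggregate step is modelled as keeping distinct whole rows (first occurrence wins), and the function name `dedupe_csv_in_place` and the file name are hypothetical. In a single Python process, rewriting the input file is safe only because the whole file is read into memory before anything is written. The sketch also writes to a temporary file and swaps it in with `os.replace`, so a failed write cannot leave the original file truncated.

```python
import csv
import os
import tempfile


def dedupe_csv_in_place(path: str) -> int:
    """Hypothetical helper: keep distinct data rows of a CSV and rewrite `path`.

    Assumes the first row is a header. Returns the number of data rows written.
    """
    # Read the entire file into memory before anything is written back.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        return 0
    header, data = rows[0], rows[1:]

    # Distinct rows, preserving first-seen order (lists aren't hashable, so key on tuples).
    seen = set()
    distinct = []
    for row in data:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            distinct.append(row)

    # Write to a temp file in the same directory, then atomically swap it in,
    # so a failed write never leaves `path` half-written.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(distinct)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return len(distinct)
```

For example, a file containing `id,name`, `1,A`, `2,B`, `1,A` would be rewritten with only the header plus `1,A` and `2,B`. A distributed engine that streams the source while writing the sink gives no such read-before-write guarantee, which is why the question is worth asking.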
Is this OK, or should the sink write to a different file from the source?
Thank you