dataflow1 has the following:
source1 --> aggregate --> sink1
source1 --> dsDatacompanies
sink1 --> dsDatacompanies
Note that source1 reads a .csv file.
aggregate then gets the distinct rows.
sink1 then writes to the same file that source1 reads.
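
To make the intent concrete, here is a minimal sketch of the same three steps in plain Python. This is not the data flow itself, only an illustration of the logic; the function name dedupe_csv_in_place and the path argument are hypothetical. It assumes the whole file fits in memory and is read completely before the same path is reopened for writing.

```python
import csv


def dedupe_csv_in_place(path):
    """Read a CSV, keep distinct rows, and write them back to the same path."""
    # source1: read every row into memory before the file is reopened for writing
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

    # aggregate: keep the first occurrence of each distinct row, in original order
    seen = set()
    distinct = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            distinct.append(row)

    # sink1: overwrite the same file that was read above
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(distinct)

    return distinct
```

In this sketch the read finishes before the write starts, so the two steps never touch the file at the same time. Whether the data flow gives the same guarantee when source1 and sink1 share a file is what I am unsure about.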
Is this OK, or should the sink file be different from the source file?