# Gotchas
1. Data Flow Parameters
* First, drill into the Data Flow itself and declare a parameter under its Parameters tab.
* Then go back out to the pipeline, click the Data Flow activity, and you should now be able to set the declared parameter(s) on it. (See the sketch below this item for how a parameter is referenced.)
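* As a minimal sketch, a declared parameter is then referenced inside a data flow expression with a $ prefix (the LoadNumber name here is just an illustration, not from the pipeline above):
'Load_' + $LoadNumber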
2. Data Flow: Cloud vs. On-Prem
* Pipelines -> Asn -> Refresh WMS Data -> Settings
* Data Flows cannot use an "on-prem" (self-hosted integration runtime) source. Because of this I needed to copy the data from WMS into our system first, which takes extra time. That copy also triggers this message: "You will be charged # of used DIUs * copy duration * $0.25/DIU-hour. Local currency and separate discounting may apply per subscription type." (See the worked example below this item.)
* https://stackoverflow.com/questions/56640577/azure-data-factory-data-flow-task-cannot-take-on-prem-as-source
* https://feedback.azure.com/forums/270578-data-factory/suggestions/37472797-data-flows-add-support-to-access-on-premise-sql-a
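* As a rough worked example of that pricing formula: a copy that runs for 30 minutes on 4 DIUs would cost about 4 DIUs * 0.5 hours * $0.25/DIU-hour = $0.50. (These numbers are illustrative, not from an actual run.)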
2.5. Performance Issues with On-Prem Copy
* https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance#copy-performance-and-scalability-achievable-using-adf
3. Partitioning
* A real Azure Data Factory run executes across multiple partitions, while Debug / Preview mode appears to always run on a single partition, which can lead to unexpected results in the real world.
* I specifically had issues with sorting and numbering rows.
* It appears that my output data set was partitioned (grouped) on the first column. This was a default; I never set it up, and it never showed in preview mode, so it was very unexpected.
* Solution:
* To resolve this, I "forced" the natural sort order (in preview mode I had not needed any sort at all).
* Luckily this "natural order" had a pattern I could take advantage of to rebuild a sort key.
* When the sort alone still did NOT help in the actual Data Factory run, I was finally able to have success by setting the "Single partition" option on my new Sort transformation.
* Finally, ensure that any logic that depends on the sort order (like row numbering / filtering) takes place after your Sort transformation. (A sketch of the whole pattern follows this item.)
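* A minimal sketch of that fix, using hypothetical column names (it assumes the natural order can be rebuilt from a LineNumber column in the source):
* Derived Column adding a sort key: toInteger(LineNumber)
* Sort transformation on that key, with Optimize -> "Single partition".
* Window transformation ordered by the same key, numbering rows with: rowNumber()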

4. The contains() function requires a #item predicate to test each array element, for example:
contains($FileNames, #item == filename)
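* As an illustration with a literal array (filename is assumed to be a column on the stream), this returns true when the current row's filename matches either entry:
contains(['a.txt', 'b.txt'], #item == filename)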
5. Weird errors
1. "store is not defined"
* Apparently string[] parameters are not actually supported, even though the type appears in the selection dropdown (or perhaps they require some unusual format?).
* I finally found a discussion on it:
* "There seem to be parsing problems having multiple items in the variable each encapsulated in single-quotes, or potentially with the comma separating them. This often causes errors executing the data flow with messages like 'store is not defined'."
* DanielPerlovsky-MSFT: "While array types are not supported as data flow parameters, passing in a comma-separated string can work." (See the sketch of that workaround below this item.)
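* A minimal sketch of that workaround ($FileNames and filename are illustrative names): pass the parameter as a plain comma-separated string such as a.txt,b.txt, then split it back into an array inside the data flow:
contains(split($FileNames, ','), #item == filename)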
2. "shaded.databricks.org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: Bad Request"
* Can occur if you have checked "List of Files" on the source but did not provide a pattern.
* To pass a value in from the pipeline (here, the first LoadNumber returned by the 'ASN Load Numbers' activity):
@{activity('ASN Load Numbers').output.value[0].LoadNumber}
* Then use the parameter inside the data flow like this, e.g. to build a file name:
'ASN_' + replace(toString(currentTimestamp()), ':', '') + '_' + $LoadNumber + '_TEST.txt'
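* With $LoadNumber = '12345', that expression yields something like ASN_2024-01-01 120000_12345_TEST.txt (the exact shape depends on toString()'s default timestamp format; the colon stripping is presumably there because ':' is not valid in Windows file names).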
# Resources
* https://techcommunity.microsoft.com/t5/azure-data-factory/adf-data-flow-expressions-tips-and-tricks/ba-p/1157577
* https://docs.microsoft.com/en-us/azure/data-factory/tutorial-data-flow
* https://docs.microsoft.com/en-us/azure/data-factory/format-delimited-text
* https://docs.microsoft.com/en-us/azure/data-factory/data-flow-derived-column
* https://marlonribunal.com/azure-data-factory-data-flow/
* https://www.sqlservercentral.com/articles/azure-data-factory-your-first-data-pipeline
* https://medium.com/@adilsonbna/using-azure-data-lake-to-copy-data-from-csv-file-to-a-sql-database-712c243db658
If this article helped you, or you have any thoughts on how to do this better, please click the Like button and/or leave a comment below.