
As most companies continually evolve and update their processes for data storage and optimization, you may find yourself wanting, or being tasked, to migrate data from an on-premises database to the Power BI service. In certain scenarios, it is advantageous to have data hosted in the Power BI service rather than on premises. In my experience, server cost and space, speed of access to data, and ease of controlling access to data are usually what drive the conversation toward using the Power BI service for data storage.
In my experience with setting up dataflows, I found that the following tips helped immensely:
1. Set up a dedicated gateway for dataflows
If scheduled refreshes and reports are already utilizing a single gateway, adding dataflows can overwhelm that gateway, leading to decreased performance and/or difficulty setting up the dataflow. A dedicated gateway set up just for the dataflow process allows the other workloads to operate as normal while ensuring that the dataflows are unaffected.
2. Use views wherever possible in lieu of Power Query
Unless you are familiar with Power Query (M), the translation between T-SQL and M is not the most intuitive. Simple (and, more importantly, optimized) operations in T-SQL can take many steps to replicate in M, leading to additional code to process, added administration, and possibly decreased performance in the dataflow. What I found was that pointing the dataflow at a view (if available) keeps the level of complexity down, keeps any business logic in an easy-to-modify state, and keeps performance up for the dataflow.
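As a rough illustration, here is a minimal T-SQL sketch of the kind of view I mean; the table, view, and column names (dbo.Sales, dbo.Customer, dbo.vw_SalesSummary, and so on) are hypothetical placeholders for your own objects:

```sql
-- Hypothetical sketch: the join, filter, and calculated column are handled by
-- the database, so the dataflow's Power Query side only needs a simple source
-- step against dbo.vw_SalesSummary instead of replicating these steps in M.
CREATE VIEW dbo.vw_SalesSummary
AS
SELECT
    s.SalesOrderID,
    s.OrderDate,
    c.CustomerName,
    s.Quantity * s.UnitPrice AS ExtendedAmount   -- calculation done in T-SQL, not in M
FROM dbo.Sales AS s
INNER JOIN dbo.Customer AS c
    ON c.CustomerID = s.CustomerID
WHERE s.OrderDate >= '2020-01-01';               -- filter pushed down to the database
GO
```

Keeping this logic in a view also means a business-rule change is a single ALTER VIEW on the database side rather than an edit to every dataflow that consumes it.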
3. Ensure a datetime field is set up within the view to facilitate incremental refresh
In any dataflow setup, I would highly recommend including a datetime field within the object that is used as the base of the dataflow. The datetime field needs to match the granularity at which you need your incremental refresh to happen (daily, monthly, or yearly); it does not have to contain the time unless your scenario involves incremental refreshes that happen more than once a day. The reason I make this recommendation is that incremental refresh is fairly simple to implement once the dataflow is set up, and if the dataflow does not need the refresh yet, it is just as simple to enable it if the need arises in the future. I especially recommend adding the field up front if you will not be able to readily change the database object later. Better to be safe than sorry! Additionally, I would set up another datetime field that records when the row was last updated and/or audited, in case you need the incremental refresh to detect data changes.
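Here is a minimal T-SQL sketch of what that could look like, assuming you are able to alter the underlying table; again, the object and column names (dbo.Sales, OrderDate, LastUpdatedDateTime) are hypothetical:

```sql
-- Hypothetical sketch: add an audit datetime column if the table lacks one.
-- New rows are stamped by the default; keeping the value current on UPDATE
-- still requires application logic or a trigger.
ALTER TABLE dbo.Sales
    ADD LastUpdatedDateTime DATETIME2(0) NOT NULL
        CONSTRAINT DF_Sales_LastUpdated DEFAULT (SYSUTCDATETIME());
GO

-- Expose both datetime fields to the dataflow: OrderDate can drive the
-- incremental refresh window, while LastUpdatedDateTime gives the
-- "detect data changes" option something to check.
CREATE VIEW dbo.vw_Sales_Incremental
AS
SELECT
    s.SalesOrderID,
    s.OrderDate,
    s.Quantity * s.UnitPrice AS ExtendedAmount,
    s.LastUpdatedDateTime
FROM dbo.Sales AS s;
GO
```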
Successfully implementing dataflows can greatly assist with keeping valuable on-premises server space available while providing an optimized way of storing bulk data and keeping it up to date in an efficient and tightly controlled environment. I hope these tips help you along the way to a successful implementation.