Whilst working on some reports for our service desk I was asked if we could return a dataset containing just the last 6 and 12 months of data. This is the type of request I would normally resolve with a T-SQL script, but one of the purposes of this project was to allow us to dogfood both Power BI and self-service BI more generally. The challenge was quite simple: find a way to filter data in a dataset based on a specific datetime. Doing this with a specific date-time using the UI is very easy. In Figure 1 below I filter the column titled Issue_Created, which contains date-time values. Figure 1 - The field 'Issue_Created' selected, showing the filter icon. From the list I can filter by certain dates, as shown below in Figure 2. This led me to wonder if it was possible to filter by date ranges.
Figure 2 - Filtering by a specific date
Below is a video showing how to filter records that either equal or occur after a specific date; see Figure 3. Figure 3 - Creating a filter to show records on or after a specific date.
Once I had set up the filter I opened the 'Advanced Editor' to look at the M code. The code that does the filtering looks like this:
#"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each [Created] >= #datetime(2015, 7, 7, 5, 16, 52))
The syntax looked relatively simple and straightforward; the main challenge is that in the snippet above the datetime is static, not dynamic. At this point there was a tweak to the specification (my internal clients have every faith in me :-)). What they required was: based on today's date, get the first day of the month, and then return the first day of the month 6 and 12 months before. This is a pattern I have implemented many times in T-SQL, but in M it was going to be a bit more of an adventure.
Long story short, after some experimenting I arrived at the following formulas.

First day of the month 6 months ago, based on the current date:
= Date.AddMonths(Date.StartOfMonth(DateTime.LocalNow()), -6)

First day of the month 12 months ago, based on the current date:
= Date.AddMonths(Date.StartOfMonth(DateTime.LocalNow()), -12)

What's the next step? Create a variable which will contain the M formulas created above. The first step is to open Power BI Desktop and go to 'Edit Queries'; this opens the 'Query Editor' window. In the 'Query Editor' window click on the 'New Source' button and select 'Blank Query' from the pull-down list, see Figure 4.
Figure 4 - Creating a new Blank Query to act as a parameter. Now the blank query has been created, the next step is to name the parameter; the name I use in this example is Previous_FDM_6Months. Next I type in the M formula = Date.AddMonths(Date.StartOfMonth(DateTime.LocalNow()), -6). Once that's done, click the green tick to update and evaluate the parameter.
Figure 5 - Setting the parameter name and inserting the M formula. So now I have a parameter which returns a datetime value based on the current date and time. The last step was to amend the part of the M code I had grabbed earlier to filter the dataset. Below is the relevant part of the M code I use in production:

let
    Source = Sql.Database("A_Cloud_Database", "Some_Data"),
    dbo_Some_Data = Source{[Schema="dbo", Item="SomeRandomData"]}[Data],
    #"Filtered_Rows_Last12_Months" = Table.SelectRows(dbo_Some_Data, each [Issue_Created] >= Previous_FDM_12Months),
    .........

The data is extracted from the database and the filter only returns rows where Issue_Created is equal to or greater than the datetime value returned by the parameter Previous_FDM_12Months. A complete reference of all the datetime functions in the Power Query language (aka M) can be found here: https://msdn.microsoft.com/en-us/library/mt296608.aspx. If you want a Power BI Desktop file with the functions, you can download it here.
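For completeness, here is a minimal end-to-end sketch of how the parameter and the filter fit together. The table and column names follow the production snippet above; the final step name is illustrative:

// Parameter query named Previous_FDM_12Months:
// first day of the month, 12 months before the current date
= Date.AddMonths(Date.StartOfMonth(DateTime.LocalNow()), -12)

// Main query, filtered by the parameter
let
    Source = Sql.Database("A_Cloud_Database", "Some_Data"),
    dbo_Some_Data = Source{[Schema="dbo", Item="SomeRandomData"]}[Data],
    #"Filtered_Rows_Last12_Months" = Table.SelectRows(dbo_Some_Data, each [Issue_Created] >= Previous_FDM_12Months)
in
    #"Filtered_Rows_Last12_Months"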
In the previous post (see here) I set up blank datasets as parameters storing the values of the database server and database name. Whilst attempting to create a suitable dataset for a report for an internal customer, I was frustrated that I could not do it using Power Query alone; on the other hand, I knew I could write some T-SQL that would give me the dataset I wanted very easily. So I wondered: was it possible to combine the two? Let's return to the Sql.Database function, which has this very nice example: Sql.Database("localhost", "Northwind", [Query="select * from Customers"]). The last setting is "Query as text - Provide a query text that is run on the server to return values". It's almost like someone thought it would be a good idea to allow a user to run a SQL query. A few minutes later I had the query I wanted to run, and it looked something like this:
DECLARE @IssueType AS VARCHAR(10) = 'Problem';
DECLARE @ClientName AS NVARCHAR(100) = 'RainbowPainters';

SELECT DATEPART(MONTH, [Issue_Created]) AS Month_Number
      ,DATENAME(MONTH, [Issue_Created]) AS Month_Name
      ,DATENAME(YEAR, [Issue_Created]) AS Year_Name
      ,COUNT(DATEPART(MONTH, [Issue_Created])) AS Total
FROM [dbo].[IssueData] AS ID
WHERE [Issue_type_name] = @IssueType
  AND [ClientName] = @ClientName
GROUP BY DATEPART(MONTH, [Issue_Created])
        ,DATENAME(MONTH, [Issue_Created])
        ,DATENAME(YEAR, [Issue_Created]);

Next I went back to the dataset, opened the 'Advanced Editor' and added the query; you can see it in Figure 1 below. Figure 1 - Advanced Editor window with the new query in place
To make it a bit easier to read, below is the actual text from the 'Advanced Editor' window shown above. All that has been added to the original data source is the [Query = "..."] option containing the T-SQL.
let
    Source = Sql.Database(DatabaseServer, DatabaseName,
        [Query = "DECLARE @IssueType AS VARCHAR(10) = 'Problem';
                  DECLARE @ClientName AS NVARCHAR(100) = 'RainbowPainters';
                  SELECT DATEPART(MONTH, [Issue_Created]) AS Month_Number
                        ,DATENAME(MONTH, [Issue_Created]) AS Month_Name
                        ,DATENAME(YEAR, [Issue_Created]) AS Year_Name
                        ,COUNT(DATEPART(MONTH, [Issue_Created])) AS Total
                  FROM [dbo].[IssueData] AS ID
                  WHERE [Issue_type_name] = @IssueType
                    AND [ClientName] = @ClientName
                  GROUP BY DATEPART(MONTH, [Issue_Created])
                          ,DATENAME(MONTH, [Issue_Created])
                          ,DATENAME(YEAR, [Issue_Created]);"])
in
    Source
When I refreshed the dataset, the data appeared :-) Job done? No; as often happens, one client gets something and then everybody wants what they have. So I was asked if we could create the same set of reports for another customer. There were two choices at this point: I could just change the hard-coded client name, or I could parameterise it. Having heard of this before, I knew about passing parameters from Excel to filter datasets; see the following posts: https://blog.oraylis.de/2013/05/using-dynamic-parameter-values-in-power-query-queries/ and http://blog.crossjoin.co.uk/2015/11/24/using-parameter-tables-to-control-data-refresh-in-power-bi/. Having looked at these blog posts, which declare and then use parameters, this looked like a simple concept. Kudos to the Power BI team for that. The first step was to add an additional dataset as a parameter, so I now had three datasets as parameters, the latest being 'ClientName', which stores the client name.
Figure 2 - List of datasets containing parameters. Having looked at the blog posts above, I realised that I just needed to declare a variable to hold the query text and pass the ClientName parameter into it. Following some trial and error, I ended up with the query text below:

let
    pQuery = "DECLARE @IssueType AS VARCHAR(10) = 'Problem';
              DECLARE @ClientName AS NVARCHAR(100) = '" & ClientName & "';
              SELECT DATEPART(MONTH, [Issue_Created]) AS Month_Number
                    ,DATENAME(MONTH, [Issue_Created]) AS Month_Name
                    ,DATENAME(YEAR, [Issue_Created]) AS Year_Name
                    ,COUNT(DATEPART(MONTH, [Issue_Created])) AS Total
              FROM [dbo].[IssueData] AS ID
              WHERE [Issue_type_name] = @IssueType
                AND [ClientName] = @ClientName
              GROUP BY DATEPART(MONTH, [Issue_Created])
                      ,DATENAME(MONTH, [Issue_Created])
                      ,DATENAME(YEAR, [Issue_Created]);",
    Source = Sql.Database(DatabaseServer, DatabaseName, [Query = pQuery])
in
    Source
Figure 3 - Showing the query text in the Advanced Editor window. Much to my delight, this syntax works as expected, so I now have a way to specify parameters which are passed to SQL Server. From reading about M (the Power Query Formula Language) there is clearly more to learn; for the moment I am happy with the new-found knowledge, whilst I play with some of the other technologies offered by Power BI. The only downside is that this only works in Power BI Desktop :-(
One of the attractions of being a programmer was that I could find ways to make routine tasks easier and quicker. As our company and clients have discovered the joy of Power BI and the reports I have created, more clients are asking for reports, some of them the same reports, to ensure everyone has the same view of their data. Scenario: there are two databases I populate with data; one is on my machine, the other is an Azure database. As you would expect, I use the local copy to test out changes to the database; then, using the Redgate tool SQL Compare, I script any changes and apply them to the Azure database. It is possible to change the data source for each dataset easily using the following method. From the main window, on the 'Home' tab, click once on the 'Edit Queries' button, see Figure 1. Figure 1 - Showing the location of the 'Edit Queries' button
This will open the Query Editor window. Now, to see the data source for a selected dataset, first select a dataset; in the example shown in Figure 2 it is 'DimDate'. Next click on the 'Advanced Editor' button, see Figure 2.
Figure 2 - Showing the location of the 'Advanced Editor' button. When the Advanced Editor window opens it will look something like the one shown in Figure 3 below. Figure 3 - Advanced Editor window, with the M (Power Query Formula Language) query showing the database server name and database name. The syntax of the statement is relatively simple. In this case we are looking at the Sql.Database function, whose basic form looks like this:

Source = Sql.Database("Name of Database Server", "Name of Database")

This is the text shown in Figure 3:

let
    Source = Sql.Database("localhost", "DBdata"),
    dbo_DimDate = Source{[Schema="dbo", Item="DimDate"]}[Data]
in
    dbo_DimDate
To change the data source to point at the Azure database, the only changes that have to be made are as follows:
let Source = Sql.Database("AzureDatabase", "DBdata"), dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data] in dbo_DimDate Job done. This was nice until you have to do it several times, and personally I got bored then wondered if there was a better way? I am glad you asked if there is a better way let me show you. Having seen some examples where the data source was an excel spreadsheet l was aware that it was possible to supply parameters to change the data source. From one spreadsheet to another one. The first step is to set up the parameters. Open the Query Editor window, click on the 'New Sources' button from the list click on 'Blank Query' (see Figure 4)
Having set up the DatabaseServer dataset (parameter value), repeat the process and create a DatabaseName dataset (parameter value). Figure 6 shows what your Query Editor window would look like if you entered the same details shown in this example; a sketch of the two parameter queries follows the list below.
* DatabaseServer - containing the value 'AzureDatabase', or the name of the database server you are going to connect to.
* DatabaseName - containing the value 'DBdata', or the name of the database you are going to connect to.
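As a minimal sketch, each of these is just a blank query whose entire formula returns the text value:

// DatabaseServer (blank query)
= "AzureDatabase"

// DatabaseName (blank query)
= "DBdata"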
If we look at the advanced properties of the dataset 'DimDate', the data source would look like the one shown below.

let
    Source = Sql.Database("AzureDatabase", "DBdata"),
    dbo_DimDate = Source{[Schema="dbo", Item="DimDate"]}[Data]
in
    dbo_DimDate

The next step is to replace the database server name and database name with the names of the datasets that have just been set up, so the data source will look like the one below.

let
    Source = Sql.Database(DatabaseServer, DatabaseName),
    dbo_DimDate = Source{[Schema="dbo", Item="DimDate"]}[Data]
in
    dbo_DimDate
The Advanced Editor window should look like the one shown in Figure 7 below. When the data source is queried, the values of 'DatabaseServer' and 'DatabaseName' are replaced by the values from the relevant dataset. In this Power BI Desktop report I just replaced all the hard-coded instances of the database server name and database name with the relevant parameter, which allows me to switch between the two databases with the ease I wanted, though only in Power BI Desktop :-( Figure 7 - Showing the final M (Power Query Formula Language) text (data source). If you click on this link you can download a copy of a .PBIX file with the two datasets shown above. There is more you can do with this, but you will need to wait until part two of this blog post for the details.
So first I created a variable in the query window, see Figure 1.
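Figure 1 is not reproduced here; as a minimal sketch, the variable is just a blank query whose formula returns the title text (the query name and the text itself are illustrative):

// Blank query named CardTitle, used as the field for the card visual
= "Issue summary for RainbowPainters"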
Now you have a card with a dynamic title; if you want to change the client name, all you have to do is change the text in the variable. A sample Power BI report file containing the report shown above can be downloaded here.
For various reasons, which I have now forgotten, I set up transactional replication for some clients. The result is that I am the caretaker of transactional replication for two of our clients; what a lucky person I am! T-SQL code used to check transactional replication: once I got the process up and running (a very, very long and stressful story), I realised that I would have to monitor it. Following some googling I found this article with some T-SQL written by SQLSoldier here. This worked for me, and I used it to monitor replication by running the script and checking the results manually. Nagios output: our sysadmin uses a tool called Nagios to monitor our IT estate, so they suggested that a script could be written to monitor the replication process and send an alert if anything needed to be looked at. This sounded like an excellent idea, but how to do it? The approach we arrived at involved a PowerShell script which would run a SQL query, examine the results, and respond to Nagios with the following values
Next we decided to use PowerShell to return the results. Following some googling we found this page, http://www.madeiradata.com/cross-server-replication-health-check-using-powershell/, which runs a SQL script and returns a dataset. The first challenge was that the T-SQL script from SQLSoldier was rather long for my first PowerShell script; I wanted something smaller. So I created a stored procedure based on the script and placed it in the Distribution database on the replicated database server. Doing this had two benefits: first, less T-SQL in the PowerShell script; second, changing one of the parameters means it returns different amounts of data. The stored procedure takes the following parameters:
When designing this error message, the aim was primarily to give some guidance as to what might be happening with the replication process; it is not intended to explain the underlying reason causing the issue. All that is required is that the Nagios process shows that there is something wrong; whatever the reason, it requires some form of human intervention. Once an error condition has been detected, the issue is handed to me to resolve. At least now I do not have to check the process periodically; I just have to wait for a message from our sysadmin.
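The production script is not reproduced here, but the overall pattern looks something like the sketch below. The server, database, and stored procedure names are illustrative, not the real ones; Nagios reads the text output and the exit code (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).

# Minimal sketch: run the replication-check stored procedure and
# translate the result into a Nagios status and exit code.
$connectionString = "Server=MyReplServer;Database=distribution;Integrated Security=True"
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
$command = $connection.CreateCommand()
$command.CommandText = "EXEC dbo.usp_CheckReplicationLatency;"  # illustrative procedure name
try {
    $connection.Open()
    $adapter = New-Object System.Data.SqlClient.SqlDataAdapter($command)
    $table = New-Object System.Data.DataTable
    [void]$adapter.Fill($table)
    if ($table.Rows.Count -eq 0) {
        # No rows returned: nothing needs attention
        Write-Output "OK - replication healthy"
        exit 0
    }
    else {
        Write-Output "CRITICAL - $($table.Rows.Count) publication(s) need attention"
        exit 2
    }
}
catch {
    Write-Output "UNKNOWN - $($_.Exception.Message)"
    exit 3
}
finally {
    $connection.Close()
}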
Figure 3 - Second data source settings window. This shows where the details, e.g. the username and password used to connect to the data source, are stored, if these are required. Figure 4 - Data source credentials window. So we now have the following credentials: Username = JoeBloggs@rainbows.com, Password = JoeB123. N.B. We will be using these details later. Tip if using multiple accounts with Power BI online: the next step would be to publish your reports to the Power BI service at PowerBI.com. If you have multiple accounts which you use to publish reports, then one tip is to log out of the account and log back in, as there is nothing in the Power BI Desktop application which indicates which account is logged in. Adding credentials in the Power BI service (aka PowerBI.com): one quick tip here is to open PowerBI.com using Internet Explorer; at the time of writing, the process shown next did not reliably render in other browsers. Log into PowerBI.com; once successfully logged in, go to the top right-hand side of the window, left-click the settings icon, and from the menu click on the 'Settings' menu item.
Once the 'Settings' window opens, click on the 'Datasets' tab. In 'Settings -> Datasets', click once on the triangle to the left of 'Data source credentials' to expand it, then click on the 'Edit credentials' link. When the credentials configuration window opens, from the drop-down box below the authentication method select 'Basic'. When you select the 'Basic' authentication method, text boxes will appear at the bottom of the form; enter the username and password, then click on the 'Sign In' button. If the username and password are accepted, you will see a message appear in the top right-hand corner of the Power BI window, indicating the data source has been successfully updated.
The core application my current company has built allows people to apply for grants. We track the application through various stages; much like a loan application, there are many stages an application might go through. The status of an application is not linear, so applications can go back and forth between the same status more than once. In Figure 1 below, an application can be given a status of 'Under Review' and then get a status of 'Submitted' or 'Pre-Submission'. So an application might have the same status more than once, with different dates.
Figure 1 – Showing the workflow an application might take. One question our customers want answered is how long an application takes to move from one status to another. This question could be answered using the existing application database, but that method is not ideal as the queries are quite complex and slow. This is one of the reasons I decided to build a data warehouse as a proof of concept: to see if there was a quicker way to answer that question.
Calculating the time difference: given that an application can be allocated a status more than once during its lifecycle, this has an impact on the method used for calculating the time difference. The methodology used to get the time difference is defined below.
The fact table in the POC data warehouse stores the ApplicationID, the Status, and the datetime the application was allocated that status. If an application has been assigned the same status more than once, there is a record for each time the application received that status. This has some advantages: the end user can easily understand the data as it is shown in the table. On the other hand, it presented some difficulties when calculating the time period between each status. After various attempts, the following calculated column was created; in this case it calculates the days between a grant receiving Submitted status and Rejected status.
Figure 2 – First version of the calculated column DaysFromSubmittedToRejected
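Figure 2 is not reproduced here, but the first version of the calculated column was along the following lines (a sketch reconstructed from the description below and the second version shown later):

DaysFromSubmittedToRejected =
VAR FirstSubmittedDate =
    CALCULATE (
        MIN ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Submitted"
        )
    )
VAR LastRejectedDate =
    CALCULATE (
        MAX ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Rejected"
        )
    )
RETURN
    DATEDIFF ( FirstSubmittedDate, LastRejectedDate, DAY )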
For both variables the logic is very similar: for FirstSubmittedDate the MIN() function is used to get the first date, and the FILTER statement only returns records with the same ApplicationID and Status = 'Submitted'; whereas for LastRejectedDate the MAX() function is used to return the last date. Using variables means the logic can be re-used for other statuses, and makes the DATEDIFF() function easier to read. However, when the results were displayed they were not as expected.
Figure 3 – Results from the first version of the DaysFromSubmittedToRejected calculated column (see Figure 2)
The requirement was to return the number of days elapsed from Submitted to Rejected, with the value returned against the record for the last time the application received the Rejected status. So another calculated column was created to flag whether the current record holds the LastRejectedDate, which is shown in Figure 4.
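Figure 4 is not reproduced here, but a calculated column along the following lines would perform that check (a sketch, using the same table and column names as the formula below):

LastRejectedDate =
IF (
    'FactAppStatuses'[Status] = "Rejected"
        && 'FactAppStatuses'[DateAddedAt]
            = CALCULATE (
                MAX ( 'FactAppStatuses'[DateAddedAt] ),
                FILTER (
                    ALL ( 'FactAppStatuses' ),
                    'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                        && 'FactAppStatuses'[Status] = "Rejected"
                )
            ),
    "Y",
    "N"
)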
With this calculated column (LastRejectedDate) in place, the RETURN statement of the 'DaysFromSubmittedToRejected' column just requires a small change, as follows:
DaysFromSubmittedToRejected =
VAR FirstSubmittedDate =
    CALCULATE (
        MIN ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Submitted"
        )
    )
VAR LastRejectedDate =
    CALCULATE (
        MAX ( 'FactAppStatuses'[DateAddedAt] ),
        FILTER (
            ALL ( 'FactAppStatuses' ),
            'FactAppStatuses'[ApplicationID] = EARLIER ( 'FactAppStatuses'[ApplicationID] )
                && 'FactAppStatuses'[Status] = "Rejected"
        )
    )
RETURN
    IF (
        'FactAppStatuses'[LastRejectedDate] = "Y",
        DATEDIFF ( FirstSubmittedDate, LastRejectedDate, DAY ),
        0
    )
Figure 5 – Second version of the calculated column DaysFromSubmittedToRejected
When the calculated column shown in Figure 5 is added to the Power BI report and the report is run, the results can be seen in Figure 6 below.
Figure 6 – Results from the second version of the DaysFromSubmittedToRejected calculated column (see Figure 5). The added IF() statement uses the flag to check whether the current row contains the LastRejectedDate; otherwise it returns zero. This returns the record set that was expected. The positive aspect of this method is that the same calculation can be re-used for a variety of statuses with only a minor change to the formula. On the negative side, the more calculated columns there are, the longer it takes to generate a report, and the time taken to evaluate the calculated column will grow as the number of rows increases. No doubt there are other approaches that could improve on this. Even so, the current method is significantly faster than doing the same calculations on the application database. Better still, with a Power BI report or a tabular data model, these figures could easily be used by end users to build reports which provide insight into their data; all the better that our clients could do this for themselves, with relative ease.