Database table partitioning – Build filegroups and partitions

Step 1

CREATE PARTITION FUNCTION [PF_Fact_Test] (int)
AS RANGE LEFT
FOR VALUES (20140101, 20150101, 20160101);
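With RANGE LEFT, each boundary value belongs to the partition on its left, so the three boundaries above create four partitions (values ≤ 20140101, ≤ 20150101, ≤ 20160101, and everything greater). A small Python sketch of that assignment logic (illustrative only, not part of SQL Server):

```python
import bisect

def partition_number(value, boundaries, range_left=True):
    """Map a value to a 1-based partition number, mimicking SQL Server's rules."""
    if range_left:
        # RANGE LEFT: each boundary value belongs to the partition on its LEFT,
        # so partition i holds values <= boundaries[i-1]
        return bisect.bisect_left(boundaries, value) + 1
    # RANGE RIGHT: each boundary value belongs to the partition on its RIGHT
    return bisect.bisect_right(boundaries, value) + 1

boundaries = [20140101, 20150101, 20160101]

print(partition_number(20140101, boundaries))  # boundary itself -> partition 1
print(partition_number(20141231, boundaries))  # -> partition 2
print(partition_number(20170101, boundaries))  # past the last boundary -> partition 4
```

Note this is why Step 2 must name one more filegroup (four) than there are boundary values (three).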

Step 2 

CREATE PARTITION SCHEME [PS_Fact_Test]
AS PARTITION [PF_Fact_Test]
TO ([iGPT2014], [iGPT2015], [iGPT2016], [iGPT2017]);

Step 3

Assign the partition scheme and partition function to a table

USE [iGPT]

GO

BEGIN TRANSACTION

CREATE CLUSTERED INDEX [ClusteredIndex_on_PS_Fact_Test_636351103815297616]
ON [dbo].[iGPTextraInfo] ([MaxDateAces])
WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF)
ON [PS_Fact_Test] ([MaxDateAces]);

DROP INDEX [ClusteredIndex_on_PS_Fact_Test_636351103815297616] ON [dbo].[iGPTextraInfo];

COMMIT TRANSACTION

Step 4

Load data into the partitioned table

-- The following query returns one or more rows if the table is partitioned.
SELECT *
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON t.[object_id] = i.[object_id]
    AND i.[type] IN (0, 1)
JOIN sys.partition_schemes ps
    ON i.data_space_id = ps.data_space_id
WHERE t.name = 'GPTClaimPendedFact';

 

 

-- The following query returns the boundary values for each partition of the table.
SELECT t.name AS TableName, i.name AS IndexName, p.partition_number,
       p.rows AS Rows, p.partition_id, i.data_space_id, f.function_id,
       f.type_desc, r.boundary_id, r.value AS BoundaryValue
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON t.object_id = i.object_id
JOIN sys.partitions AS p
    ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes AS s
    ON i.data_space_id = s.data_space_id
JOIN sys.partition_functions AS f
    ON s.function_id = f.function_id
LEFT JOIN sys.partition_range_values AS r
    ON f.function_id = r.function_id AND r.boundary_id = p.partition_number
WHERE t.name = 'iGPTInquiryFact_WGS' AND i.type <= 1
ORDER BY p.partition_number;

 

 

-- The following query returns the name of the partitioning column for the table.
SELECT
    t.[object_id] AS ObjectID,
    t.name AS TableName,
    ic.column_id AS PartitioningColumnID,
    c.name AS PartitioningColumnName
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON t.[object_id] = i.[object_id] AND i.[type] <= 1 -- a clustered index or a heap
JOIN sys.partition_schemes AS ps
    ON ps.data_space_id = i.data_space_id
JOIN sys.index_columns AS ic
    ON ic.[object_id] = i.[object_id] AND ic.index_id = i.index_id
    AND ic.partition_ordinal >= 1 -- because 0 = non-partitioning column
JOIN sys.columns AS c
    ON t.[object_id] = c.[object_id] AND ic.column_id = c.column_id
WHERE t.name = 'GPTClaimPendedFact';

 

 

 

-- Row counts per partition.
SELECT row_count, *
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.GPTClaimPendedFact') -- AND row_count > 0
ORDER BY partition_number;

SELECT * FROM sys.dm_db_partition_stats;

SELECT SUM(used_page_count) AS total_number_of_used_pages,
       SUM(row_count) AS total_number_of_rows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.GPTClaimPendedFact')
  AND (index_id = 0 OR index_id = 1);

New DAX Functions only in SSAS 2016 and Above

SQL Server 2016 Analysis Services (SSAS), Power Pivot in Excel 2016, and Power BI Desktop include the following new Data Analysis Expressions (DAX) functions:

Date and Time Functions

CALENDAR Function
CALENDARAUTO Function
DATEDIFF Function

Information Functions

ISEMPTY Function
ISONORAFTER Function

Filter Functions

ADDMISSINGITEMS Function
SUBSTITUTEWITHINDEX Function

Math and Trig Functions

ACOS Function
ACOSH Function
ASIN Function
ASINH Function
ATAN Function
ATANH Function
COMBIN Function
COMBINA Function
COS Function
COSH Function
DEGREES Function
EVEN Function
EXP Function
GCD Function
ISO.CEILING Function
LCM Function
MROUND Function
ODD Function
PI Function
PRODUCT Function
PRODUCTX Function
QUOTIENT Function
RADIANS Function
SIN Function
SINH Function
SQRTPI Function
TAN Function
TANH Function

Statistical Functions

BETA.DIST Function
BETA.INV Function
CHISQ.INV Function
CHISQ.INV.RT Function
CONFIDENCE.NORM Function
CONFIDENCE.T Function
EXPON.DIST Function
GEOMEAN Function
GEOMEANX Function
MEDIAN Function
MEDIANX Function
PERCENTILE.EXC Function
PERCENTILE.INC Function
PERCENTILEX.EXC Function
PERCENTILEX.INC Function
SELECTCOLUMNS Function
XIRR Function
XNPV Function

Text Functions

CONCATENATEX Function

Other Functions

GROUPBY Function
INTERSECT Function
NATURALINNERJOIN Function
NATURALLEFTOUTERJOIN Function
SUMMARIZECOLUMNS Function
UNION Function
VAR

How to create customized color themes for your PowerBI visuals

I have found that a nice-looking dashboard or report is just as important as the data itself. In this blog, I want to share some good tricks I learned recently from PowerBI.com and the YouTube channel EnterpriseDNA for creating customized color themes in PowerBI desktop.

By default, the custom theme feature is not enabled in PowerBI desktop, so first go to 'File' -> 'Options and Settings',

PowerBI Option Change

then enable 'Custom Report Themes' in the 'Preview Features' section.

Custom Report Theme Enable

After restarting your PowerBI desktop, you will see 'Themes' under the 'Home' tab.

Themes tab

By choosing 'Switch Themes', you can import your customized theme saved as a JSON file.

A typical theme JSON file looks like this:

{
  "name": "St Patricks Day",
  "dataColors": ["#568410", "#3A6108", "#70A322", "#915203", "#D79A12", "#bb7711", "#114400", "#aacc66"],
  "background": "#FFFFFF",
  "foreground": "#3A6108",
  "tableAccent": "#568410"
}
From <https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-report-themes/>

But how do we find a color combination that looks good if we are not artists?

Usually, I create the foundation colors based on a company logo or image. Since Halloween is coming soon, let's use a Halloween picture as an example.

A recommended site, http://palettefx.com/, helps you find all the colors used in an image.

Color Pickers

After that, open the JSON code I shared above in Notepad, edit the colors, and save it as a .json file.

Json code

Now you can import the JSON file into PowerBI as your customized theme, using the 'Switch Themes' option we enabled.

JSON file

Now, you can use your new customized theme to build your PowerBI reports!

There is a PowerBI Themes Gallery: https://community.powerbi.com/t5/Themes-Gallery/bd-p/ThemesGallery. Hopefully it helps.

Your friend, Annie

 

From SQL to DAX: 'Lead' and 'Lag' window functions

In SQL, we have two window functions called LEAD and LAG. With these two functions, you can get the next and previous value of a column, partitioned by and ordered by other columns in a table.

Take the AdventureWorks Sales.SalesOrderHeader table as an example. The following code gives you the previous and next SalesOrderID for each SalesPersonID, ordered by OrderDate.

Lead and Lag SQL code
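The screenshot of the original query is not included in this extract. A minimal sketch of an equivalent query, runnable through SQLite's window functions from Python (the sample rows are made up for illustration; the AdventureWorks values may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesOrderHeader (SalesOrderID INT, SalesPersonID INT, OrderDate TEXT);
INSERT INTO SalesOrderHeader VALUES
  (43846, 274, '2015-08-01'),
  (44518, 274, '2015-09-01'),
  (45283, 274, '2015-10-01'),
  (43869, 275, '2015-08-15');
""")

# LAG/LEAD partitioned by salesperson, ordered by order date
rows = conn.execute("""
SELECT SalesOrderID,
       LAG(SalesOrderID)  OVER w AS PreviousOrderID,
       LEAD(SalesOrderID) OVER w AS NextOrderID
FROM SalesOrderHeader
WINDOW w AS (PARTITION BY SalesPersonID ORDER BY OrderDate)
ORDER BY SalesPersonID, OrderDate
""").fetchall()

for r in rows:
    print(r)
```

The first and last order of each salesperson get NULL for LAG and LEAD respectively, since there is no row before or after them inside the partition.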

However, these are very expensive functions, because the SQL engine needs to scan the table for every row where the functions are called.

In this case, it is much faster to use DAX in an SSAS tabular model, where column-store and VertiPaq compression technologies are embedded. To replace the LEAD and LAG functions in DAX, we will use a key DAX function called EARLIER.

Using the same example mentioned above,

You can write the PreviousOrderID calculated column like this:

DAX previous and next.PNG
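The screenshot of the calculated column is not included in this extract. Based on the explanation that follows, it can be sketched roughly like this (table and column names are assumptions from the AdventureWorks schema):

```dax
PreviousOrderID =
CALCULATE (
    MAX ( SalesOrderHeader[SalesOrderID] ),
    FILTER (
        SalesOrderHeader,
        SalesOrderHeader[SalesPersonID] = EARLIER ( SalesOrderHeader[SalesPersonID] )
            && SalesOrderHeader[OrderDate] < EARLIER ( SalesOrderHeader[OrderDate] )
    )
)
```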

As we know, the CALCULATE function converts the current row context into a filter context for the calculation in its first argument. However, the filter contexts created inside CALCULATE (its second and following arguments) block external filter contexts when they refer to the same columns. In this case, the FILTER function blocks the row context on the SalesOrderHeader table; in other words, the calculation in the first argument of CALCULATE, MAX(SalesOrderHeader[SalesOrderID]), does not know which row of SalesOrderHeader it is on. Only the EARLIER function brings the previous row context back, which lets MAX(SalesOrderHeader[SalesOrderID]) know which row it is on.

Using the second row of the above screenshot as an example, the DAX calculation of the PreviousOrderID column can be explained as:

Find the maximum SalesOrderID among the rows where SalesPersonID equals the SalesPersonID of the current row (retrieved by the EARLIER function), which is 274, and where OrderDate is older than the OrderDate of the current row (also retrieved by EARLIER), i.e. all records with an OrderDate older than 9/1/2015. Thus, the result is 43846.
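The same logic can be sketched in plain Python to see what EARLIER is doing (sample data is made up; the `current` row plays the role EARLIER fills in DAX):

```python
# Each dict is a row of SalesOrderHeader; ISO date strings compare correctly.
rows = [
    {"SalesOrderID": 43846, "SalesPersonID": 274, "OrderDate": "2015-08-01"},
    {"SalesOrderID": 44518, "SalesPersonID": 274, "OrderDate": "2015-09-01"},
    {"SalesOrderID": 43869, "SalesPersonID": 275, "OrderDate": "2015-08-15"},
]

def previous_order_id(table, current):
    """MAX(SalesOrderID) over rows passing the FILTER; `current` stands in for EARLIER."""
    candidates = [
        r["SalesOrderID"]
        for r in table
        if r["SalesPersonID"] == current["SalesPersonID"]  # = EARLIER([SalesPersonID])
        and r["OrderDate"] < current["OrderDate"]          # < EARLIER([OrderDate])
    ]
    return max(candidates) if candidates else None

print(previous_order_id(rows, rows[1]))  # 43846
print(previous_order_id(rows, rows[0]))  # None (no earlier order for this person)
```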

 

Thanks.

Your friend Annie.

 

 

 

How to use Dynamic Management Views against your desktop PowerBI reports

Power BI contains a local instance of an Analysis Services tabular model. By running Dynamic Management View (DMV) queries against PowerBI desktop, we can get metadata about your PowerBI model.

Here are the steps to do so:

Step 1: Open your Power BI report

Step 2: Find your Power BI Analysis Services instance port.

There are two ways to do that:

Option 1: Open DAX Studio (a great free tool for developing DAX), then connect to the Power BI report you opened.

Get DAX 1.PNG

Then find the local Analysis Services instance address of this Power BI report at the bottom right of the DAX Studio window.

Get DAX 2.PNG

Option 2: Find the port in the Power BI temp directory:
C:\Users\username\AppData\Local\Microsoft\Power BI Desktop SSRS\AnalysisServicesWorkspaces\…\Data

PowerBI Port

 

Step 3: Open SQL Server Management Studio and connect to the local Analysis Services instance.

Get DAX 3

 

Step 4: Create a new query against the only database under the local instance.

Get DAX 4

Step 5: In the query window, you can then run Dynamic Management Views (DMVs) to monitor your PowerBI local instance.

The ones I use often include the following:

  • This DMV provides the DAX query behind the report:
Select * from $System.discover_sessions
  • This DMV provides all the fields in your model:
Select * from $system.MDSchema_hierarchies
  • This DMV provides all the measures in your model:
Select * from $System.MDSCHEMA_MEASURES

Find relationships:

Select [ID], [ModelID], [IsActive], [Type], [CrossfilteringBehavior], [FromTableID], [FromColumnID], [FromCardinality], [ToTableID], [ToColumnID], [ToCardinality], [ModifiedTime]
from $SYSTEM.TMSCHEMA_RELATIONSHIPS
Select [ID], [ModelID], [Name] from $SYSTEM.TMSCHEMA_TABLES
Select [ID], [TableID], [ExplicitName] from $SYSTEM.TMSCHEMA_COLUMNS

For More DMV, there is a good post: https://datasavvy.me/2016/10/04/documenting-your-tabular-or-power-bi-model/

Bonus

If you do not have SSMS, you can use Power Query in Excel or Power BI to query those DMVs; here is a sample M query.

let
    Source = AnalysisServices.Database(TabularInstanceName, TabularDBName, [Query="Select [ID], [TableID], [Name], [Description], [DataSourceID], [QueryDefinition], [Type], [Mode], ModifiedTime from $SYSTEM.TMSCHEMA_PARTITIONS"]),
    #"Renamed Columns" = Table.RenameColumns(Source,{{"ID", "ID"}, {"DataSourceID", "Data Source ID"}, {"QueryDefinition", "Query Definition"}, {"ModifiedTime", "Modified Time"}})
in
    #"Renamed Columns"

Thanks

Your friend, Annie

 

 

Compare Formula Engine vs. Storage Engine in SSAS Tabular

To improve SSAS tabular performance or to improve your DAX queries, it is important to know the difference between the formula engine and the storage engine. Here, I have created a table to help you distinguish the two in Tabular (I need to stress this is for tabular modeling, not multidimensional, because the back-end technology is quite different). Both engines play vital roles in processing DAX query requests.

FE and SE

Category | Formula Engine | Storage Engine
Query received | Interprets the DAX/MDX formula | Handles simple logical queries (xmSQL) from the formula engine
Target data | Iterates over datacaches produced by the storage engine (datacaches are in-memory tables) | Iterates over compressed data in VertiPaq column stores
Result | Produces the result set and sends it back to the requestor | Produces datacaches and sends them back to the formula engine
Threading | Single-threaded | Multi-threaded
Cache utilization | No | Yes
Area of focus for performance tuning | Check the physical plan for bottlenecks | Check the xmSQL query for bottlenecks

 

Different roles in a typical BI project

Today, I would like to talk about the different roles in a typical BI project. Like all technology-related initiatives, it is vital to assign clear responsibilities and distinguish different responsibilities in order to achieve a successful BI project result.

 

  • Project Manager – This role is needed to understand the overall scope of the BI project including the objectives, timeline, budget, and resources. Based on these parameters, they will need to work with other individuals within the project team to come up with timelines and resource assignments. During the development phase, he/she will monitor the project’s progress and make adjustments as needed.
  • BI Solution Architect – This person will be in charge of the overall design of the data warehouse. They will focus on creating the vision of the entire BI process, development and lifecycle. They will work directly with business users and the project manager to design actionable insights while concurrently working with the ETL developer, DBAs, and Data Modelers to break down the vision into detailed actionable tasks.
  • Data Modelers – The Data Modelers will be responsible for the design of the data warehouse schema and the data models. They need to understand the business requirements, the vision produced by the BI Solution Architect, and the physical capabilities of the environment. In addition, they will be responsible for creating the data warehouse schema, the ETL requirements, the data models, and the model processing schedules.
  • Database Administrators (DBA) – These individuals will be in charge of designing the physical Database Architecture and configuration, security settings, as well as data source identification and data source management.
  • Infrastructure Specialists – These individuals are responsible for ensuring that the servers and networks are set up correctly to meet the current and future requirements. They will also be responsible for the server upgrade and migration.
  • ETL Developers – The ETL Developers are responsible for building the workflows from external source to the Data Warehouse. They will be working closely with the DBAs and Data modelers to understand the source and target form of the data as well as the timing of ETL workflows.
  • Report developers – These individuals, of all the other roles, work closest with the business users. Their role is to work with business users to design the reports and schedule automated report generation while providing the data modelers with future model requirements.
  • Business users – These individuals provide requirements to the BI development team. They provide initial abstract pertaining to what they want at the beginning of the project. As prototype products come out, these abstracts become more detailed and final.
  • Testers – Testers are responsible for Unit Testing, Integration Testing, Functional Testing, Systems Testing, Performance Testing, Usability Testing, and Acceptance Testing.

(https://www.codeproject.com/Tips/351122/What-is-software-testing-What-are-the-different-ty)

 

Often, all the project team members work closely with each other. Usually, for a large project, one or more team members will wear multiple hats pertaining to the roles above. In smaller BI teams, one person may play several roles. Although this may create some extra burden in terms of workload, it reduces the onerous need to establish communication between different people. I usually see the BI Solution Architect, Data Modeler, and ETL Developer be one person at the beginning of the development phase, and as the project grows, that person will begin handing off those roles to other team members.

 

Thanks,

Your friend – Annie

DAX – Filter Context vs. Row Context

A key to understanding DAX is to distinguish Filter Context and Row Context.

Here are definitions of each context.

Filter Context: can be simply defined as the set of filters applied to the evaluation of a DAX expression, usually a measure, regardless of how they were generated. Usually, every cell of a report has a different filter context, which can be defined implicitly by the user interface (such as a pivot table in Excel) or explicitly by a DAX expression using CALCULATE or CALCULATETABLE. A filter context always exists before a DAX expression is evaluated.

 

Row Context: the concept of a row context always exists in a DAX expression. To get the value of a DAX expression, you need a way to tell DAX which row to use, i.e. to check the current row of a table and provide the value back. You have a row context whenever you iterate a table, either explicitly (using an iterator) or implicitly (in a calculated column).

 

Use one DAX measure as an Example:

SumofProductSalesAmount :=
SUMX (
    VALUES ( 'Product'[ProductName] ),
    CALCULATE ( [Sales Amount] )
)

 

Explaining this calculation in detail:

  1. The VALUES function returns a table: all the distinct ProductName values.
  2. SUMX iterates over each row of the table in its first argument, then sums the value each row returns. In this case, for each row of the table provided by the VALUES function, i.e. for each product, it evaluates the expression in the second argument of SUMX. It then sums all the values together and returns the total to the caller.
  3. Everything that happens outside the CALCULATE function is treated as filter context for the CALCULATE expression. In this case, two places feed the CALCULATE section with filter context.
    • First, the current row defined by SUMX; in our case, the current row of VALUES('Product'[ProductName]), which is a specific product depending on where the iterator is. This means the [Sales Amount] measure is evaluated with a filter limiting it to a certain product, and that value (the sales amount for that product) is returned.
    • Second, whenever this measure [SumofProductSalesAmount] is exposed in a cell of a user application (like Excel, PowerBI, or SSRS), it is evaluated in the filter context set on that specific cell. Depending on where your cell is, you have a different filter context; for example, you may have a filter on a cell limiting the time frame to 2017.
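The steps above can be sketched as a rough Python analogy (the sample data is made up; each distinct product plays the role of a row context that CALCULATE turns into a filter):

```python
# (ProductName, SalesAmount) pairs standing in for the fact table.
sales = [("Bike", 100), ("Bike", 150), ("Helmet", 30)]

def sales_amount(visible_rows):
    # the [Sales Amount] measure: sum whatever the filter context leaves visible
    return sum(amount for _, amount in visible_rows)

def sum_of_product_sales_amount(table):
    total = 0
    # SUMX over VALUES('Product'[ProductName]): iterate the distinct products
    for product in sorted({name for name, _ in table}):
        # CALCULATE turns the current product (row context) into a filter context
        visible = [row for row in table if row[0] == product]
        total += sales_amount(visible)
    return total

print(sum_of_product_sales_amount(sales))  # 280
```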

 

Thanks,

Your friend – Annie.

Situations to use the Multidimensional SSAS model over the Tabular model

4154_tabular_vs_multidimensional

I was introduced to SSAS modeling about five years ago. At that time, Microsoft recommended that our team use their new modeling technology, Tabular SSAS modeling. Since then, our team has built multiple tabular models, and those models keep growing larger, with more complicated calculations. We love tabular modeling because it uses the xVelocity engines (VertiPaq, an in-memory analytic engine with a memory-optimized ColumnStore index), which makes it super fast thanks to in-memory storage. It is also much easier to implement: it uses a relational modeling structure we are familiar with, the DAX script language feels like Excel formulas at the start, and we don't need to consider sets, MDX calculations, aggregations, storage modes, and so on, which make multidimensional modeling very complex and difficult to learn.

However, as more complicated requirements come in, we are facing some challenges with the current tabular modeling solution.

Recently, I started preparing for the Microsoft BI certification and began looking into multidimensional modeling. I found that we may be able to leverage some capabilities that have existed in multidimensional modeling for many years. Below are some capabilities that do not yet exist in the tabular model.

  1. Remote partitioning (partitions can be located on another server)
  2. Multiple storage mode selection (MOLAP to ROLAP)
  3. Writeback option – customers can write back to the model, but only for aggregate functions other than Sum (enables "what-if" analysis)
  4. Aggregations – pre-calculated (you can change how much to pre-aggregate); you need to consider the storage and maintenance required
  5. Many-to-many relationships
  6. Role-playing dimensions
  7. Customized drill-through and other actions (for older versions of Tabular we can use BIDS Helper to do this, but for Tabular in SQL Server 2016 it is disabled)
  8. Merge partitions – SSMS will remove the merged partition automatically, but the Visual Studio workspace project needs to be updated manually
  9. Calculation templates
  10. Built-in business intelligence
  11. Process Index (build or rebuild indexes)
  12. Partition slice – lets the engine pull only the relevant slice information and quickly find the right partition to use for a calculation

Thanks,

Your Friend, Annie

 
