One of the critical tasks of today’s organizations is data quality and handling. The key to data quality is making sure that data is unique and filtered so that it will be easy to integrate with different systems with in the organization. To avoid duplicate within the data is one of he major tasks every organization is trying to hardwire within their critical systems. In this series of blog posts, I going to discuss few scenarios of handling duplicate data in Microsoft Dynamics 365.
We all know the traditional OOTB De-Duplication feature of Microsoft Dynamics 365 CRM, which is majorly used to detect duplicates. In few scenarios we even use “Unique Keys” to restrict duplicates. But these features either use “Exact” or ”Partial” match precisions. But this feature cannot handle typo’s or misspellings. Recently Microsoft has released the Fuzzy Matching feature as part of Power BI. For more details read FuzzyMatching PowerBI. Lets look at the steps to create a PowerBI report which shows up probable duplicates with in Dynamics 365 CRM system using Fuzzy Matching.
Open Power BI Desktop, create a new BI report. We are going to use the new CDS connector to connect to the Dynamics 365 CRM system. Select the Data Source as CDS connector.
Set the URL to the Dynamics 365 CRM instance.
Select the entity which requires to be filtered. Lets assume that we have duplicate data as part of Contact entity. As you can see the FirstName is misspelled.
Select the Contact Entity as shown below.
For our scenario, as the duplicate data is part of the contact names, lets select the fullname field along with contactid for filtering.
Once selected, the data will be shown in the view. The next step is to apply fuzzy matching on the data to filter out the duplicate data. Click the “Merge Queries” or “Merge Queries as New option”.
From the Merge Wizard select the parent and the reference entity as Contact. We are going to perform an left outer join on the results to find the duplicates. Select the fullname column to create the merged table.
As we are going to use the Fuzzy match, check the box. Set the threshold to a value between 0 to 1.0. This is going be the driving factor to identify probable matches. I set at .90, so the understanding is to consider 90% probable matches. Rest are optional and if required can be checked.
Once the filter is applied the output clearly identifies the “Vincent Lauriant” as a probable duplicate contact. As part of this example I am going to filter out original records and show only the duplicate data. We can perform various operations at this stage to get the desired output. The simplest option I went for is to create index column to filter this data.
I have added a conditional column with a simple validation to filter out the non duplicates. I have used the “contactid” for that.
Apply the filter to only show the duplicate data and our result set is ready. Publish the report to the Power BI online and set a schedule to refresh the dataset on a regular basis.
This report can be published as a Dashboard report with in your Dynamics 365 CRM instance and users can constantly can validate the data.