What to do with those pesky Unicode characters in Business Central

In this video, I take a look at how strange Unicode characters behave in Microsoft Dynamics 365 Business Central. Unicode characters in Code fields can often be a challenge, and using only ASCII characters makes filtering, OData, and many other things much easier.

As a bonus, I end up breaking Business Central simply by creating customers with alternative characters in the Customer No. field. See for yourself in the video.

https://youtu.be/cr9UCF-jEz0

In this video, Erik explores the surprising and sometimes breaking effects of Unicode characters in Business Central code fields. What starts as a demonstration of emoji customer numbers and lookalike characters turns into an accidental discovery that certain Unicode characters can actually break Business Central’s UI. Erik then walks through how to build a validation guard to restrict fields to safe ASCII characters only.

The Problem: Unicode Characters in Code Fields

The inspiration for this video came from a real support incident where a customer ended up with a fancy (curly) quotation mark inside an item number. Everything broke because nothing in the system expected a Unicode character like that in a code field.

To demonstrate the scope of the problem, Erik created several customer records with unusual “No.” values:

  • A Santa Claus emoji 🎅 as a customer number
  • A dino-cat emoji as a customer number
  • A tilted double-quote character that looks almost identical to a normal one
  • A backwards “Erik” using special Unicode characters
  • Three separate customers all appearing to be named “KAREN” — but each using a different Unicode version of the letter K

The “KAREN” example is particularly devious. All three records appear to have the same primary key, which should be impossible in Business Central. The trick is that each “K” is actually a different Unicode character: one is the standard Latin K (code point 75), another is the Kelvin sign (K, code point 8490), and the third, which Erik describes as the potassium symbol from the periodic table, shows up in the inspection as 922, the Greek capital letter Kappa. They look identical but have completely different code points.

An Accidental Discovery: Breaking Business Central

Things took an unexpected turn during the demo. When Erik tried to interact with these Unicode-laden records — deleting one, selecting another — Business Central started duplicating Karen records uncontrollably. Selecting a record would create another copy, until there were five “KAREN” customers. The UI effectively broke under the weight of these lookalike Unicode primary keys.

This was not planned, making it an even more compelling argument for guarding against rogue Unicode characters in code fields.

Inspecting Unicode Values

To understand what’s going on under the hood, Erik built a page extension with an “Inspect No.” action that iterates through each 16-bit word in the customer number and displays its numeric Unicode value:

pageextension 56700 "Customer List" extends "Customer List"
{
    actions
    {
        addfirst(processing)
        {
            action(inspect)
            {
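                Caption = 'Inspect No.';
                ApplicationArea = All;
                trigger OnAction()
                var
                    i: Integer;
                    charNo: Integer;
                    str: Text;
                begin
                    // Walk the "No." value one 16-bit word at a time.
                    for i := 1 to StrLen(Rec."No.") do begin
                        // Assigning a Char to an Integer yields its numeric value.
                        charNo := Rec."No."[i];
                        str += Format(charNo) + ' - ';
                    end;
                    // Show the collected values in a message box.
                    Message(str);
                end;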
            }
        }
    }
}

Key observations from the inspection:

  • The Santa Claus emoji is encoded in two 16-bit words: 55356 and 57221
  • The dino-cat emoji takes up five 16-bit words — it’s a much more complex encoding
  • Standard ASCII characters like A (65), R (82), E (69), N (78) all fall within the ASCII range of 0 to 127
  • The lookalike K characters have values like 8490 (the Kelvin sign) and 922 (the Greek capital letter Kappa), well outside the ASCII range

A Quick Note on Unicode Encoding

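Unicode is often thought of as 16-bit, which would give about 65,000 possible characters, but that only covers the Basic Multilingual Plane. The full Unicode range runs up to code point 1,114,111 (U+10FFFF). In UTF-16, which is the encoding you see when indexing AL strings, characters outside the basic plane are encoded as a surrogate pair of two 16-bit words. This is why emojis like Santa Claus take up two words. More complex combined emojis (like the dino-cat) are sequences of several code points joined together, so they take up even more. When you index into a string in AL using array notation (Rec."No."[i]), you’re accessing individual 16-bit words, and assigning one of those to an Integer gives you the numeric value of that word. For characters in the basic plane that value is the Unicode code point; for an emoji it is only one half of a surrogate pair.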
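
To make the surrogate pair arithmetic concrete, here is a minimal sketch of how two 16-bit words can be combined back into a single code point. The codeunit name, object ID, and procedure name are hypothetical and not part of Erik's demo; only the arithmetic comes from the UTF-16 encoding rules.

```al
codeunit 56701 "Unicode Inspect Helper"
{
    // Hypothetical helper (name and ID are illustrative only).
    // Combines a UTF-16 surrogate pair into one Unicode code point.
    procedure CombineSurrogates(High: Integer; Low: Integer): Integer
    begin
        // High surrogates are 55296..56319 (0xD800..0xDBFF),
        // low surrogates are 56320..57343 (0xDC00..0xDFFF).
        if (High < 55296) or (High > 56319) or (Low < 56320) or (Low > 57343) then
            Error('Not a surrogate pair: %1, %2', High, Low);

        // Each surrogate carries 10 bits; the pair is offset by 65536 (0x10000).
        exit(65536 + (High - 55296) * 1024 + (Low - 56320));
    end;
}
```

For the Santa Claus values reported by the inspection, 55356 and 57221, this works out to 65536 + 60 * 1024 + 901 = 127877, which is U+1F385.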

The Solution: Validating Code Fields

To prevent these problematic characters from getting into code fields in the first place, Erik created a table extension that validates the “No.” field on the Customer table. The approach uses AL’s in operator with character ranges to ensure only standard alphanumeric characters are accepted:

tableextension 56700 "Customer" extends Customer
{
    fields
    {
        modify("No.")
        {
            trigger OnBeforeValidate()
            var
                i: Integer;
            begin
                // Check every 16-bit word of the new "No." value.
                for i := 1 to StrLen(Rec."No.") do
                    // Reject anything outside plain ASCII letters and digits.
                    if not (Rec."No."[i] in ['A' .. 'Z', 'a' .. 'z', '0' .. '9']) then
                        Error('Only A-Z 0-9 allowed');
            end;
        }
    }
}

The key technique here is the in operator with range syntax. The expression Rec."No."[i] in ['A' .. 'Z', 'a' .. 'z', '0' .. '9'] checks whether each character falls within the standard ASCII ranges for uppercase letters, lowercase letters, or digits. Any character outside these ranges — whether it’s an emoji, a curly quote, or a lookalike letter from another Unicode block — triggers an error.

With this validation in place, attempting to create a customer with a heart emoji or any other non-ASCII character results in the error message “Only A-Z 0-9 allowed,” while standard alphanumeric entries like “1234” pass through without issue.
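
Since the original support incident involved an item number rather than a customer number, the same check could be moved into a shared procedure and reused across tables. The following is only a sketch under that assumption: the codeunit "Code Field Guard", its procedure name, and the object IDs are hypothetical and do not appear in the video.

```al
codeunit 56702 "Code Field Guard"
{
    // Hypothetical shared helper; name and object ID are illustrative.
    procedure CheckAlphanumeric(Value: Text)
    var
        i: Integer;
    begin
        // Same test as in the Customer extension, applied to any value.
        for i := 1 to StrLen(Value) do
            if not (Value[i] in ['A' .. 'Z', 'a' .. 'z', '0' .. '9']) then
                Error('Only A-Z 0-9 allowed');
    end;
}

tableextension 56701 "Item" extends Item
{
    fields
    {
        modify("No.")
        {
            trigger OnBeforeValidate()
            var
                CodeFieldGuard: Codeunit "Code Field Guard";
            begin
                // Reuse the shared check for item numbers.
                CodeFieldGuard.CheckAlphanumeric(Rec."No.");
            end;
        }
    }
}
```

If your number series use separators such as a hyphen, the allowed set in the helper would need to be extended accordingly; the strict A-Z and 0-9 set simply mirrors the example above.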

Why This Matters

You might wonder why Unicode characters in code fields are a problem if Business Central technically allows them. Erik highlights several real-world concerns:

  • CRM Integration: What happens when you need to sync this customer to Dynamics 365 CRM or another system?
  • Web Services: API consumers may not handle exotic Unicode characters gracefully
  • Filtering: How do you filter on a customer number that contains a star (*) or other characters used in filter syntax?
  • Data Integrity: As the “KAREN” example showed, lookalike characters can create records that appear to be duplicates but technically aren’t — confusing users and potentially breaking the UI
  • Support Burden: As the original support incident demonstrated, a single curly quote in an item number can cascade into widespread breakage

Erik recommends staying away from Unicode characters in code fields entirely, and even avoiding characters that are commonly used in filter syntax (such as *, ? and ..) to keep your data model clean and your integrations working smoothly.
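
To illustrate why filter characters in a key are awkward, the sketch below contrasts SetFilter, which parses its argument as a filter expression, with SetRange, which treats the value literally. The codeunit name and ID are hypothetical, and the example assumes a customer whose number is literally A* could exist in the database.

```al
codeunit 56703 "Filter Syntax Demo"
{
    // Hypothetical demo; name and object ID are illustrative only.
    procedure CompareFilters()
    var
        Customer: Record Customer;
        PatternCount: Integer;
        LiteralCount: Integer;
    begin
        // SetFilter parses the string as a filter expression,
        // so the star acts as a wildcard: every "No." starting with A.
        Customer.SetFilter("No.", 'A*');
        PatternCount := Customer.Count();

        // SetRange takes the value literally: only a "No." equal to 'A*'.
        Customer.Reset();
        Customer.SetRange("No.", 'A*');
        LiteralCount := Customer.Count();

        Message('Wildcard matches: %1, literal matches: %2', PatternCount, LiteralCount);
    end;
}
```

A user typing A* into a filter field on a page gets the wildcard behavior, which is one reason such characters are best kept out of code fields in the first place.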

Conclusion

Unicode characters in Business Central code fields are more than just a curiosity — they can cause real problems ranging from confusing duplicates to outright UI breakage. By adding a simple OnBeforeValidate trigger that checks each character against an allowed set using AL’s in operator, you can prevent these issues before they happen. This is especially important in today’s connected world where data flows between Business Central, CRM systems, web services, and other integrations that may have their own character handling quirks.