Web Scraping for fun and profit in AL

In this video, I show how to use the ancient art of web scraping to extract useful information from a normal webpage. Check it out:

https://youtu.be/_Rxmhk2IQmo

In this video, Erik demonstrates how to use web scraping techniques directly from AL code in Business Central. The use case: programmatically checking the latest version of an app on AppSource without needing any authentication. Along the way, he walks through making HTTP requests, parsing raw HTML, and using regex to extract structured data — all within AL.

Why Web Scraping?

These days, most integrations in Business Central revolve around REST APIs, SOAP services, and OData endpoints. We use HttpClient all the time to call well-structured web services. But it’s easy to forget that under the hood, it’s the same HTTP protocol that serves regular web pages.

Sometimes the data you need isn’t sitting behind a nice API endpoint. Maybe you need to extract information from a public web page — like checking the current version of an app on AppSource. That’s where an old-school technique comes in: web scraping. You call out to a URL, download the raw HTML, and parse through it to find what you need.

The Use Case: Getting an App Version from AppSource

Erik’s goal is simple: given an AppSource listing (like the SharePoint Connector or the Cloud Replicator), can we programmatically extract the app’s version number? If so, you could build a service that checks whether users are running the latest version and prompts them to update.

The version number is visible on the AppSource page, but there’s no public API to retrieve it. So we scrape it.
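Once a version string has been scraped, the "is the user up to date?" check could look roughly like the sketch below. It assumes the scraped value has the form Major.Minor.Build.Revision. The codeunit number, the codeunit name, and the procedure IsUpdateAvailable are all hypothetical. The sketch leans on AL's Version data type and NavApp.GetCurrentModuleInfo to compare against the app that is currently running.

```al
codeunit 50101 "Version Check Sketch"
{
    // Hypothetical helper: ScrapedVersionText is assumed to be the
    // 'Major.Minor.Build.Revision' text taken from the AppSource page.
    procedure IsUpdateAvailable(ScrapedVersionText: Text): Boolean
    var
        Info: ModuleInfo;
        LatestVersion: Version;
    begin
        // Turn the scraped text into a comparable Version value.
        LatestVersion := Version.Create(ScrapedVersionText);

        // Read the version of the app this code is running in.
        NavApp.GetCurrentModuleInfo(Info);

        // True when AppSource lists something newer than what is installed.
        exit(LatestVersion > Info.AppVersion);
    end;
}
```

Comparing Version values rather than raw text avoids the classic trap where '2.0.0.9' sorts after '2.0.0.122' as a string.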

Step 1: Fetching the Page

The first step is straightforward — use AL’s HttpClient to make a GET request to the AppSource URL and read the response body as text:

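// The procedure name is illustrative; the body is what matters.
local procedure DownloadPage()
var
    httpClient: HttpClient;
    Response: HttpResponseMessage;
    RawHtml: Text;
begin
    // Get returns false when the request itself fails
    if httpClient.Get('https://appsource.microsoft.com/en-us/product/...', Response) then begin
        // Only read the body when the server answered with a success status
        if Response.IsSuccessStatusCode() then begin
            Response.Content.ReadAs(RawHtml);
            // Now RawHtml contains the entire page source
        end;
    end;
end;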

With just a few lines, we’ve downloaded the entire HTML source of the AppSource page into a text variable. No authentication required — this is a publicly accessible page.

Step 2: Building the Regex Pattern

Once you have the raw HTML, you need to find the version number within it. Erik uses regex101.com as an interactive tool to build and test regex patterns against the actual HTML content. This is his recommended workflow: paste in the source, experiment with patterns interactively, and only move the final expression into code once it’s working.

Looking at the raw HTML, the version number appears inside a JSON-like structure embedded in the page:

"AppVersion":"2.0.0.122"

The regex pattern needs to:

  1. Find the "AppVersion":" prefix
  2. Capture the version number (four groups of digits separated by dots)
  3. Match the closing double quote

The final pattern looks like this:

\"AppVersion\":\"(\d{1,5}\.\d{1,5}\.\d{1,5}\.\d{1,5})\"

Breaking this down piece by piece:

  • \"AppVersion\" matches the literal text "AppVersion" (the backslashes escape the double quotes; the escapes are optional here, but harmless)
  • : matches the colon separator
  • \" matches the opening quote of the value
  • ( starts a capture group
  • \d{1,5} matches 1 to 5 digits (one segment of the version number); it appears four times, once per segment
  • \. matches a literal dot (the backslash is needed because an unescaped . matches any character); it appears three times, between the segments
  • ) closes the capture group
  • \" matches the closing quote

Erik makes an important point about regex readability: yes, it looks intimidating at first glance, but when you walk through it piece by piece, each part is logical. The parentheses create a capture group around just the version number, so you can extract it separately from the full match.
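Before pointing the pattern at a live page, it can be checked against the sample fragment from inside AL as well. The following is a minimal sketch using the Regex codeunit's IsMatch procedure; the codeunit number, name, and procedure name are hypothetical, and the sample text is the fragment quoted earlier.

```al
codeunit 50102 "Pattern Check Sketch"
{
    // Hypothetical helper: reports whether the pattern matches the
    // sample fragment quoted earlier in the article.
    procedure PatternMatchesSample(): Boolean
    var
        Regex: Codeunit Regex;
        Sample: Text;
    begin
        // AL text literals use single quotes, so the double quotes
        // in the sample need no escaping at the AL level.
        Sample := '"AppVersion":"2.0.0.122"';

        exit(Regex.IsMatch(Sample, '\"AppVersion\":\"(\d{1,5}\.\d{1,5}\.\d{1,5}\.\d{1,5})\"'));
    end;
}
```

If this returns false for a fragment that regex101.com accepts, the likely culprit is a difference between the regex flavor selected there and the .NET engine behind the Regex codeunit.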

Playing with Groups

You can also get creative with capture groups. If you wanted each version segment separately, you could add parentheses around each digit group:

\"AppVersion\":\"((\d{1,5})\.(\d{1,5})\.(\d{1,5})\.(\d{1,5}))\"

This gives you five capture groups: group 1 is the full version string, and groups 2–5 are the individual major, minor, build, and revision numbers. Group 0, as in most regex engines, is the entire match, including the "AppVersion" prefix.
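In AL, the per-segment groups could be read along these lines. This is a sketch only: the codeunit number, name, and procedure ShowSegments are hypothetical, RawHtml is assumed to hold the downloaded page source, and the group numbering assumes the Groups record follows the regex engine's numbering (group 1 is the outer group, groups 2 to 5 the segments).

```al
codeunit 50103 "Version Segments Sketch"
{
    // Hypothetical helper: RawHtml is assumed to be the page source.
    procedure ShowSegments(RawHtml: Text)
    var
        Regex: Codeunit Regex;
        Matches: Record Matches temporary;
        Groups: Record Groups temporary;
        Segments: array[4] of Integer;
        i: Integer;
    begin
        Regex.Match(RawHtml, '\"AppVersion\":\"((\d{1,5})\.(\d{1,5})\.(\d{1,5})\.(\d{1,5}))\"', Matches);
        if not Matches.FindFirst() then
            exit; // nothing matched, nothing to show

        Regex.Groups(Matches, Groups);
        for i := 1 to 4 do begin
            // Groups 2..5 hold major, minor, build and revision.
            Groups.Get(i + 1);
            // Index is zero-based, CopyStr is one-based, hence the + 1.
            Evaluate(Segments[i], CopyStr(RawHtml, Groups.Index + 1, Groups.Length));
        end;

        Message('Major %1, minor %2, build %3, revision %4',
            Segments[1], Segments[2], Segments[3], Segments[4]);
    end;
}
```

Having the segments as integers makes numeric comparison straightforward, although for a plain "is it newer?" check the full version string from group 1 is usually enough.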

Step 3: Using Regex in AL

Business Central provides built-in regex support through the Regex codeunit along with temporary Matches and Groups table records. Here’s the complete solution:

pageextension 50100 CustomerListExt extends "Customer List"
{
    trigger OnOpenPage()
    var
        httpClient: HttpClient;
        Response: HttpResponseMessage;
        RawHtml: Text;
        Regex: Codeunit Regex;
        Matches: Record Matches temporary;
        Groups: Record Groups temporary;
    begin
        // Download the public AppSource listing page
        if httpClient.Get('https://appsource.microsoft.com/en-us/product/dynamics-365-business-central/PUBID.efoqus-5058796%7CAID.replicator%7CPAPPID.18a4d438-88d1-44ee-a38d-3c0ea0d77338?tab=DetailsAndSupport', Response) then begin
            if Response.IsSuccessStatusCode() then begin
                Response.Content.ReadAs(RawHtml);
                // Find every "AppVersion":"x.x.x.x" occurrence in the page source
                Regex.Match(RawHtml, '\"AppVersion\":\"(\d{1,5}\.\d{1,5}\.\d{1,5}\.\d{1,5})\"', Matches);
                if Matches.FindSet() then
                    repeat
                        // Load the capture groups of the current match
                        Regex.Groups(Matches, Groups);
                        // Group 1 is the version number without the surrounding quotes
                        Groups.Get(1);
                        // Index is zero-based, CopyStr is one-based
                        Message('Version is %1', CopyStr(RawHtml, Groups.Index + 1, Groups.Length));
                    until Matches.Next() = 0;
            end;
        end;
    end;
}

Key things to note about the AL regex implementation:

  • Regex.Match() takes the input text, the pattern, and populates the Matches temporary record
  • You iterate through matches using standard AL record navigation (FindSet / Next)
  • Regex.Groups() extracts the capture groups from a given match
  • Each group exposes an Index and a Length. Index is a zero-based position in the input text, while AL’s CopyStr is one-based, hence the + 1
  • You use CopyStr(RawHtml, Groups.Index + 1, Groups.Length) to extract the actual text from the source

Caveats and Risks

Web scraping is inherently fragile. Erik is upfront about the risks:

  • It will break if Microsoft changes the AppSource page structure, renames the JSON field, or stops embedding the data as JSON
  • No error handling is shown in this demo — in production code, you’d want robust handling for failed requests, missing matches, and unexpected page formats
  • This approach works best for scenarios where there’s no proper API available and you need unauthenticated access to public data

That said, the advantage is clear: you can grab data from any public web page without needing API keys, OAuth tokens, or any authentication setup.
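For comparison, here is one way the demo could be hardened along the lines of the caveats above. It is a sketch, not the code from the video: the codeunit number, name, and procedure GetAppVersion are hypothetical, and it assumes that Regex.Match simply leaves the Matches record empty when the pattern is not found.

```al
codeunit 50104 "AppSource Scrape Sketch"
{
    // Hypothetical helper: returns true and fills VersionText only when
    // the page was downloaded and the pattern was found in it.
    procedure GetAppVersion(Url: Text; var VersionText: Text): Boolean
    var
        Client: HttpClient;
        Response: HttpResponseMessage;
        Regex: Codeunit Regex;
        Matches: Record Matches temporary;
        Groups: Record Groups temporary;
        RawHtml: Text;
    begin
        Clear(VersionText);

        if not Client.Get(Url, Response) then
            exit(false); // the request itself failed

        if not Response.IsSuccessStatusCode() then
            exit(false); // the server answered with an error status

        if not Response.Content.ReadAs(RawHtml) then
            exit(false); // the body could not be read as text

        Regex.Match(RawHtml, '\"AppVersion\":\"(\d{1,5}\.\d{1,5}\.\d{1,5}\.\d{1,5})\"', Matches);
        if not Matches.FindFirst() then
            exit(false); // page structure or field name has probably changed

        Regex.Groups(Matches, Groups);
        if not Groups.Get(1) then
            exit(false); // capture group missing

        // Index is zero-based, CopyStr is one-based.
        VersionText := CopyStr(RawHtml, Groups.Index + 1, Groups.Length);
        exit(true);
    end;
}
```

Returning a Boolean instead of raising an error lets the caller decide what a failed scrape means; for an optional "update available" hint, silently skipping the check is probably the friendliest outcome.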

Summary

In roughly 10 lines of meaningful code (with no error checking), Erik demonstrates a complete web scraping solution in AL that:

  1. Makes an HTTP GET request to a public web page using HttpClient
  2. Reads the raw HTML response into a text variable
  3. Uses the built-in Regex codeunit to find a version number embedded in JSON within the page
  4. Extracts the value using capture groups and CopyStr

The workflow tip to take away: always build and test your regex patterns in an interactive tool like regex101.com before putting them into AL code. It makes the process far more manageable and helps demystify what can otherwise look like an unreadable string of symbols.