Can I add ChatGPT to my ToolBox App to help write code?

This weekend I have been playing with the idea of adding AI code generation and completion to my ToolBox App. It seems like a natural fit, but how does it actually work in practice? In this video, I’ll show you where I am so far:

https://youtu.be/F26bVAITbpg

In this video, Erik explores whether it’s feasible to integrate ChatGPT (via Azure OpenAI) into his BCALToolbox app — an AL code interpreter that runs directly inside Business Central — to provide AI-assisted code generation. He walks through the Azure OpenAI setup, the AL integration code, and a series of live experiments showing both the promise and the significant challenges of getting a large language model to produce valid AL code.

What is the BCALToolbox?

If you’re not familiar with Erik’s Toolbox, here’s the quick pitch: it’s an AppSource app for Business Central that lets you write and run AL code directly inside Business Central — no extension deployments, no sandbox round-trips. You go in, write some code, click Run, and it executes. Erik built an AL compiler/interpreter written in AL itself (yes, really), and there are several earlier videos on the channel covering that journey.

The Toolbox is designed for quick fixes and ad-hoc tasks: get in, do what you need to do, and get out — without the overhead of creating extensions, managing permissions, and deploying through the full pipeline.

Why Add AI to the Toolbox?

With large language models and code generation tools gaining momentum, Erik didn’t want to be left behind. The Toolbox, as a tool where developers write AL code interactively inside Business Central, seemed like a prime candidate for AI assistance. The idea: let the AI help generate or complete AL code snippets right where you’re working.

Setting Up Azure OpenAI

Erik deployed a code-focused model (the Davinci Codex 003 variant, a GPT-3 model targeted at code generation — similar to what GitHub Copilot uses) through Azure OpenAI. Azure provides access to OpenAI’s models, and Erik wired up a completions endpoint to his Business Central environment.

The AL Integration Code

The integration is surprisingly compact. Erik created a new codeunit with a function called OpenAI that takes an input string (the prompt) and returns the AI’s response. The entire thing is about 62 lines of AL code — essentially a web service call with JSON request/response handling.

Here’s what the request JSON includes:

  • Prompt — whatever text the user has typed in the Toolbox editor
  • Max tokens — the maximum length of the response
  • Temperature — controls creativity/randomness (lower = more deterministic)
  • Top P — nucleus sampling; restricts token selection to the smallest set of candidates whose cumulative probability reaches P
  • Frequency penalty — discourages the model from getting into loops and repeating the same code
  • N (number of suggestions) — set to 1, asking for a single completion
  • Stop sequences — special tokens to tell the model when to stop generating (Erik found the model would sometimes generate code and then start writing documentation in Markdown or HTML)

The response handling grabs the choices array from the returned JSON, takes the first element (index zero), extracts the text, and returns it. Simple and clean.
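The request/response handling above can be sketched like this. This is a Python illustration of the logic only — the real implementation is roughly 62 lines of AL using HttpClient and JSON objects, and the parameter values and the “###” stop token here are assumptions, not Erik’s exact settings:

```python
# Python sketch of the Toolbox's OpenAI call. Values are illustrative
# assumptions, not the actual settings from the video.
def build_request_body(prompt: str) -> dict:
    return {
        "prompt": prompt,          # whatever the user typed in the editor
        "max_tokens": 300,         # maximum length of the response
        "temperature": 0.2,        # lower = more deterministic
        "top_p": 0.95,             # nucleus-sampling cutoff
        "frequency_penalty": 0.5,  # discourage repeating the same code
        "n": 1,                    # ask for a single completion
        "stop": ["###"],           # stop token so the model quits before drifting into docs
    }

def extract_completion(response_json: dict) -> str:
    # Grab the choices array, take index zero, return its text.
    return response_json["choices"][0]["text"]
```

In the AL version this payload is posted to the Azure OpenAI completions endpoint with HttpClient, and the same choices/text extraction is done with JsonObject/JsonToken handling.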

This “Get AI Help” function is then wired into the Toolbox UI — you type something, click the button, and the AI tries to complete your code.

Live Experiments: The Good, The Bad, and The Pascal

Erik ran a series of increasingly specific prompts to see what the model would generate. The results were… educational.

Experiment 1: Empty Prompt

With no input at all, the model returned Java code — a UserController class from a com.example.demo.controller package. Apparently, that’s the most statistically likely code anyone could write, according to this model. Not AL, not useful.

Experiment 2: “Procedure”

Typing just “procedure” returned Delphi/Object Pascal code — form creation and destruction procedures, complete with Pascal’s end. (“end-dot”) terminator. Close in spirit to AL’s ancestry, but not what we need.

Experiment 3: “AL language”

Still Pascal, but now with Android references. Getting warmer? Not really.

Experiment 4: “AL language for Microsoft Dynamics 365 Business Central”

Still some weird Pascal variant. The model wasn’t picking up on the AL language context from the description alone.

Experiment 5: “Loop all customers”

This was the breakthrough moment — the model actually generated AL code! The domain-specific request (“loop all customers”) was apparently specific enough for the model to identify that this is a Business Central context and produce AL syntax.

Experiment 6: “Loop all devices”

Even with a non-standard BC entity, the model continued generating AL code. The customer-focused context had stuck.

The Quality Problem

While the code looked like AL, it had significant issues:

  • Used FindNext instead of the correct Next method
  • Generated infinite loops because Next was called incorrectly
  • Failed to use the proper repeat...until loop pattern for database records
  • Never declared variables — completely omitting the var section
  • Added unexpected filters and set ranges

Prompt Engineering Attempts

Erik tried several strategies to improve the output:

Providing Schema Information

Following documentation suggestions, Erik added table schema context to the prompt — defining that the Customer record has fields like Number, Name, Name 2, Address 2, and City. This actually helped improve the output somewhat.

Adding Language Rules

He tried adding instructions like “database loops should always use repeat…until” and “variable declaration should happen in a var section at the top of a procedure.” The variable declaration hint in particular seemed to improve results, possibly because without it the model was getting confused about the overall code structure.
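Taken together, both strategies amount to prepending context to whatever the user typed before sending it to the model. A minimal Python sketch — the schema line and rule wording are paraphrased from the video, not Erik’s exact prompt text:

```python
# Hypothetical context strings -- paraphrased, not verbatim from the video.
SCHEMA_CONTEXT = (
    "The Customer table has fields: No., Name, Name 2, Address 2, City.\n"
)
LANGUAGE_RULES = (
    "Database loops should always use repeat..until.\n"
    "Variable declarations go in a var section at the top of the procedure.\n"
)

def build_prompt(user_input: str) -> str:
    # Prepend schema and rules so the model sees them before the request.
    return SCHEMA_CONTEXT + LANGUAGE_RULES + user_input
```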

Mixed Results

Even with these improvements, the results were inconsistent. The model would sometimes add correct-looking filters, sometimes add nonsensical ones, and the core loop logic remained unreliable. Running the same prompt multiple times produced slightly different (but similarly flawed) results.

Ideas for Improvement

Erik outlined several ideas for making this more useful going forward:

Compile-and-Iterate Loop

Since the Toolbox includes an AL compiler, Erik could take the AI-generated code, compile it, and if it fails, send the code along with the error message back to the model and ask it to fix the error. This could iterate until the code compiles successfully or a reasonable limit is reached. The catch: compilable code isn’t necessarily correct code (as demonstrated by the infinite loop example).
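The loop Erik describes could look something like this sketch, where compile_al and ask_model are hypothetical stand-ins for the Toolbox compiler and the OpenAI call:

```python
# Sketch of the compile-and-iterate idea. compile_al and ask_model are
# hypothetical stand-ins, injected here so the loop itself is testable.
def refine(prompt, ask_model, compile_al, max_rounds=3):
    code = ask_model(prompt)
    for _ in range(max_rounds):
        ok, error = compile_al(code)
        if ok:
            return code  # compiles -- though not necessarily *correct*
        # Feed the failing code and the compiler error back to the model.
        code = ask_model(f"{code}\n\nFix this compiler error:\n{error}")
    return code  # give up after max_rounds and return the best effort
```

The early return on success is what keeps the loop cheap in the common case; the max_rounds cap is the “reasonable limit” mentioned above.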

Pre-populating with Symbols

Another approach would be to auto-generate schema information by pre-compiling and extracting all table definitions, function signatures, and object metadata from the environment, then injecting that context into the prompt. This would give the model much better information about the available AL objects and their structures.
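In sketch form, the extraction step would boil the symbol metadata down to schema lines ready to prepend to the prompt. The SYMBOLS dict here is a hypothetical stand-in for what the Toolbox compiler could pull from the environment’s symbol files:

```python
# Hypothetical table metadata -- in practice this would be extracted
# from the environment's symbols, not hard-coded.
SYMBOLS = {
    "Customer": ["No.", "Name", "Address", "City"],
    "Item": ["No.", "Description", "Unit Price"],
}

def symbol_context(symbols: dict) -> str:
    # One schema line per table, ready to prepend to the prompt.
    return "\n".join(
        f"Table {table} has fields: {', '.join(fields)}."
        for table, fields in symbols.items()
    )
```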

Training a Dedicated Model

The nuclear option: take the entire base app, system app, and all custom apps, and fine-tune a dedicated AL model. Erik acknowledged this would take significantly more work and resources, and said he’d rather try the lighter-touch approaches first.

About the Repository

The BCALToolbox is an open-source project hosted on GitHub, licensed under MIT. The build pipeline compiles against multiple Business Central versions and localizations:

strategy:
  matrix:
    'Compile Against W1 - Current Version':
      artifactCountry: 'w1'
      artifactVersion: 'Current'
    'Compile Against W1 - Next Minor':
      artifactCountry: 'w1'
      artifactVersion: 'NextMinor'
    'Compile Against W1 - Next Major':
      artifactCountry: 'w1'
      artifactVersion: 'NextMajor'
    'Compile Against United States - Current Version':
      artifactCountry: 'us'
      artifactVersion: 'Current'
    'Compile Against United States - Next Minor':
      artifactCountry: 'us'
      artifactVersion: 'NextMinor'
    'Compile Against United States - Next Major':
      artifactCountry: 'us'
      artifactVersion: 'NextMajor'

The workspace is configured with multiple code analyzers enabled, including CodeCop, UICop, and PerTenantExtensionCop, with compilation set to fail on warnings:

"settings": {
    "al.enableCodeAnalysis": true,
    "al.backgroundCodeAnalysis": true,
    "al.codeAnalyzers": [
        "${CodeCop}", "${UICop}", "${PerTenantExtensionCop}"
    ]
}

Conclusion

This was an honest, weekend-hacking exploration of integrating AI code generation into a Business Central development tool. The integration itself is straightforward — just 62 lines of AL to call Azure OpenAI’s completions endpoint. The hard part is getting the model to produce valid, useful AL code.

The key takeaways:

  • Generic prompts produce generic (non-AL) code — you need domain-specific prompts to get AL output
  • Even when the model produces AL-looking code, it often contains subtle but critical errors
  • Providing schema information and language rules in the prompt helps, but isn’t sufficient on its own
  • The compile-and-iterate approach is promising but doesn’t solve logical correctness
  • Pre-populating prompts with symbol/metadata information from the actual BC environment may be the most practical path forward

This is clearly a long way from being useful to anyone in production, but it’s a fascinating starting point. Erik invited the community to share feedback — what’s promising, what’s a dead end, and what approaches might work better.