LLMs, Model Governance and Ethic

New Opportunities 

Large language models (LLMs) are a new natural language processing(NLP) tool that has improved performance, reduced model development time, and greatly expanded use cases over prior approaches. A few examples of new use cases are a more humanistic search experience where users can just ask questions naturally, generating marketing material for a wide variety of target audiences instantly, and scanning large bodies of unstructured data like comments to create interactive marketing solutions. While they are a breakthrough in terms of performance, they build on traditional NLP and deep learning trends which means many decision science teams are capable of deploying and developing LLMs with minimal training. You can develop LLM models either using third-party tools, like ChatGPT, or build unique solutions in-house using open-source packages like PyTorch. While LLMs enable new solutions they still should adhere to standard model governance, performance evaluation, and ethical standards.

Wild West

One reason LLMs have caused panic about artificial intelligence is that many of the new use cases mean companies that have little to no experience deploying ethical machine learning/artificial intelligence (ML/AI) models can now create solutions using tools like ChatGPT. Part of the fear is best practices developed in the risk, fraud, and marketing space will not transfer as readily as these technologies. However, these best practices not only assure regulatory compliance but also well-behaving and stable models. Moreover, given the ambiguity of LLM’s final output, without a proper model governance process put in place, it is highly unlikely anyone could develop a best-in-class solution and will more likely create an unstable and difficult-to-maintain one. Often with ML/AI solutions a well-defined model governance process is your competitive advantage, not the tool itself. 

How to Start

As with any statistical or ML/AI solution, a robust governance process needs to be put in place to assure compliance and performance and that the solution adheres to your ethical standards. 

The first step is defining your code of ethics involving ML/AI or, if your company has one, revisiting it to see if the new use cases LLMs enable warrant any changes or updates. Typically a code of ethics would have five to six straightforward rules. Example rules are: do no harm and all AI-powered solutions put the user first. While these may seem like just words, a well-crafted code of ethics can steer innovators in the right direction early in a solution’s development. 

Next, what is the goal of the application of LLM?

  • Entertainment: While entertainment seems to be the least risky there are threats such as copyright infringements, inappropriate outputs, and noncompliance with local regulations that require some degree of model governance. Also, LLM-generated material might amplify the worst elements within the corpus (e.g. social media posts)  it ingests perpetuating negative stereotypes. 
  • Information retrieval: Like entertainment, incorrect information can be embarrassing but also may open you up legally. There are incidences where a chatbot recommended people commit suicide when a user said they were depressed. Information retrieval can lead to real-world harm faster and more horrifically than risk and fraud.
  • Risk models: Risk has the highest degree of scrutiny in terms of regulations. While this sounds like a barrier to success, these regulations often help modelers build better and more stable models.   If using LLMs for fraud models you need to be particularly careful.
  • Fraud: Also heavily regulated, fraud models are of growing concern because fraud mitigation can act as barriers to access to disenfranchised groups. This is especially true of poorly built models that use greedy techniques that will readily exclude subpopulations.
  • Marketing: Growing regulations and public attention put a greater need for model governance of marketing models. Like fraud models, marketing models can act as barriers to access. Some poorly built models have been known not to show ads or offers to subpopulations restricting these populations’ access to discounts and opportunities solely based on their demographics. 
  • Biomedical: ML/AI models have always played a major role in biomedical innovation. While most biotech firms have robust model governance in place, the emergence of LLM may warrant revisiting those and also assuring units don’t bypass rules via LLM APIs.

Next, what is the legal and competing landscape?  Are you entering a well-entrenched space? How many different legal jurisdictions are you entering and how are they similar? Are there already informal ethical standards? Could higher ethical standards give you an edge? These are just a few questions to ask when mapping out a strategy.

Finally, Risk and Technology, taking into consideration use cases,  ethics, and the competitive and legal landscape work together to define best practices and roles and responsibilities.  A well-defined process with a socially acceptable code of conduct will accelerate development time, reduce internal friction and produce superior, regulatory-compliant results. 

Why Does This Matter? 

To show why, let’s propose a terrible idea and see how it plays out. Suppose you scan applicants’ social media posts and then use an LLM to generate features for a risk model. Suppose the model showed good performance and was put into production. A year goes by and an applicant denied by your model requests to know why. After an explanatory tool such as SHAP, you find out the driving feature came from the LLM features and, regrettably, they were hallucinations ( a common LLM issue). This means the information in those features was not directly associated with the applicant but something the LLM invented. In truth, the model had no fact to warrant the high-risk score it gave and the applicant was denied access for no reason. If, when you read this scenario, you have no idea whether your processes would have stopped this, you may need to review your model governance procedures.

 

Focused on Success Not Trends

A common misperception is that model governance drives the failure of models to get into production. In truth, not having model governance”s input sooner in the product development phase is more often the cause. During the deep learning craze, the most common failure I saw was my clients forcing deep learners into their solutions. Curious, this was driven by both technology and product focused on the tools used rather than the final product’s performance. ML/AI techniques are tools, not solutions. I saw completely stable, market-leading products get revamped to deep learners then falter painfully in production simply because decision-makers got caught up in the hype. Likewise, it may be that LLMs are not what you need but some other more straightforward tool that is also easier to govern. If set up correctly, the model governance process can raise these concerns early in the development process.

Turning Compliance into an Advantage

With most ML/AI solutions how it is tested is as much its secret sauce as the algorithms used. The solution’s target audience, service level agreement (SLA), and type of service it provides define the test methods and procedures. Some of these methods follow best practices and can even use standard packages, but others may be more innovative. The following are quick examples of various innovative test procedures for an information retrieval LLM solution that if properly executed could provide market differentiation:

  • Build an adversarial AI whose goal is to get the solution to tell a falsehood.
  • Manually identify topics or chains of topics that are confusing or controversial and use them to drive automated tests.  
  • Create interactive scripts to guide participants on mechanical turk to test your solution manually.

It is evident that how you test dictates how the product interacts with consumers and this interaction is what will define the product’s success or failure.

Built on Thin Ice?

Remember, stability is just as important as performance (as is evident from the growing concern about ChatGPTs own stability). The goal is not just to develop a model but to deploy a model to production where it meets or even exceeds the predefined SOW for the solution. Model governance, which separates model development from final model testing using a  near-production environment, forces discussion about deployment, data issues, and concerns such as concept drift to the beginning of the model development phase thus reducing the chance of creating a model that would never function in production.

Rules of Thumb 

Here are a few parting words of wisdom:

  • If you don’t have a code of ethics for ML/AI develop one.
  • Get model governance involved early in the development phase of a new project.
  • Focus less on tools (such as deep learners or LLMs) and more on outcomes.
  • Assure that the innovation occurring in testing is elevated within your organization.

Parting Word

LLMs, like other new ML/AI tools, extend human capabilities leading to greater innovation, new opportunities, and… risks. While model governance and codes of ethics may seem like an extra cost they are a way to accelerate model deployment, enhance product differentiation, and adhere to ethical standards. LLMs build on prior ML/AI techniques; while they offer tremendous opportunities for product innovation, they do not bypass the need for clear ethical guidelines and strong model governance. This is an exciting time for industries that typically used little to no ML/AI to start enjoying the benefits they offer, as long as best practices are adopted. 

Resources