Do I have your attention?
This month I have been thinking about how we share information and what helps busy people to take note. Perhaps this is because - increasingly - the internet is awash with AI-generated prose which doesn't quite hit the mark. Perhaps it's because a project team I'm part of has just completed a 100-page report, of which the majority of readers will (quite sensibly) read only the Executive Summary.
Why we write feels like a crucial question, and one which grows in commercial importance for us at Paradigm Junction and for most of our clients. Ultimately I write this in public, rather than in private, as an attempt to help organisations, and the individuals who lead them, change their behaviour: to adapt and evolve. New ways of communicating and learning are emerging, though. Perhaps some of you already paste this into ChatGPT and ask for the points most relevant to you (if you do this, I'd love to hear!), whilst others will simply skim the If You Only Read One Thing section. (If you're wondering, everyone else clicks most frequently on the Lighter Side links too!)
Whatever the answer, the information environment for all of us is changing. I hope you still find this “so what” email useful. As ever, comments, feedback and questions are gratefully received.
Yours, as yet unmediated by AI,
James
Contents
If You Only Read One Thing
What Is GenAI Good For?
Customer Service
User Focus
Software Engineering
Medicine
How To Successfully Integrate GenAI With Existing Organisations
Prepare For Everyone To Have AI
Build Systems Not Models
Solve Real Problems
Great Work In Government: FCDO
More Work In Government
Our Recent Work
Introduction To Futures And Foresight
Fundamentals Of AI For HR
Zooming Out
System Prompts - Writing Your Own Instructions
LLMs Double In Capability Every 8 Months
China
Learning More
Newsletters
Groups & Meetups
Training Courses
The Lighter Side
If You Only Read One Thing
So much conversation about AI is focused on what you should do with it now. How might it help you do the work you do better? What would a good or bad use case look like? What are your competitors doing that you might need to catch up with?
This is important to think about, but it's not where strategy should stop. Looking ahead, as generative AI becomes more widespread and our understanding of how best to use it solidifies (even as capabilities continue to grow), we'll enter a new period of dynamic equilibrium. Better to base strategy on what that future might look like than on what it seems you might be able to do today.
It’s too early to predict how things will look once the dust has settled, but there are a few emerging possibilities which are worth considering, especially from a comms and product development perspective:
World 1: Limitless, high-quality AI-generated content, to the point that, whatever your field, your messaging is no longer sufficient to stand out. LinkedIn Messages. Flooding The Zone.
World 2: A flood of fluff, with AI generating poor-quality content in such volume that your messaging is hard to find. AI Crisis in Science. Improved Output But Not Outcomes.
World 3: AI content offers something new, not competing with your abilities on your own terms (it never designs as good a product, solves a problem as comprehensively or writes as compelling a newsletter) but delivering a different, high-quality experience which people find more compelling and so pay attention to instead. Your Pending Competitive Inadequacy.
Each of these worlds is one in which it is harder to cut through. Going forward, how do you speak to your customers, or prospective employees, when AI-generated content is competing for their attention? How do you build and sustain relationships and trust? These questions were apparent nine months ago, but they are becoming more pressing.
What would it look like to deliver a compelling offer that stands out from the good-enough AI crowd? How might you carve out a niche in the flood of fluff that lets you still be seen? How might you work with or around the newer AI experiences in a way that keeps what you do relevant?
What Is GenAI Good For?
Customer service, within certain bounds. We wrote in January that the risk of data loss and poor UX had dented enthusiasm for applying AI to customer service. The recent Air Canada debacle, where its chatbot invented a refund policy which the airline now (obviously) needs to honour, might look like another point for the case against. But in the last month we've also seen Klarna's AI assistant handling two-thirds of all customer requests - the equivalent of the work of 700 agents - in 35 languages, at greater speed and with higher accuracy than human employees. So yes, unconstrained systems might be bad news. But embedded as a triage mechanism, solving easy problems and appropriately constrained (Klarna used their ChatGPT-powered bot for easy queries, sending harder ones to humans and creating new documentation so the AI could learn from how those were solved), GenAI can have a massive impact. See 'Great Work in Government', below, for another good use case. Klarna. Air Canada. January Newsletter.
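For the technically minded, here is a minimal sketch of that triage pattern in Python, using the OpenAI SDK. The intent labels, documentation snippets and escalation logic are our own illustration - Klarna hasn't published its implementation - but the shape is the point: the bot only answers queries it recognises, grounded in approved documentation, and everything else goes to a human.

```python
# A sketch of the triage pattern: classify the query, let the bot handle
# easy intents grounded in documentation, and escalate everything else.
# Intent labels, docs and model choice are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

# Documentation snippets the bot may answer from, keyed by intent.
DOCS = {
    "order_status": "Orders can be tracked from the 'My orders' page...",
    "return_policy": "Items can be returned within 30 days of delivery...",
}

def classify_intent(query: str) -> str:
    """Label the query with a known intent, or 'other'."""
    labels = ", ".join(sorted(DOCS)) + ", other"
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": f"Classify the customer query as one of: {labels}. "
                        "Reply with the label only."},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content.strip()

def handle(query: str) -> str:
    intent = classify_intent(query)
    if intent not in DOCS:
        return "ESCALATE"  # hard or unrecognised queries go to a human agent
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer using ONLY this documentation:\n" + DOCS[intent]},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```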
Really tailoring your offer to your user. This is most obviously the case in marketing: WPP’s recent investor day featured discussion of training models on target demographics, and check out the car advert which was personalised for 1.3m different people. But the most exciting use case we’ve come across this month was for simulated user research during product development. It used to be really expensive to get the user in the room early, but you can now have a pretty good copy of them there all the time, in the form of a GPT persona that you can have a conversation with. Suddenly, exploring how different people might respond to a product change, or how a service might be made more useful for them, becomes much easier. WPP. Car Advert. Persona Conversations.
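To make the persona idea concrete, here's a minimal sketch using the OpenAI SDK. The persona is entirely invented for illustration; in practice you'd ground it in real user research before trusting its answers.

```python
# A sketch of simulated user research via a persona system prompt.
# The persona itself is invented; ground real ones in actual research data.
from openai import OpenAI

client = OpenAI()

PERSONA = (
    "You are Priya, a 38-year-old small-business owner in Leeds. "
    "You are time-poor, sceptical of jargon, and manage the company's "
    "finances on your phone in the evenings. Stay in character and "
    "answer as Priya would, including her frustrations."
)

def ask_persona(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Explore a proposed product change with the simulated user.
print(ask_persona("We're thinking of moving invoicing behind a monthly "
                  "subscription. How does that land with you?"))
```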
AI for software engineering just got a lot better. Devin can still only complete ~14% of coding tasks unassisted, but that's a huge improvement on the 2-5% managed by ChatGPT and Claude. Devin.
Real-life issues in subfields of medicine. Research on applying LLMs to all manner of niche problems continues apace. Papers On ChatGPT In Radiology.
How To Successfully Integrate GenAI With Existing Organisations
Build for a world in which everyone else has AI. Remember the system you’re building is one in which users have AI too. Our most common piece of advice for companies we work with is to think less about “what can I do with AI” and more about “what do I need to change because of what people can use AI to do to me”. See, for instance, the rise in AI scams on Tinder. Tinder Expands Identity Verification.
Build systems not models. How do you stop your AI-powered chatbot from saying harmful things? One option is to try to write a prompt that fixes it but, as Google found out this month, that doesn't work very well. A more radical, and perhaps more effective, answer is to not let your AI send any messages to your user at all. People commonly associate LLMs with chatbots, but the reasoning capabilities of an LLM can be used without it coming into direct contact with the end user. Perhaps those capabilities just determine which pre-written templates get sent, or their output goes through several layers of filtering? Perhaps you check user intent as an explicit step rather than responding directly? (See the FCDO example below for a case study of this.) Gemini.
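Here's a minimal sketch of that pattern, again using the OpenAI SDK: the model only ever returns the number of a human-written template, so no generated text can reach the user. The templates and fallback behaviour are our own illustration.

```python
# A sketch of "the LLM never talks to the user": the model returns only the
# index of a prewritten reply. Templates and fallback logic are illustrative.
from openai import OpenAI

client = OpenAI()

TEMPLATES = [
    "Thanks for getting in touch. Your request has been logged and a "
    "member of the team will reply within two working days.",
    "You can reset your password from the account settings page.",
    "Our opening hours are 9am-5pm, Monday to Friday.",
]
FALLBACK = 0  # the safe, human-handled default

def pick_reply(query: str) -> str:
    menu = "\n".join(f"{i}: {t}" for i, t in enumerate(TEMPLATES))
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Pick the single best reply to the user's query "
                        f"from this numbered list:\n{menu}\n"
                        "Answer with the number only."},
            {"role": "user", "content": query},
        ],
    )
    try:
        index = int(response.choices[0].message.content.strip())
    except ValueError:
        index = FALLBACK  # unparseable model output falls back to the default
    if not 0 <= index < len(TEMPLATES):
        index = FALLBACK  # so does an out-of-range index
    return TEMPLATES[index]  # always human-written, never model-generated
```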
Build systems which actually solve problems. There’s lots of thinking going on about “using AI” and arguably not enough about what problem you can actually solve with it in a novel way. AI teams need to understand their users to build useful products, starting with an identified need and building to address that. AI Teams Need Designers. A Product Manager’s View of AI.
Great Work In Government: FCDO
This month we spent an afternoon with Richard Grove and David Gerouville-Farrell at Caution Your Blast Ltd, who, alongside the Foreign, Commonwealth and Development Office, are building (likely) the first LLM-powered, publicly accessible service in central government. There are lots of great takeaways from the project - and best of all, they are building in the open, so you can follow their progress (both successes and setbacks) through their weekly video blogs. They've also set up an in-person community to catalyse others in government seeking to solve similar problems and share design patterns. A few highlights:
A clear problem identification: Contact centres get overwhelmed at times of high request volumes (e.g. ‘crisis moments’ such as Russia-Ukraine, Middle East). These spikes in demand are very difficult to predict and often create huge backlogs of day-to-day work for the FCDO teams.
A clear failure mode to avoid: The LLM-powered service must not produce embarrassing or harmful responses, even under adversarial attacks.
Which led to a really clear design decision: the LLM is used to comprehensively understand user queries and to select the best answer, but that answer is chosen (by the LLM) from a collection of answers prewritten by humans for the most common requests. This means no LLM-produced output is ever sent to the customer, so hallucinations and adversarial attacks cannot lead to dangerous outcomes.
The list of highlights goes on…
From inception, a designer, user researcher and data analyst were integrated into the team, ensuring user satisfaction.
The team worked with staff to understand how their jobs will change post-launch, and ensured (pre-launch) that system maintenance is prepared for.
Users will have the option to opt out of automated processing, reverting to the previous (non-AI) service if they desire.
The service is estimated to save ~12,000 staff hours over the next five years. And this is just the start: with both the technical and organisational capacity to run LLMs in place, the FCDO will be well prepared to take on other projects as they are identified, or as model performance increases.
The team have spent months working with the contact centre agents themselves to capture and record the business logic that already exists, so the system works how the FCDO team wants it to. They have even reused many of the templates that staff were already using manually! A great example of building a system that leverages AI, rather than starting with a model and then finding a problem to solve (see Build Systems Not Models, above).
We are looking forward to seeing the project launch next month. Announcement. Video Diaries. Meet Up.
More Work In Government
An NAO report on the use of AI in government finds 74 deployed use cases, with skills among the main barriers. NAO. Analysis.
TfL uses AI to monitor CCTV footage in a tube station to spot aggression, fare evasion and more. AI Tube Station.
A meet-up for people interested in exploring applications of AI in government! Meet-up.
Our Recent Work
Paradigm Junction Partner Lewis wrote an explainer on Futures and Foresight - this is the crux of how we help organisations to make decisions in the face of highly uncertain outlooks. Link.
Both of us also spoke at Legal Island's Fundamentals of AI for HR conference this month - on Change Management and Skills respectively. I know they took recordings, so contact the organisers if you'd like to watch them (or any of the other brilliant sessions). Link.
Zooming Out
AI chatbots (like ChatGPT) are controlled with a system prompt that gives the model additional information about how to respond (such as the date, or its purpose in speaking to you). Anthropic, who make ChatGPT competitor Claude, released and discussed theirs. Note how the instructions are simply good, clear English communication - a good model for how to instruct your own AI tools. Link.
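If you build on Claude directly, you set your own system prompt through the same mechanism. Here's a minimal sketch with Anthropic's Python SDK; the prompt wording is our own illustration, not Anthropic's.

```python
# A minimal sketch of setting your own system prompt with Anthropic's
# Python SDK. The prompt text is illustrative; write yours in the same
# plain, clear English as Anthropic's published one.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are an assistant for a small UK consultancy. Today's date is "
    "2024-03-28. Be concise, use British English, and say so plainly "
    "when you are unsure rather than guessing."
)

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=500,
    system=SYSTEM_PROMPT,  # instructions live here, separate from user turns
    messages=[{"role": "user", "content": "Summarise our meeting notes."}],
)
print(message.content[0].text)
```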
The much-discussed Moore's Law proposes that computers double in capability every two years. We now have quantitative estimates for how quickly the Large Language Models underpinning Generative AI tools are improving: they are doubling in capability every 8 months. Two years contains three of those doubling periods, so that means getting 2³ = 8x better in two years, rather than 2x. The AI tools you experience now are the worst you'll ever use. Link.
File in the mental folder: keeping up with what China is doing on AI. Link.
Learning More
Ethan Mollick (who we regularly link to) has produced a Prompt Library and set of resources for educators (he teaches at The Wharton School). Link.
The London-based Civic AI Observatory wrote their latest newsletter on AI productivity improvements. They focus on helping charities and NGOs, but this is the best overview of AI additions to Word/Email/Presentations that I have found anywhere. Link.
A good technical explainer from Georgetown University for those looking to get to grips with how LLMs work and are developed. Link.
The Lighter Side
If AI writers and AI reviewers are "peers", then peer review at scientific journals is holding up well. Link.
Move over “Instagram vs Reality”, “AI Generated Marketing vs Reality” is the new Golden Ticket. Link.
Is it me or are university courses getting easier? Link.
Thankfully Twitter is still a home of productive political debate. Link.
If you thought ChatGPT was getting too sanctimonious, may I introduce you to the Head Boy of chatbots: Goody2. Link.
TL;DR. Link.