Many organisations have large amounts of written text. Thematic analysis allows the topics in documents to be summarised, clustered, and analysed.
Embeddings: text as numbers
Large Language Models like ChatGPT have burst onto the scene, bringing Artificial Intelligence (AI) to everybody’s fingertips. Language models generate text in a way that is almost miraculous. To carry out this trick, they first need to learn about text by reading it. To process text, language models rely on an underlying technology that translates text into numbers. Called an embedding, this technology represents each sentence or paragraph as a list of numbers (a vector), in such a way that similar concepts end up numerically close to each other. The embedding provides a mapping from concepts or themes into numbers.
Because of the richness of language, with the same word having different meanings in different contexts, generating this embedding is a tricky process. It needs to be able to understand that while ‘dog’ might often go with ‘canine’, ‘hot dog’ is more likely to be close to ‘hamburger’.
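To make "numerically close" concrete, here is a minimal sketch using cosine similarity, one common way of comparing embedding vectors. The three-dimensional vectors are made up for illustration. Real embeddings come from a trained model and typically have hundreds of dimensions, and the function names are our own.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy vectors, NOT the output of a real embedding model.
# Loosely: (animal-ness, food-ness, everything else).
toy_embeddings = {
    "dog": [0.9, 0.1, 0.2],
    "canine": [0.85, 0.05, 0.3],
    "hot dog": [0.2, 0.9, 0.1],
    "hamburger": [0.1, 0.95, 0.15],
}

def nearest(phrase, embeddings):
    """Return the other phrase whose vector is most similar to `phrase`."""
    return max(
        (other for other in embeddings if other != phrase),
        key=lambda other: cosine_similarity(embeddings[phrase], embeddings[other]),
    )

print(nearest("dog", toy_embeddings))      # canine
print(nearest("hot dog", toy_embeddings))  # hamburger
```

With these toy numbers, ‘hot dog’ sits next to ‘hamburger’ rather than ‘dog’, even though the two phrases share a word.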
Using embeddings to find themes in text
Many organisations have large amounts of written text: maybe shared drives full of documents, or websites with articles written over many years. It can be difficult to figure out what all these documents are talking about. Manually analysing text to identify themes is a time-consuming and specialist job.
Embeddings allow the themes discussed in text to be analysed rapidly and consistently. Because thematic analysis can be carried out over a whole collection of text, this process allows for analysis of changes in the themes over time, by author, through different parts of a business, or in relation to any other metadata that is associated with the documents.
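As a rough sketch of how themes can be linked to metadata, the example below assigns each document to its most similar theme and then counts themes by year. It is a simplified illustration rather than a description of any particular production method. The theme names, the two-dimensional vectors, and the nearest-theme assignment rule are all assumptions made for the example.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical themes, each represented by an invented reference vector.
themes = {
    "workload": [0.9, 0.1],
    "training": [0.1, 0.9],
}

# Hypothetical documents as (year, embedding) pairs; the values are made up.
documents = [
    (2021, [0.8, 0.2]),
    (2021, [0.7, 0.1]),
    (2022, [0.2, 0.9]),
    (2022, [0.9, 0.3]),
    (2023, [0.1, 0.8]),
    (2023, [0.3, 0.9]),
]

def assign_theme(vector, themes):
    """Pick the theme whose reference vector is most similar to `vector`."""
    return max(themes, key=lambda name: cosine_similarity(vector, themes[name]))

def theme_counts_by_year(documents, themes):
    """Count how many documents fall under each theme in each year."""
    counts = Counter()
    for year, vector in documents:
        counts[(year, assign_theme(vector, themes))] += 1
    return counts

counts = theme_counts_by_year(documents, themes)
for (year, theme), n in sorted(counts.items()):
    print(year, theme, n)
```

The same counting step works for any other metadata field, such as author or business unit, by swapping it in for the year.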
Improving our capability
Over the last few years, Dragonfly has been working on our capability in natural language processing.
The development of language models has driven improvements in the quality of embeddings, making them increasingly powerful at recognising the underlying concepts in text. Making embeddings practically useful involves many details, and we have been building expertise in using them to understand the topics that are referred to in text.
Submissions analysis
One area that we have worked on is using embeddings to analyse public submissions to government select committees. This analysis provides a broad overview of the submitters’ concerns. It is complementary to manual approaches, and is most useful at the beginning of the analysis process, where a rapid overview of the submissions helps with developing the analytical framework. Like many applications of AI, it is best used as a tool to augment or inform manual processes.
Read more
We recently worked with the Occupational Therapy Board, using thematic analysis to help understand the common themes referenced in practitioners’ portfolios. Read more about our portfolio analysis work here.
Get in touch
To ask about how thematic analysis can help you, feel welcome to get in touch with us at hello@dragonfly.co.nz.