Kiarie Ndegwa recently joined Dragonfly from CSIRO in Canberra, Australia, where he used machine learning techniques in diverse settings to solve large, messy data problems. Here, he explains the basics.
“Machine learning is a hot topic right now. It is part of the bigger field of AI (artificial intelligence). AI is a set of algorithms and tools used to solve problems such as classifying patterns in images or recognising speech. The algorithms mimic intelligence, in particular the ability to adapt to new information, to have memory and to make decisions.”
Kiarie says the distinctive feature of machine learning is that it’s data-driven rather than human-driven.
“If we want to find images of cats or dogs in a set of images, for example, we don’t have to create a set of handcrafted rules to teach a machine learning system explicitly what a cat or dog looks like. Instead, we give a set of example images with cats and dogs and let it work out the unique identifying features of those animals itself.”
“In this ‘black box’ system we don’t know what features it will generate to tell cats and dogs apart. But if we could look inside, there would probably be a strong correlation with what we’d use to identify a dog or cat – fur, snouts, ears – as well as other weird, non-intuitive features that we may not understand.”
Kiarie explains that ‘deep learning’ is a type of machine learning. “Deep learning algorithms are a very rudimentary imitation of the network of neurons in a brain. The more neurons and data you have, the better your machine learning will work – it’s like having a bigger brain and more examples to learn from. The ‘deep’ refers to having multiple layers of simulated neurons in an artificial neural network.”
Kiarie is aware of the ethical concerns around bias in machine learning, particularly given the ways it has been used by media and social media recently.
“If the dataset is biased because of the way data has been collected, the model will follow that bias and make it more explicit. This can enhance and perpetuate discrimination, so we have to be hyper-aware of ways we might be seeing or creating bias.
“If we’re creating a model that’s based on spoken language, for example, we have to make sure older voices are recognised as well as younger ones, and female as well as male. Otherwise, we will create an output that’s biased towards only one part of a population.”
“At the moment there is lots of experimentation and research going on in the field. I follow some practitioners on Twitter, as well as the mathematicians who are trying to validate the results. It’s an exciting time. Machine learning has the potential to crack some really challenging problems, such as natural language processing.”
The Dragonfly team is pleased to have Kiarie join them. As Edward Abraham explains:
“There are not many people who are able to keep up with the current literature, and who also have the software engineering skills to implement these machine learning systems. Since starting at Dragonfly, Kiarie has been able to take a complex model and shrink it to get it working on a mobile phone without losing performance. This makes the model much cheaper to run, and removes the need to transmit data back to the servers.”