Andrew Markham

My research falls in the broad domain of Cyber-Physical Systems, which spans Computer Science, Electronic Engineering and Physics. I invent new sensors, develop innovative signal processing algorithms and integrate these into systems to solve challenging, real-world problems. My work is highly cross-disciplinary, with long-standing collaborations with other fields, e.g. Zoology and Earth Science.

News

May 25, 2021	TPAMI paper accepted!
May 8, 2021	ICML paper accepted!

Selected publications

DeepTIO: A deep thermal-inertial odometry with visual hallucination

Saputra, Muhamad Risqi U, Gusmao, Pedro PB, Lu, Chris Xiaoxuan, Almalioglu, Yasin, Rosa, Stefano, Chen, Changhao, Wahlström, Johan, Wang, Wei, Markham, Andrew, and Trigoni, Niki

IEEE Robotics and Automation Letters 2020

Visual odometry shows excellent performance in a wide range of environments. However, in visually-denied scenarios (e.g. heavy smoke or darkness), pose estimates degrade or even fail. Thermal cameras are commonly used for perception and inspection when the environment has low visibility. However, their use in odometry estimation is hampered by the lack of robust visual features. In part, this is a result of the sensor measuring the ambient temperature profile rather than scene appearance and geometry. To overcome this issue, we propose a Deep Neural Network model for thermal-inertial odometry (DeepTIO) by incorporating a visual hallucination network to provide the thermal network with complementary information. The hallucination network is taught to predict fake visual features from thermal images using the Huber loss. We also employ selective fusion to attentively fuse the features from three different modalities, i.e. thermal, hallucination, and inertial features. Extensive experiments are performed on hand-held and mobile robot data in benign and smoke-filled environments, showing the efficacy of the proposed model.
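
For readers unfamiliar with the Huber loss named in the abstract above, the following is a minimal sketch of the standard definition: quadratic for small residuals and linear for large ones. The function name `huber_loss`, the threshold `delta=1.0` and the plain-list inputs are assumptions made for illustration; this is not the DeepTIO training code, and the paper's actual threshold and feature shapes are not given here.

```python
def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss between two equal-length sequences of floats.

    `delta` (hypothetical default) is the residual size at which the
    loss switches from quadratic to linear.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r                # quadratic region
        else:
            total += delta * (r - 0.5 * delta)  # linear region
    return total / len(pred)


# Small residual (0.5) falls in the quadratic region,
# large residual (3.0) falls in the linear region.
small = huber_loss([0.5], [0.0])
mixed = huber_loss([0.0, 3.0], [0.0, 0.0])
```

Because large residuals are penalised only linearly, the loss is less sensitive to outliers than a squared error would be.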

CARACAL: A versatile passive acoustic monitoring tool for wildlife research and conservation

Wijers, Matthew, Loveridge, Andrew, Macdonald, David W, and Markham, Andrew

Bioacoustics 2021

Acoustic localisation technology has been widely tested and applied for passive acoustic monitoring and ecological research; however, hardware costs of commercially available devices limit scalability. Furthermore, few studies have explored its use with low-density arrays. We present a low-cost, custom-designed hardware and software system termed ‘CARACAL’ that is able to extract and localise weak acoustic signals. The key to this is the use of four microphones on each logger, allowing for phase-based measurements and the ability to enhance signal-to-noise ratio through beamforming. As a proof of concept, we test the functionality of the CARACAL system by conducting a gunshot localisation experiment and demonstrate animal call detection and localisation from a lion predation event. Results show the system could locate gunshots with an average accuracy of 33.2 ± 15.3 m within an array of 7 stations 500 m apart. When applied to animal call positioning, we show long-range (> 1 km) localisation of three different species’ calls: Cape buffalo, chacma baboon and spotted hyaena. With a cost of approximately £150 per unit, the CARACAL system provides a cost-effective solution for acoustic localisation over large areas. The system is open source and can be customised to suit a variety of wildlife research applications.
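
To illustrate how combining several microphones can raise signal-to-noise ratio, here is a minimal sketch of textbook delay-and-sum beamforming with integer sample delays. It is an assumption-laden illustration, not the CARACAL implementation: the function name `delay_and_sum`, the integer-sample delays and the toy signals are all hypothetical, and the abstract does not say which beamforming method the system uses.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its (non-negative, integer) sample delay
    and average the aligned samples.

    channels: list of equal-rate sample lists, one per microphone.
    delays:   samples by which the source arrives late on each channel.
    """
    if not channels or len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    # Number of samples available on every channel after alignment.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    if n <= 0:
        raise ValueError("delays leave no overlapping samples")
    m = len(channels)
    return [sum(ch[i + d] for ch, d in zip(channels, delays)) / m
            for i in range(n)]


# Toy example: the same waveform reaches mic_b one sample later.
clean = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
mic_a = clean
mic_b = [9.0] + clean          # leading sample is unrelated to the source
beamformed = delay_and_sum([mic_a, mic_b], [0, 1])
```

After alignment the source adds coherently across channels while uncorrelated noise averages down, which is the general reason a multi-microphone logger can recover weaker signals than a single microphone.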

Vocal discrimination of African lions and its potential for collar-free tracking

Wijers, Matthew, Trethowan, Paul, Du Preez, Byron, Chamaillé-Jammes, Simon, Loveridge, Andrew J, Macdonald, David W, and Markham, Andrew

Bioacoustics 2020

Previous research has shown that African lions (Panthera leo) have the ability to discriminate between conspecific vocalisations, but little is known about how individual identity is conveyed in the spectral structure of roars. Using acoustic-accelerometer biologgers that allow vocalisations to be reliably associated with individual identity, we test for vocal individuality in the fundamental frequency (f0) of roars from 5 male lions, firstly by comparing simple f0 summary features and secondly by modelling the temporal pattern of the f0 contour. We then assess the application of this method for discriminating between individuals using passive acoustic monitoring. Results indicate that f0 summary features only allow for vocal discrimination with 70.7% accuracy. By comparison, vocal discrimination can be achieved with an accuracy of 91.5% based on individual differences in the temporal pattern of the f0 sequence. We further demonstrate that passively recorded lion roars can be localised and differentiated with similar accuracy. The existence of individually unique f0 contours in lion roars and their relatively lower attenuation indicates a likely mechanism enabling individual lions to identify conspecifics over long distances. These differences can be exploited by researchers to track individuals across the landscape and thereby supplement conventional lion monitoring approaches.

RandLA-Net: Efficient semantic segmentation of large-scale point clouds (CVPR oral)

Hu, Qingyong, Yang, Bo, Xie, Linhai, Rosa, Stefano, Guo, Yulan, Wang, Zhihua, Trigoni, Niki, and Markham, Andrew

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020

We study the problem of efficient semantic segmentation for large-scale 3D point clouds. Because they rely on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches can only be trained and operated on small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation- and memory-efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module to progressively increase the receptive field for each 3D point, thereby effectively preserving geometric details. Extensive experiments show that our RandLA-Net can process 1 million points in a single pass, up to 200× faster than existing approaches. Moreover, our RandLA-Net clearly surpasses state-of-the-art approaches for semantic segmentation on two large-scale benchmarks, Semantic3D and SemanticKITTI.
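
The random point sampling step mentioned in the abstract above can be sketched as follows. This covers only uniform random downsampling; it does not include the local feature aggregation module, and it is not taken from the RandLA-Net code. The function name `random_downsample`, the `ratio` of 0.25 and the `seed` parameter are assumptions chosen for illustration.

```python
import random


def random_downsample(points, ratio=0.25, seed=None):
    """Keep a uniformly random subset of `points` without replacement.

    points: list of (x, y, z) tuples.
    ratio:  fraction of points to keep (hypothetical default).
    """
    if not points:
        return []
    rng = random.Random(seed)
    n_keep = max(1, int(len(points) * ratio))
    # Selection does not depend on point geometry, which is what keeps
    # it cheap compared with geometry-aware point selection schemes.
    keep = rng.sample(range(len(points)), n_keep)
    return [points[i] for i in keep]


# Toy cloud of 1000 points on a line, reduced to a quarter of its size.
cloud = [(float(i), 0.0, 0.0) for i in range(1000)]
subset = random_downsample(cloud, ratio=0.25, seed=0)
```

Because the choice ignores geometry, informative points can be dropped by chance, which is the weakness the abstract says the local feature aggregation module is designed to offset.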

SoundDet: Polyphonic Moving Sound Event Detection and Localization from Raw Waveform

He, Yuhang, Trigoni, Niki, and Markham, Andrew

In International Conference on Machine Learning 2021

We present a new end-to-end trainable and light-weight framework, SoundDet, for polyphonic moving sound event detection and localization. Prior methods typically approach this problem by preprocessing raw waveforms into time-frequency representations, which are more amenable to processing with well-established image processing pipelines. In addition, detection is performed in a segment-wise manner, leading to incomplete and partial detections. SoundDet takes a novel approach and directly consumes the raw, multichannel waveform and treats the spatio-temporal sound event as a complete “sound-object” to be detected. Specifically, SoundDet consists of a backbone neural network and two parallel heads for temporal detection and spatial localization. Given the high sampling rates of raw waveforms, the backbone network first learns a set of phase-sensitive and frequency-selective filters to explicitly retain direction-of-arrival information, whilst being highly computationally and parametrically efficient in comparison to standard 1D/2D convolution. A dense sound event proposal map is then constructed to handle the challenges of predicting events with a wide distribution of temporal durations. Accompanying the dense proposal map are a temporal overlapness map and a motion smoothness map that measure a proposal’s confidence to represent an event from the temporal detection accuracy and movement consistency perspectives, respectively. Using the two maps simultaneously allows SoundDet to be trained in a unified manner over space and time. Experimental results on the public DCASE dataset show the advantage of SoundDet on both segment-based and our newly proposed event-based evaluation metrics.