Signal Processing: this work introduced a novel signal processing method called Stochastic Bernstein, which can be described as a deconvolution step followed by a convolution step. It admits a mollifier, which can be any positive function controlling the function recovery, and it contains a parameter akin to a diffusion that also controls the shape of the recovery. Depending on how these two stages are applied, one can obtain approximation (the recovered function does not pass through the data points) or interpolation (it passes through the data points), as well as other behaviours known as quasi-interpolation and peak sharpening. Depending on the parameters chosen, the data recovery is spectral or not (according to the Runge function test). We developed “limiter functions” (unpublished, but described in internal MOD reports) that can handle discontinuities. The method is based on a stochastic matrix. Moreover, gradients are immediately available (depending on the properties of the mollifier). We developed an alternative image compression to JPEG with impact in steganography and steganalysis. We used the method to filter out noise, as an alternative to more awkward polynomial noise-filtering methods, for example in MALDI-TOF (low mass range), and it is applicable to any uncharacterised equipment noise. We also tried to use it to evolve analytical solutions of differential equations using Genetic Programming (a paper by D Howard and S Roberts, also included here, shows how GP can recover the analytical solution of convection-diffusion equations as an alternative to weighted residual methods such as FEM and FD). There are also more recent papers on other signal processing problems. Links to papers: paper, paper, paper, paper, paper, paper (all require a password owing to copyright restrictions). What are further unexplored uses of this method? The method is an alternative to kriging and radial basis functions in the design of maps or their interpolation.
In zooming, it could zoom images selectively, respecting hard edges in letters but soft edges when zooming, for example, the face of a bride, and do this adaptively within the same image (owing to the choice of mollifier and diffusion parameter). Having the gradient available makes the method suitable for camera image formation. Finally, the limiters we developed make it suitable for respecting discontinuities (shock fitting), and with the correct mollifier and knowledge of the nature of the noise it can be used to denoise intelligently, in particular for super-resolution in security and defence work. We co-published two more recent papers on other advanced signal processing topics in prestigious journals: paper, paper.
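The flavour of this kind of recovery can be sketched in a few lines. The code below is a minimal illustration only, not the Stochastic Bernstein method itself: it assumes a Gaussian mollifier, uses a hypothetical `sigma` in place of the diffusion-like parameter, and shows how a row-stochastic weight matrix turns one parameter choice into smooth approximation and another into near-interpolation, tested on the Runge function mentioned above.

```python
import numpy as np

def mollified_recovery(x_data, y_data, x_eval, sigma=0.05):
    """Illustrative sketch: recover a function from samples via a Gaussian
    mollifier.  sigma plays the role of a diffusion-like parameter: small
    sigma hugs the data (near-interpolation), large sigma smooths it
    (approximation).  This is NOT the published two-stage method."""
    # Gaussian mollifier weights between evaluation points and samples
    w = np.exp(-((x_eval[:, None] - x_data[None, :]) ** 2) / (2 * sigma**2))
    w /= w.sum(axis=1, keepdims=True)  # rows sum to 1: a stochastic matrix
    return w @ y_data

x = np.linspace(-1, 1, 21)
y = 1.0 / (1.0 + 25 * x**2)                       # Runge function test
smooth = mollified_recovery(x, y, x, sigma=0.2)   # approximation
tight = mollified_recovery(x, y, x, sigma=0.01)   # near-interpolation
```

Note how the normalised weight matrix is row-stochastic, which is the sense in which this family of methods is "matrix stochastic".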
AI in mammography screening: decades before the deep learning efforts by Google DeepMind, our director, working at QinetiQ, secured a collaboration (between 2000 and 2002) with Dr László Tabár, M.D., F.A.C.R. (Hon), the foremost world expert in breast cancer screening, and worked with him in central Sweden (Falun). From László Tabár, our director and his team learned the real medical problem of interest and applied Artificial Intelligence to try to solve it. The following lessons guided this project: most calcifications are benign markers (for example, teacup calcifications from sclerosing adenosis); the crushed-stone malignant calcifications are few in number, and radiologists can detect them easily. In contrast, architectural distortions are nearly always malignant: here the tissue is “pulled” by the cancer. However, these distortions can be missed by radiologists, and here AI can help. Visual illusions are real, with massive distortions going unseen when a small lesion is expected, and vice versa. Moreover, an important observation is that the pattern of a mammogram is even more formidable than the presentation of the lesion! All this motivates the construction of a taxonomy of mammograms. László Tabár, and Wolfe before him, had manually and visually grouped mammograms into such “classes”. In the László Tabár system there were five classes: (1) young women, where the parenchyma is plentiful; (2) menopausal older women, where the parenchyma is mostly replaced by adipose tissue (ideal for mammography); (3) the same as the second type but with residual parenchyma in the retroareolar region (behind the nipple); (4) extensive fibrosis; and (5) extensive adenosis. These last two classes affect only five percent of women, but mammography is useless in such cases. A woman’s breast changes with age, and thus women migrate longitudinally in time through this taxonomy.
We came to the conclusion that the real medical problem is not CAD (computer-aided detection of cancer by AI) but the construction and use of this taxonomy. The team led by our director made use of Simon C Roberts’ PhD work in music perception, applying his ART-2 Artificial Neural Network (Carpenter and Grossberg), an unsupervised AI learning method capable of non-linear clustering, to the CC views of mammograms from the Swedish archive. The resulting 60 classes in the taxonomy, gathered from a few thousand scanned images, can be seen here and here. The first publication of this work was by Howard, Roberts and Tabár at the 6th International Workshop on Digital Mammography (IWDM) in Bremen, Germany. Much later we published more details of this work in a paper and, for reasons of funding support or loyalty, included some others as co-authors who did not contribute to this work. In the closed literature of QinetiQ PLC Malvern of that time there is a long and comprehensive report describing this work. What is the significance of this work even today? The technology we developed and deployed could classify all women’s mammograms from vast archives into the taxonomy and track how they move from class to class over time as the breast involutes: (a) breast cancer drug trials could select representative women from the taxonomy classes; (b) negligence trials could use the taxonomy to determine culpability (is this breast type more common than others?); (c) breast cancer probability could be correlated with breast type; (d) even CAD could be tested on representative data from the taxonomy; (e) correlations between genomics, childbirth, smoking and other factors and the taxonomy could reveal aspects of breast cancer; (f) mammograms could be sorted for second reading by taxonomic class to make the job of the screener easier; (g) even CAD could be specialized to detect the lesion inside the class.
The taxonomy preserves information over time and across the population, and is much more powerful than CAD! Contrast this with those developing and now testing CAD 20 years after our R&D was done.
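The clustering principle behind ART networks can be shown in miniature. The sketch below is an illustrative toy in the spirit of Carpenter and Grossberg's vigilance mechanism, not the ART-2 architecture actually used on the mammograms: each input either resonates with its best-matching prototype (and refines it) or, if the match fails the vigilance test, founds a new class. Function name, learning rate and the cosine-similarity match are all simplifying assumptions.

```python
import numpy as np

def art_cluster(patterns, vigilance=0.9, lr=0.5):
    """Toy ART-style unsupervised clustering: compare each pattern to
    stored prototypes; resonate (update the winner) if similarity passes
    the vigilance threshold, otherwise create a new class."""
    prototypes, labels = [], []
    for p in patterns:
        p = p / (np.linalg.norm(p) + 1e-12)        # normalise input
        best, best_sim = None, -1.0
        for j, w in enumerate(prototypes):
            sim = float(p @ w)                     # cosine similarity
            if sim > best_sim:
                best, best_sim = j, sim
        if best is not None and best_sim >= vigilance:
            # resonance: move the winning prototype toward the input
            w = (1 - lr) * prototypes[best] + lr * p
            prototypes[best] = w / np.linalg.norm(w)
            labels.append(best)
        else:
            prototypes.append(p)                   # novel class
            labels.append(len(prototypes) - 1)
    return labels, prototypes

pts = np.array([[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95]])
labels, protos = art_cluster(pts, vigilance=0.9)
```

The key property, shared with ART-2, is that the number of classes is not fixed in advance: lowering the vigilance merges classes, raising it splits them, which is how a taxonomy of tens of classes can emerge from thousands of images.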
Modularization in Genetic Programming (publications with John R. Koza): Genetic Programming (GP) is an alternative artificial intelligence technique with both benefits and drawbacks. When John R. Koza created and popularized it, his intention was to use all of the constructs that had helped computer programmers, such as variables, memory, iteration, algebraic functions and conditional functions, instead of the weights and summations or Bayesian probabilities that typified most other powerful AI methods. Around 1992-1994, John introduced the concept of a subroutine into the primordial soup of GP. He named this construct the “Automatically Defined Function” (ADF). Standard GP does have strong reuse, because its “crossover” operator exchanges branches in the GP LISP tree representation: if a branch is useful, it will appear in many places across the many individuals that make up the GP population. However, ADFs, especially when parameterized, are more explicit and powerful because they are compact and get consulted repeatedly. Modification of ADFs after selection by means of crossover or mutation, relative to genetic modification of the solution tree that calls the ADFs, has never been properly studied to date. There is, however, another explicit modularization called “subtree encapsulation”. Angeline and others had tried it by freezing a subtree during a GP run and atomizing it, but it had not led to any advantage (or perhaps only a minimal one). The work by Roberts, Howard and Koza (paper paper paper) combined subtree encapsulation with the concept of the multirun: it atomized the subtrees after a short run, then started a new population and a new run. This worked, for the first time allowing subtree encapsulation to solve parity problems that standard GP could not. We called it Multirun Subtree Encapsulation (MSTE). What untapped advantages exist for this method? Both ADFs and MSTE can solve the parity problems when standard GP cannot.
On other problems, however, they may offer little to no advantage over standard GP in terms of the ability to discover the solution. However, MSTE has a proven advantage: it is an overall faster method than standard GP. Imagine running only 5 generations, encapsulating, re-establishing a population and running again for 5 generations, and repeating this n times. All of the GP trees stay small compared with standard GP running for 5n generations. Compute time is up to 1000x less for many practical problems because the population does not have time to “bloat”. This has another advantage, to do with explainability. When we select and atomize subtrees after a run stage, each subtree is characterised by its vector of values over the fitness cases, and we select the shortest representative by leaf count. Hence, the final solution is organized hierarchically from smaller and simpler subcomponents. This advantage has been postulated in this paper but not yet sufficiently investigated.
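The atomization step at the heart of MSTE can be sketched concretely. The toy below, on a two-input Boolean problem, is an illustration under simplifying assumptions (tuple-encoded trees, hand-written population, function names of our own invention), not the published implementation: every subtree in the population is evaluated over the fitness cases, subtrees with identical value vectors are treated as one atom, and the shortest representative by leaf count names the atom for the next run.

```python
import itertools

CASES = list(itertools.product([0, 1], repeat=2))   # fitness cases (x0, x1)

def evaluate(tree, env):
    """Evaluate a tuple-encoded Boolean GP tree such as ('and','x0','x1')."""
    if isinstance(tree, str):
        return env[tree]
    op, *args = tree
    vals = [evaluate(a, env) for a in args]
    if op == 'and': return vals[0] & vals[1]
    if op == 'or':  return vals[0] | vals[1]
    if op == 'not': return 1 - vals[0]
    raise ValueError(op)

def leaves(tree):
    return 1 if isinstance(tree, str) else sum(leaves(a) for a in tree[1:])

def subtrees(tree):
    yield tree
    if not isinstance(tree, str):
        for a in tree[1:]:
            yield from subtrees(a)

def atomize(population):
    """Map each distinct value vector to its shortest subtree (by leaf
    count); these atoms become new terminals for the next run."""
    atoms = {}
    for tree in population:
        for s in subtrees(tree):
            key = tuple(evaluate(s, dict(zip(('x0', 'x1'), c)))
                        for c in CASES)
            if key not in atoms or leaves(s) < leaves(atoms[key]):
                atoms[key] = s
    return atoms

pop = [('and', 'x0', ('not', 'x1')),
       ('or', ('and', 'x0', ('not', 'x1')), 'x1')]
atoms = atomize(pop)
```

Because semantically identical subtrees collapse to one short atom, the solution found in a later run reads as a hierarchy of small named components, which is the explainability advantage noted above.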
Early application of GP to UK motorway traffic incident prediction and journey time prediction: UK motorways imitated Los Angeles highways by installing a huge number of electromagnetic loops in the tarmac that count passing cars and measure their speed. These readings are minute-averaged. The system is widespread on all motorways and is called MIDAS (Motorway Incident Detection and Automatic Signalling), a UK-wide distributed network of traffic sensors, mainly inductive loops. The principal motivation for these sensors in California and in the UK is to measure the “occupancy”, the fraction of time vehicles sit over a sensor. This alerts the control office that an incident is developing and traffic has slowed, prompting operators to activate lower speed limits on signs upstream of the location. The occupancy reading participates directly in the “California Algorithm”, which alerts the control office to start reducing speed limits. However, MIDAS readings comprise not only occupancy but also velocity, headway and vehicle counts by type. The UK Highways Agency asked us to use historical MIDAS data for two projects: journey time prediction and incident detection late at night. Our results are described in these papers (paper, paper) but also in reports to the HA, as this was paid consultancy. Interesting facts: (a) the journey time series is naïve, meaning that the best prediction of the next journey time is the current value, and it is hard to beat that; we used GP on masses of historical data to beat it; (b) Dr Lenny Smith and our director discussed the problem at High Table in Pembroke, and he suggested that for the prediction we could look back in time to an almost identical situation and see what happened next. We did that, but the prediction was wrong, meaning that there are many other external drivers not accounted for.
(c) interestingly, the evolved journey time predictor could be broken down into executable branches, and when these were laid out on a table the HA engineers could see that for very low flow and low speed the formulas that worked involved only the occupancy reading, something they had suspected but never seen, because GARCH and other AI methods were black boxes; our delivery was an explainable formula: picture. Another year-long contract saw us perfect software and add powerful 3D graphics to a microscopic simulation of traffic that had been used to decide on the concept of the fast lane and other ideas in practice today, such as hard-shoulder running and active traffic management.
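The spatial-occupancy comparison behind California-style incident detection can be sketched simply. The function below is a hedged illustration of the idea only: a large absolute and relative drop in occupancy from an upstream station to the next downstream station suggests vehicles are queuing behind an incident. The thresholds `t1` and `t2` and the function name are illustrative placeholders, not calibrated operational values from MIDAS or the published algorithm.

```python
def california_alert(occ_up, occ_down, t1=8.0, t2=0.4):
    """Flag a possible incident from minute-averaged occupancy (%) at an
    upstream loop and the next downstream loop.  t1 is an absolute
    difference threshold (percentage points), t2 a relative one."""
    occdf = occ_up - occ_down                       # absolute difference
    occrdf = occdf / occ_up if occ_up > 0 else 0.0  # relative difference
    return occdf >= t1 and occrdf >= t2

# hypothetical minute-averaged occupancy readings
incident = california_alert(30.0, 5.0)    # queue upstream, free flow below
normal = california_alert(12.0, 10.0)     # ordinary fluctuation
```

The operational algorithm layers further tests and persistence checks over this comparison; the point here is only that occupancy differences, not speeds, drive the alert, which matches what the evolved formulas revealed.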
Genome discovery: it sounds fanciful to use evolutionary computation, i.e., algorithms inspired by Darwinism and genetic operations, to find genes in a sequence and the stretches of nucleotides that control gene expression (promoters, enhancers), but that is exactly what we did for the eukaryotic human genome. We developed an algorithm in the form of a finite state machine whereby transitions and states directed jumps along the nucleotide sequence. This delivered an algorithm that identified genes: paper.
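The state/transition mechanics of scanning a nucleotide sequence can be illustrated in miniature. The toy below is our own simplification, not the evolved machine from the paper: states record how much of a motif has been matched, and each base drives a transition. The `motif` argument is a stand-in for a promoter-like signal, not a real promoter model, and the evolved machines were far richer (their transitions could also direct jumps along the sequence).

```python
def find_motif(seq, motif="TATA"):
    """Toy finite state machine over nucleotides: state i means the first
    i characters of `motif` have just been matched; reaching the final
    state reports a hit.  Overlapping hits are not handled."""
    hits, state = [], 0
    for i, base in enumerate(seq):
        if base == motif[state]:        # extend the current match
            state += 1
        elif base == motif[0]:          # restart the match at this base
            state = 1
        else:                           # fall back to the start state
            state = 0
        if state == len(motif):         # accepting state: record a hit
            hits.append(i - len(motif) + 1)
            state = 0
    return hits
```

A hand-coded machine like this detects one fixed signal; the evolutionary approach instead searched the space of such machines so that the states and transitions themselves were discovered from the genome data.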
MOD competition of ideas and NATO work: Howard Science carried out two projects for DSTL, one for the then counter-terrorism theory centre. This modelled the drug trade on the coffee trade: paper. Our firm introduced an optimization of a type of business plan with actors and messages, which led to predictions of social networks. It was carried out with two partners: a professor of Human Factors and one of Engineering. Another paper applied this work to PERT project control. The second project modelled soldiers on patrol in Afghanistan and looked at the effect of matching different team members to different roles in the patrol. Our director also published a position paper on terrorism with a Romanian think tank: paper. More recently, we collaborated with a Serbian group on tank armour, a non-AI-focused review: paper.
Early machine vision in defence by means of Genetic Programming and Neural Networks:
AI for analysis of ECG in drug toxicity studies and FDA approval processes: the evaluation of a candidate drug, a very expensive process, often requires that many subjects undergo ECG tests, and a subcontracted company analyzes all of the ECGs in India for economic reasons. We took ECG data and devised an early AI method to discover anomalies, providing a benchmarking check on the work done in India. This was a contract, and we published only some of the early work in a conference paper.