# Artificial intelligence can see beyond the flat world and even observe four-dimensional space-time

Produced by the NetEase Technology "Know No" column (official account: tech_163)

An idea from physics is helping computers learn to see in more than two dimensions. The new AI technique can find patterns not only in two-dimensional images but also on spheres and other curved surfaces, letting AI escape the flat world.

The new deep learning technique has shown promise in identifying lung tumors more accurately from CT scans, which may one day lead to better medical diagnosis.

The translation follows.

Computers can now drive cars, beat world champions at chess and Go, and even write essays. The artificial intelligence revolution owes much to one particular kind of artificial neural network, whose design was inspired by the interconnected layers of neurons in the mammalian visual cortex. These convolutional neural networks have proved surprisingly adept at learning patterns in two-dimensional data, especially in computer vision tasks such as recognizing handwritten words and objects in digital images.

However, when applied to data sets without a built-in planar geometry, such as the irregularly shaped models used in 3D computer animation or the point clouds that self-driving cars generate to map their surroundings, this powerful machine learning architecture does not work well. In 2016, a new discipline called geometric deep learning emerged with the goal of lifting convolutional neural networks out of the two-dimensional world.

Now, researchers have proposed a new theoretical framework for building neural networks that can learn patterns on any kind of geometric surface. These networks, called gauge-equivariant convolutional neural networks (gauge CNNs), were developed by Taco Cohen, Maurice Weiler, Berkay Kicanaoglu and Max Welling at the University of Amsterdam and Qualcomm AI Research. They can detect patterns not only in two-dimensional arrays of pixels but also on spheres and asymmetrically curved objects. "This framework is a clear answer to the problem of deep learning on curved surfaces," Welling said.

Gauge-equivariant convolutional neural networks have already greatly outperformed their predecessors at learning patterns in simulated global climate data. The algorithms may also prove useful for improving the vision of drones and autonomous vehicles that observe objects in 3D, and for detecting patterns in data gathered from the irregularly curved surfaces of the heart, brain or other organs.

Taco Cohen, a machine learning researcher at Qualcomm and the University of Amsterdam, is one of the lead architects of gauge-equivariant convolutional neural networks.

The researchers' solution for taking deep learning out of the flat world also has deep connections to physics. Physical theories that describe the world, such as Einstein's general theory of relativity and the Standard Model of particle physics, exhibit a property called gauge equivariance. This means that quantities in the world and the relationships between them do not depend on any arbitrary frame of reference (or "gauge"). They remain consistent whether the observer is moving or standing still, and no matter how far apart the marks on a ruler are. Measurements made in different gauges must be convertible into one another in a way that preserves the underlying relationships between things.

For example, measure the length of a football field in yards and then again in meters. The numbers change, but in a predictable way. Similarly, two photographers who take pictures of an object from two different vantage points produce different images, but those images can be related to each other. Gauge equivariance ensures that physicists' models of reality stay consistent regardless of the perspective or units of measurement they choose. Gauge-equivariant convolutional neural networks make the same assumption about data.

"They wanted to apply this idea from physics to neural networks," said Kyle Cranmer, a physicist at New York University. "And they finally figured out how to do it."

## Escaping the two-dimensional world

Michael Bronstein, a computer scientist at Imperial College London, coined the term "geometric deep learning" in 2015 to describe early efforts to escape the two-dimensional world and design neural networks that can learn patterns in nonplanar data. The term, and the research it describes, soon caught on.

Bronstein and his collaborators knew that going beyond the Euclidean plane would require them to rethink one of the basic computational procedures that make neural networks so effective at two-dimensional image recognition. This procedure is called convolution. It lets one layer of the neural network perform a mathematical operation on small patches of the input data and then pass the results on to the next layer.

"Roughly speaking, you can think of convolution as a sliding window," Bronstein explained. A convolutional neural network slides many such windows over the data like filters, and each window is designed to detect a certain kind of pattern in the data. For an image of a cat, a trained convolutional neural network uses filters to detect low-level features in the raw input pixels, such as edges. These features are passed to other layers in the network, which perform additional convolutions and extract higher-level features, such as eyes, tails or triangular ears. A convolutional neural network trained to recognize cats ultimately uses the results of these layered convolutions to assign a label to the whole image, such as "cat" or "not cat."
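The sliding window can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming a tiny hand-made image and a hand-written vertical-edge filter; the names `correlate2d`, `image` and `vertical_edge` are hypothetical, and a real network would learn its filter weights rather than have them written by hand.

```python
def correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the patch currently under the window.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Toy 4x6 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-written filter that responds to a dark-to-bright vertical edge.
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = correlate2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 0, 2, 0, 0]: the edge is found at column 2
```

The feature map is nonzero only where the window straddles the edge, and it is this map, not the raw pixels, that would be handed to the next layer.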

But this approach works only on a plane. "When the surface you want to analyze becomes curved, you're basically in trouble," Welling said.

The difficulty of performing convolution on a curved surface (known in geometry as a manifold) is like holding a small square of translucent graph paper over a globe and trying to trace the coastline of Greenland accurately. You cannot press the square onto Greenland without crinkling the paper, which means your drawing will be distorted when you lay it flat again. But holding the paper tangent to the globe at one point and tracing Greenland's edge while peering through the paper (a technique known as Mercator projection) produces distortions too. And if the manifold is not a neat sphere like a globe but something more complex or irregular, such as the 3D shape of a bottle or a folded protein, convolution on it becomes even more difficult.

In 2015, Bronstein and his colleagues found one solution to the problem of convolution over non-Euclidean manifolds: reimagining the sliding window as something shaped more like a circular spiderweb than a piece of graph paper, so that you can press it against the globe (or any other curved surface) without wrinkling, stretching or tearing it.

Changing the properties of the sliding filter in this way made the convolutional neural network much better at understanding certain geometric relationships. For example, the network could automatically recognize a 3D shape bent into two different poses, such as a person standing and a person lifting one leg, as instances of the same object rather than as two completely different objects. The change also made the neural network dramatically more efficient at learning. "Standard convolutional neural networks used millions of examples of shapes and needed weeks of training," Bronstein said. "We used about 100 shapes in different poses and trained for about half an hour."

At the same time, Taco Cohen and his colleagues in Amsterdam were beginning to approach the same problem from the opposite direction. In 2015, Cohen, then a graduate student, was not studying how to lift deep learning out of the flat world. Instead, he was interested in what he saw as a practical engineering problem: data efficiency, or how to train neural networks with fewer examples than the thousands or even millions usually required. "Deep learning methods are very slow learners," Cohen said. If you are training a convolutional neural network to recognize cats, this poses few problems, given the bottomless supply of cat pictures on the internet. But if you want the network to detect something more important, such as cancerous nodules in images of lung tissue, finding enough training data is not so easy: the data must be medically accurate, appropriately labeled and free of privacy issues. The fewer examples needed to train the network, the better.

Cohen knew that one way to increase the data efficiency of a neural network is to equip it with certain assumptions about the data in advance, for example, that a lung tumor is still a lung tumor whether it is rotated or reflected within an image. Ordinarily, a convolutional network has to learn this information from scratch by training on many examples of the same pattern in different orientations. In 2016, Cohen and Welling co-authored a paper defining how to encode some of these assumptions into a neural network as geometric symmetries. The approach worked well. In 2018, Cohen and Marysia Winkels generalized it further and demonstrated promising results in recognizing lung cancer in CT scans: their neural network could identify visual evidence of the disease using just one-tenth of the data used to train other networks.

The Amsterdam researchers kept generalizing from there. That is how they arrived at gauge equivariance.

## Extending equivariance

Physics and machine learning have a basic similarity. As Cohen put it, both fields are concerned with making observations and then building models to predict future observations. Crucially, he noted, neither field seeks models of individual things (it is no good having one description of hydrogen atoms and another of upside-down hydrogen atoms) but rather models of general categories of things. "And of course, physics has been quite successful at that."

Equivariance (or "covariance," the term physicists prefer) is an assumption that physicists since Einstein have relied on to generalize their models. "It just means that if you are describing some physics right, then it should be independent of what kind of 'rulers' you use, or more generally what kind of observers you are," said Miranda Cheng, a theoretical physicist at the University of Amsterdam. Or, as Einstein himself put it in 1916: "The general laws of nature are to be expressed by equations which hold good for all systems of coordinates."

Miranda Cheng, theoretical physicist, University of Amsterdam

Convolutional networks became one of the most successful methods in deep learning by exploiting a simple example of this principle called translation equivariance. A window filter that detects a certain feature in an image (say, vertical edges) slides, or "translates," over the plane of pixels and encodes the locations of all such vertical edges. It then creates a feature map marking these locations and passes it up to the next layer in the network. Creating feature maps is possible because of translation equivariance: the network assumes that the same feature can appear anywhere in the two-dimensional plane, and it can recognize a vertical edge as a vertical edge whether it is in the upper right corner or the lower left.
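Translation equivariance can be checked directly on a toy example: shifting the input and then filtering gives the same result as filtering and then shifting. The sketch below assumes a one-dimensional signal with wrap-around borders so that the identity holds exactly; `correlate1d`, `shift` and `edge_filter` are hypothetical names chosen for illustration.

```python
def correlate1d(signal, kernel):
    """Circular sliding-window correlation: the same weights are used at every position."""
    n = len(signal)
    return [sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
            for i in range(n)]

def shift(signal, s):
    """Circularly shift a signal s steps to the right."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]

signal = [0, 0, 1, 1, 0, 0, 0, 0]
edge_filter = [-1, 1]  # responds to a step from dark to bright

feature_map = correlate1d(signal, edge_filter)
shifted_then_filtered = correlate1d(shift(signal, 3), edge_filter)
filtered_then_shifted = shift(feature_map, 3)

print(feature_map)            # [0, 1, 0, -1, 0, 0, 0, 0]
print(shifted_then_filtered)  # [0, 0, 0, 0, 1, 0, -1, 0]
assert shifted_then_filtered == filtered_then_shifted
```

Because the two orders of operation agree, the network never needs to see the same edge at every possible position during training; one set of filter weights covers them all.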

"The point about equivariant neural networks is to take these obvious symmetries and put them into the network architecture," Weiler said.

By 2018, Weiler, Cohen and their doctoral adviser Max Welling had extended this idea to other kinds of equivariance. Their group-equivariant convolutional neural networks can detect rotated or reflected features in flat images without having to train on specific examples of the features in those orientations. Their spherical convolutional neural networks can create feature maps from data on the surface of a sphere without distorting it into a flat projection.
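The idea of building rotations into the filter can be illustrated with a toy sketch restricted to 90-degree rotations. This is not the authors' implementation, only a minimal example under simplifying assumptions (one 3x3 patch, one 3x3 filter); `rot90`, `dot` and `rotation_responses` are hypothetical helper names.

```python
def rot90(m):
    """Rotate a square grid (list of rows) by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*m)][::-1]

def dot(a, b):
    """Sum of elementwise products of two equally sized grids."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rotation_responses(patch, filt):
    """Responses of one filter applied in all four 90-degree orientations."""
    responses = []
    for _ in range(4):
        responses.append(dot(patch, filt))
        filt = rot90(filt)
    return responses

edge_filter = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]
patch = [[0, 0, 1],
         [0, 0, 1],
         [0, 0, 1]]

original = rotation_responses(patch, edge_filter)
rotated = rotation_responses(rot90(patch), edge_filter)

print(original)  # [3, 0, -3, 0]
print(rotated)   # [0, 3, 0, -3]

# Rotating the input only cyclically shifts the four responses ...
assert rotated == original[-1:] + original[:-1]
# ... so pooling over orientations gives the same detection either way.
assert max(rotated) == max(original)
```

The network sees the rotated edge without ever having been trained on it: the rotation shows up as a predictable reshuffling of the responses rather than as a new pattern to learn.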

These approaches were still not general enough to handle data on manifolds with a bumpy, irregular structure, which describes the geometry of almost everything, from potatoes to proteins, to human bodies, to the curvature of space-time. Manifolds of this kind have no global symmetry, so a neural network cannot make equivariance assumptions about them: every location on them is different.

The challenge is that sliding a flat filter over a curved surface can change the filter's orientation, depending on the particular path it takes. Imagine a filter designed to detect a simple pattern: a dark blob on the left and a light blob on the right. Slide it up, down, left or right on a flat surface, and it will always stay right side up. But on a sphere, that changes. If you move the filter 180 degrees around the sphere's equator, its orientation stays the same: dark blob on the left, light blob on the right. However, if you slide it to the same spot by passing over the sphere's north pole, the filter arrives flipped, with the dark blob on the right and the light blob on the left. The filter will no longer detect the same pattern in the data or encode the same feature map. Move the filter over a more complicated manifold and it could end up pointing in any number of inconsistent directions.
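The path dependence described above can be reproduced numerically. The sketch below assumes a unit sphere and uses the fact that carrying a tangent vector along a great-circle arc without turning it amounts to rotating it about that arc's axis; the filter's "right-hand" direction is modeled as a single tangent vector, and `rotate` is a hypothetical helper implementing Rodrigues' rotation formula.

```python
import math

def rotate(v, axis, angle):
    """Rotate 3-vector v about a unit axis by `angle` (Rodrigues' formula)."""
    c, s = math.cos(angle), math.sin(angle)
    d = sum(a * x for a, x in zip(axis, v))
    cross = (axis[1] * v[2] - axis[2] * v[1],
             axis[2] * v[0] - axis[0] * v[2],
             axis[0] * v[1] - axis[1] * v[0])
    return tuple(v[i] * c + cross[i] * s + axis[i] * d * (1 - c) for i in range(3))

def rounded(v):
    return tuple(round(x, 9) for x in v)

start = (1.0, 0.0, 0.0)         # a point on the equator
filter_right = (0.0, 1.0, 0.0)  # the filter's "right" side, pointing east

# Path 1: slide halfway around the equator (rotation about the polar axis).
polar_axis = (0.0, 0.0, 1.0)
end_equator = rounded(rotate(start, polar_axis, math.pi))
right_equator = rounded(rotate(filter_right, polar_axis, math.pi))

# Path 2: slide over the north pole in two quarter turns.
meridian_axis = (0.0, -1.0, 0.0)
pole = rotate(start, meridian_axis, math.pi / 2)
right_at_pole = rotate(filter_right, meridian_axis, math.pi / 2)
end_pole = rounded(rotate(pole, meridian_axis, math.pi / 2))
right_pole = rounded(rotate(right_at_pole, meridian_axis, math.pi / 2))

assert end_equator == end_pole == (-1.0, 0.0, 0.0)  # same destination
print(right_equator)  # "right" now points along -y
print(right_pole)     # "right" still points along +y: the filter is flipped
assert right_equator == tuple(-x for x in right_pole)
```

Both paths end at the same point, yet the filter's "right" side points in opposite directions, which is exactly the inconsistency the article describes.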

Fortunately, physicists had already run into the same problem and found a solution: gauge equivariance.

The key, Welling explained, is to forget about tracking how the filter's orientation changes as it moves along different paths. Instead, you choose just one filter orientation (or gauge) and define a consistent way to convert every other orientation into it.

The catch is that although any arbitrary gauge can be used as the initial orientation, converting other gauges into that frame of reference must preserve the underlying pattern, just as converting the speed of light from meters per second to miles per hour must preserve the underlying physical quantity. With this gauge-equivariant approach, Welling said, "the actual numbers change, but they change in a completely predictable way."
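A toy example makes "the numbers change, but predictably" concrete. The sketch assumes the gauge is simply the choice of a rotated pair of axes in a two-dimensional tangent plane; `change_gauge`, `feature` and `filt` are hypothetical names, and the specific values are arbitrary.

```python
import math

def change_gauge(components, theta):
    """Re-express a 2D tangent vector in axes rotated counterclockwise by theta radians."""
    x, y = components
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * y, -s * x + c * y)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

feature = (3.0, 4.0)  # a feature vector written in gauge A
filt = (1.0, 2.0)     # filter weights written in the same gauge
theta = 0.7           # gauge B is gauge A rotated by 0.7 radians

feature_b = change_gauge(feature, theta)
filt_b = change_gauge(filt, theta)

# The individual numbers are different in the new gauge ...
assert feature_b != feature
# ... but gauge-independent quantities are untouched.
assert math.isclose(dot(feature, filt), dot(feature_b, filt_b))      # filter response
assert math.isclose(math.hypot(*feature), math.hypot(*feature_b))    # vector length
# Converting back recovers the original numbers.
back = change_gauge(feature_b, -theta)
assert all(math.isclose(a, b) for a, b in zip(back, feature))
```

Here the filter response plays the role of the physical quantity: it comes out the same whichever gauge was used to write down the components, provided the filter and the feature are converted by the same rule.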

Cohen, Weiler and Welling encoded gauge equivariance into their convolutional neural network in 2019. They did so by placing mathematical constraints on what the network can "see" in the data through its convolutions, so that only gauge-equivariant patterns are passed up through the network's layers. "Basically you can give it any surface," from Euclidean planes to curved objects, including exotic manifolds like Klein bottles or four-dimensional space-time, "and it's good for doing deep learning on that surface," Welling said.

## How it works

The theory of gauge-equivariant convolutional neural networks is so general that it automatically incorporates the built-in assumptions of earlier geometric deep learning approaches, such as rotational equivariance on the sphere. Even Bronstein's earlier method, which let neural networks recognize a single 3D shape bent into different poses, fits within it. "Gauge equivariance is a very broad framework. It contains what we did in 2015 as a particular setting," Bronstein said.

In theory, a gauge-equivariant convolutional neural network works on any curved surface, and Cohen and his collaborators have tested it on global climate data, which necessarily has an underlying three-dimensional spherical structure. They used their gauge-equivariant framework to build a convolutional neural network trained to detect extreme weather patterns, such as tropical cyclones, in climate simulation data. In 2017, government and academic researchers used a standard convolutional network to detect tropical cyclones in the data with 74% accuracy; last year, the gauge-equivariant convolutional neural network detected such cyclones with 97.9% accuracy. (It was also more accurate than a less general geometric deep learning approach designed for spheres in 2018, which achieved 94% accuracy.)

Mayur Mudigonda, a climate scientist at Lawrence Berkeley National Laboratory who uses deep learning, said he will keep paying attention to gauge-equivariant networks. "That aspect of human visual intelligence, accurately recognizing a pattern no matter what its orientation is, is what we want to bring to the climate community," he said. Qualcomm, which recently hired Cohen and Welling and acquired a startup they had built around their early work on equivariant neural networks, now plans to apply the theory of gauge-equivariant convolutional neural networks to more advanced computer vision applications, such as drones that can see a full 360-degree panorama in real time. (This fisheye view of the world maps naturally onto a sphere, just like global climate data.)

Meanwhile, gauge-equivariant convolutional neural networks are gaining traction among physicists like Cranmer, who plans to use them on data from simulations of subatomic particle interactions. "We're analyzing data related to the strong nuclear force, trying to understand what's going on inside a proton," Cranmer said. "The data is four-dimensional, so we have a perfect use case for neural networks that have this gauge equivariance."

Risi Kondor, a former physicist who now studies equivariant neural networks, said the potential scientific applications of gauge-equivariant convolutional neural networks may be more important than their uses in artificial intelligence.

"If you're identifying cats on YouTube and you find that you're not very good at identifying upside-down cats, that's not great, but maybe you can live with it," he said. For physicists, though, it is crucial to ensure that a neural network does not misinterpret a force field or a particle trajectory because of its particular orientation. "It's not just a matter of convenience," Kondor said. "It's essential that the underlying symmetries be respected."

However, although physicists' mathematics helped inspire gauge-equivariant convolutional neural networks, and physicists may find plenty of uses for them, Cohen noted that these neural networks will not discover any new physics themselves. "We can now design networks that can handle very exotic kinds of data, but you have to know the structure of that data first," he said. In other words, physicists can use gauge-equivariant convolutional neural networks because Einstein already showed that space-time can be represented as a four-dimensional curved manifold. Cohen's neural network cannot "see" that structure on its own. "Learning symmetries is something we don't do," he said, though he hopes it will become possible in the future.

Cohen cannot help but delight in the interdisciplinary connections that he once sensed intuitively and has now demonstrated with mathematical rigor. "I have always had the feeling that machine learning and physics are doing very similar things," he said. "This is one of the things I find really wonderful: we just started with this engineering problem, and as we began to improve our systems, we gradually found more and more connections between them."


Editor in charge: Wang Fengzhi, NT2541