Risk is Stronger than Reward

Todd Moses
Jan 1, 2024


At its most basic level, our approach connects the dots between risk factors. What makes this fascinating is that it is based on how people learn. For example, a person grabs a log from the woodpile and sees a venomous snake underneath. The woodpile is now seen as a risk factor for a snake bite.

Behind the scenes, our brains form cause-and-effect relationships. These produce a network of factors leading to a specific risk or peril, and each experience adds a new node to the risk graph. Consider the same person walking on a leafy trail. They soon hear the distinctive sound of a rattle and look down to see a rattlesnake. Now the risk of snake bite has a cause-and-effect relationship with both leafy trails and woodpiles.
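
In data-structure terms, that growing risk graph can be pictured with a small sketch like the following (the structure and names here are illustrative assumptions, not Estimand’s implementation):

```python
# A toy "risk graph" that grows one cause-and-effect edge per experience.
# The structure and names are illustrative, not Estimand's actual schema.
risk_graph = {}  # cause -> set of perils observed to follow it

def record_experience(cause: str, peril: str) -> None:
    """Each experience adds a node and an edge linking a factor to a peril."""
    risk_graph.setdefault(cause, set()).add(peril)

record_experience("woodpile", "snake_bite")     # snake under the log
record_experience("leafy_trail", "snake_bite")  # rattle heard on the trail
print(risk_graph)  # {'woodpile': {'snake_bite'}, 'leafy_trail': {'snake_bite'}}
```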

Reinforcement Learning

“We learn by interacting with our environment,” explains the father of Reinforcement Learning, Richard Sutton. Whatever the activity, humans pay attention to how their environment responds to their actions. This is the foundation of learning.

Consider a person learning to cast a fly rod. Perhaps they received instruction in a parking lot, read an article, or watched a YouTube video on fly casting. Once at the river, things change. Anticipation builds as the tiny artificial insect, complete with a razor-sharp hook, is tied onto the line.

As the person pulls back on the rod with one arm, the tiny fly whips behind them at blinding speed. Then, in a single motion, the rod is brought forward, the line carrying the fly toward the target. Suddenly, everything comes to a halt. A high branch has caught the line.

Reinforcement Learning (RL) is a machine learning technique that maps situations to actions. It is based on two features: trial-and-error search and delayed reward. The goal is to capture the most critical aspects of the current problem for a learning agent interacting over time with its environment.
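
Those two features can be made concrete with a minimal tabular Q-learning sketch (the toy environment, reward values, and hyperparameters below are illustrative assumptions, not something taken from Sutton’s book or Estimand’s system):

```python
import random

# Toy problem: states 0..4 on a line; reaching state 4 yields reward 1.
# Actions: 0 = step left, 1 = step right. The reward is delayed until the goal.
N_STATES, GOAL, ACTIONS = 5, 4, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment response: move left or right, reward only at the goal."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def greedy(state):
    """Pick the best-known action, breaking ties randomly."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for _ in range(200):  # episodes of trial-and-error interaction
    state, done = 0, False
    while not done:
        # Explore occasionally; otherwise exploit what has worked before.
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, action)
        # Q-learning update: the delayed reward propagates back through states.
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

print({s: greedy(s) for s in range(N_STATES)})  # learned policy: step right
```

Trial-and-error search appears as the occasional random action, and the delayed reward at the goal propagates backward through the value estimates until the agent prefers stepping right everywhere.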

The line goes tight as the person pulls to free it. They pull harder in frustration, but still the line will not move. Shaking the rod violently does nothing. From here, the person has two choices.

RL technology has led to impressive feats, such as self-driving cars and autonomous stock trading. Unlike other machine learning techniques, it is based on trial and error. Sutton explains, “To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward.”

Perhaps our novice fly caster decides to cut the line and watches helplessly as the tiny fly shimmers in the sun high above. Afterward, they tie on a new fly and make a second attempt farther from the overhead branches. Thus, a new connection is made between overhead branches and the peril of tangled lines.

Pain and Learning

Where RL falls short is that it requires prior experience. The learning agent must try multiple options to make better future decisions. This is similar to how humans learn but leaves out the crucial part of self-preservation. Pain is the best teacher for living things, and it is hard to replicate for machines.

A second option for the fly caster with a hook caught high in a tree is to lay down the rod and pull solely on the line with two hands. While most fishing lines break at 5 to 20 pounds of force, a rotted tree branch snaps free with far less. Thus, pulling hard to retrieve a stuck hook from a tree carries two significant risks.

One is that the line comes loose and returns to the person with great force, complete with a razor-sharp hook aimed directly at their face. The second is similar, where a rotted tree branch comes crashing down upon them. Either of these experiences will cause the fly caster to reach for the scissors the next time a line is caught high above them.

For humans, these types of experiences can be avoided through the advice of others. For example, while the fly caster looks up at the branch above, a more experienced fly fisher walks by and says, “Do not pull on that line.” A conversation ensues, and the novice agrees that cutting the line is the best course of action.

Causal Risk

Estimand’s Causal Risk approach to AI is based on a graph of cause-and-effect relationships between risk factors that serves to model the world. We build this graph through an automated causal detection process, and it changes to improve the model as the system is exposed to more data.

The noteworthy advantage of Causal Risk over RL is that it respects a core tenet of chaos theory: prediction is impossible unless the initial conditions are known. An exciting property of this approach is that risk factors change over time, just as they do in the real world.

For example, the risk of snake bite on a leafy trail decreases significantly during winter. The same idea extends to global risk, where macroeconomic, geopolitical, social, climate, and technological risks ebb and flow in cycles. At times, certain factors are likely to cause danger; at others, they pose almost zero risk.
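
One way to picture such time-varying factors is a graph whose edge strengths depend on context (the names, numbers, and combination rule below are illustrative assumptions, not Estimand’s model):

```python
from dataclasses import dataclass

@dataclass
class Edge:
    cause: str
    effect: str
    strength_by_season: dict  # season -> causal strength in [0, 1]

# Illustrative edges: the same peril, different strengths by season.
edges = [
    Edge("leafy_trail", "snake_bite", {"summer": 0.30, "winter": 0.02}),
    Edge("woodpile", "snake_bite", {"summer": 0.25, "winter": 0.05}),
]

def risk(effect: str, season: str) -> float:
    """Combine independent causes: P(effect) = 1 - prod(1 - strength)."""
    p_none = 1.0
    for e in edges:
        if e.effect == effect:
            p_none *= 1.0 - e.strength_by_season.get(season, 0.0)
    return 1.0 - p_none

print(round(risk("snake_bite", "summer"), 3))  # 0.475 -- meaningful danger
print(round(risk("snake_bite", "winter"), 3))  # 0.069 -- almost zero risk
```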

Connecting Dots

A model of the world as it stands at this moment is paramount to understanding reality. Today’s production AI systems leave this concept out. Turing Award winner Judea Pearl describes such systems as machines with impressive abilities but no intelligence. He explains, “The difference is profound and lies in the absence of a model of reality.”

We created Estimand from Pearl’s research to answer the question, “What if we could determine the causal relationships between datasets?” As experiments failed, we were reminded of the prevailing view on causality: one needs a subject matter expert to determine causal relationships. However, to give machines a sense of the world in which they operate, we had to automate causal detection.

As money dwindled and the hope of raising new funds faded, we had to either deliver a product or close. We assembled for lunch to address this and discussed ways to achieve the impossible. However, the word impossible was never used. Instead, each team member had to accept the challenge as possible or leave the company.

Over the next few weeks, everyone dove into the current research. However, no book, blog post, or paper described how to perform causal detection without human involvement. So we shifted to understanding how humans determine causal relationships.

With a basis in Reinforcement Learning, we observed that causal relationships are formed through interaction; for example, a person finds a snake in the woodpile and links the two. It then became a matter of testing two datasets for causality. A series of matrix experiments makes it possible to support or reject the relationship. After that, we compared datasets pairwise and stored the results inside a graph.
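
The matrix experiments themselves are not published here, so as a stand-in, here is a hedged sketch of automated pairwise causal screening using a simple Granger-style test: for each ordered pair of series, ask whether the past of one improves prediction of the other beyond its own past, and keep strong directed relationships as graph edges (the data, threshold, and choice of test are all illustrative assumptions, not Estimand’s method):

```python
import numpy as np

def granger_score(x, y, lag=1):
    """Relative error reduction when x's past is added to y's own past."""
    y_t, y_lag, x_lag = y[lag:], y[:-lag], x[:-lag]
    ones = np.ones_like(y_t)
    base = np.column_stack([ones, y_lag])         # y's past only
    full = np.column_stack([ones, y_lag, x_lag])  # plus x's past
    r_base = y_t - base @ np.linalg.lstsq(base, y_t, rcond=None)[0]
    r_full = y_t - full @ np.linalg.lstsq(full, y_t, rcond=None)[0]
    return 1.0 - (r_full @ r_full) / (r_base @ r_base)

# Synthetic example: "rates" drives "defaults" with a one-step delay.
rng = np.random.default_rng(0)
series = {"rates": rng.normal(size=500)}
series["defaults"] = 0.8 * np.roll(series["rates"], 1) + 0.2 * rng.normal(size=500)

# Compare every ordered pair; keep strong directed edges in a graph.
graph = {}
for cause in series:
    for effect in series:
        if cause != effect:
            score = granger_score(series[cause], series[effect])
            if score > 0.1:  # illustrative threshold
                graph.setdefault(cause, []).append((effect, round(score, 2)))

print(graph)  # expect an edge rates -> defaults, but not the reverse
```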

Conclusion

The beauty of Estimand’s approach is its simplicity. The inputs are datasets, and the outputs are nodes and edges representing risk factors, their influence, and direction. The graph is human-readable, and thus transparent and easily verified with simple tests. The result is a current snapshot of global risks that serves as a model of reality.
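
To give a flavor of what “human-readable” means here, such a graph could be rendered as plain directed statements (the node names and influence values below are invented for illustration, not Estimand’s output):

```python
# Hypothetical edges: (cause, effect, influence); names and values invented.
edges = [
    ("rising_rates", "loan_defaults", 0.42),
    ("loan_defaults", "bank_losses", 0.61),
]

for cause, effect, influence in edges:
    print(f"{cause} -> {effect} (influence {influence:.2f})")
# rising_rates -> loan_defaults (influence 0.42)
# loan_defaults -> bank_losses (influence 0.61)
```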

For now, banks and insurance companies use this model of financial risk. However, it will be the foundation for Artificial General Intelligence (AGI). You can learn more about Estimand by visiting our website: https://estimand.ai.

References

- Easley, D. & Kleinberg, J. (2010) Networks, Crowds, and Markets. Cambridge University Press.

- Mandelbrot, B. & Hudson, R. (2004) The (Mis)Behavior of Markets. Basic Books.

- Moses, T. (2023) The Network Effects of Risk. Estimand Insights. https://www.estimand.ai/insights/the-network-effects-of-risk/

- Pearl, J. (2009) Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge University Press.

- Sutton, R. & Barto, A. (2020) Reinforcement Learning: An Introduction. The MIT Press.
