
Confidential computing: the final frontier of data security
What is confidential computing? There are ways to encrypt your data at rest and while in transit, but confidential computing protects the confidentiality and integrity of your data while it is in use.
Data threats never rest, nor should the protection of your sensitive information. That's the driving principle behind confidential computing, which seeks to plug a potentially crippling hole in data security.
Confidential computing provides a secure platform for multiple parties to combine, analyze and learn from sensitive data without exposing their data or machine learning algorithms to the other party. This technique goes by several names — multiparty computing, federated learning and privacy-preserving analytics, among them — and confidential computing can enable this type of collaboration while preserving privacy and regulatory compliance.
Data exists in three states: in transit when it is moving through the network; at rest when stored; and in use as it's being processed. Data is often encrypted while at rest and in transit, but not when it's being processed.
Securing data at rest was largely solved by Whitfield Diffie in the 1970s, while encryption of data in transit was largely solved through TLS standards that were applied across our industry in the 2010s. The last remaining challenge is ensuring data is protected while it is in use.
Confidential computing closes the gap in this third state by setting aside a portion of the processor and memory as a protected, isolated container for the data, called a trusted execution environment (TEE), also known as a secure enclave. Running software inside the TEE not only prevents the data from being stolen or modified but also ensures the integrity of the code itself.
This added layer of defense reduces the attack surface of the system, protecting applications and data from breaches, malicious actors and insider threats, while allowing sensitive workloads to move among on-premises data centers, the public cloud and the edge.
In Azure confidential computing virtual machines, a part of the CPU's hardware is reserved for a portion of code and data in your application. This restricted portion is the enclave. (Image: Microsoft Azure)
A growing challenge
In a world where information (including credit card data, medical records, firewall configurations and geolocation data) is constantly generated, shared, consumed and stored, protecting sensitive data in all of its states is more critical than ever. As tough protections for data in transit and at rest increasingly thwart attacks on networks and storage devices, attackers have shifted their energies to data in use. The industry has witnessed several high-profile attacks involving malware injection in this third state, including the Triton attack and the Ukraine power grid attack.
That problem is compounded as more and more data is stored and processed on mobile, edge and IoT devices and then moved to the cloud. The enlarged attack surface is bound to test security for organizations that handle financial and health information and are legally bound to mitigate threats that target the confidentiality and integrity of data in their systems.
Although the added level of security that confidential computing provides is critical (Gartner lists "privacy-enhancing computation" as one of its Top Tech Trends for 2021), it's still not common. Incomplete security could mean missed business opportunities, particularly because businesses may not be inclined to share proprietary data with other organizations. Confidential computing allows them to benefit from the inferences that AI draws from large datasets without actually sharing the datasets with each other. The data is processed by AI within the TEE, which prevents the different organizations from seeing each other's data.
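As a rough illustration of that idea, here is a toy Python sketch in which two organizations contribute data to a single function standing in for code running inside a TEE, and only the aggregate result leaves it. The function name `enclave_average` and the sample figures are hypothetical, and plain Python gives none of the hardware isolation a real enclave provides; the sketch shows only the shape of the data flow.

```python
from statistics import mean


def enclave_average(*party_datasets: list[float]) -> float:
    """Hypothetical enclave entry point.

    It sees every party's raw values but releases only the combined average.
    In a real deployment the isolation is enforced by hardware, not by
    Python function scope.
    """
    combined = [value for dataset in party_datasets for value in dataset]
    return mean(combined)


# Made-up sample data held by two separate organizations.
hospital_a = [120.0, 135.0, 128.0]
hospital_b = [110.0, 142.0]

# Each party learns the aggregate, never the other party's raw records.
result = enclave_average(hospital_a, hospital_b)
print(result)
```

Each party would still need a way to confirm that the code inside the enclave is the code it agreed to run, which is the role attestation plays.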
"Many companies are keeping vast amounts of sensitive data out of the public cloud due to regulation or to keep full control and are thus missing out on the benefits the cloud brings to AI and big data analytics," said Mark Russinovich, chief technology officer of Microsoft Azure. Azure is the first public cloud to offer virtualization infrastructure for confidential computing that uses hardware-based TEEs.
Confidential computing is a key component in expanding our use of AI, Russinovich added, because it ensures that the data used to train and create accurate AI models is protected from alteration and that the machine learning computations performed are correct and can be trusted. The result is that "AI insights on the combined datasets can then more confidently be shared," Russinovich said.
Accurate AI models can bring huge benefits to many sectors, including better diagnostics and treatments in the health care space and more-precise fraud detection for the banking industry.
Some TEEs are even designed to be able to protect the data from attacks leveraging physical access to the servers. That's important because, in the context of confidential computing, unauthorized entities could include other applications on the host, the host operating system and hypervisor, system administrators, service providers and the infrastructure owner — or anyone else with physical access to the hardware.
"By protecting data in use, confidential computing enables companies to first bring sensitive data to the cloud and second combine data without giving each other access to that data," Russinovich said.
Multiple paths to confidentiality
One of the appeals of confidential computing is that it offers multiple layers of protection, including a guarantee that the code is running in an enclave you trust and that the code has not been altered.
A process known as attestation has the hardware generate a certificate stating what software is running, so that unauthorized changes can be detected quickly. "It creates a finer grain capability that allows you to only trust the hardware and the code that needs to be protected," said Ron Perez, Intel security fellow.
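The attestation idea can be sketched in a few lines of Python. This is a simplified simulation under stated assumptions, not the API of any real TEE: the names `HARDWARE_KEY`, `measure`, `generate_report` and `verify_report` are hypothetical, and a shared-secret HMAC stands in for the asymmetric, vendor-certified signatures that real hardware would use.

```python
import hashlib
import hmac

# Assumption: a stand-in for a secret held by the hardware. Real TEEs sign
# with asymmetric keys whose certificates chain back to the chip vendor.
HARDWARE_KEY = b"demo-hardware-key"


def measure(code: bytes) -> str:
    """Hash the code loaded into the enclave (its 'measurement')."""
    return hashlib.sha256(code).hexdigest()


def generate_report(code: bytes, nonce: bytes) -> dict:
    """Simulate the hardware signing a statement of what software is running."""
    measurement = measure(code)
    signature = hmac.new(
        HARDWARE_KEY, measurement.encode() + nonce, hashlib.sha256
    ).hexdigest()
    return {"measurement": measurement, "nonce": nonce, "signature": signature}


def verify_report(report: dict, expected_measurement: str, nonce: bytes) -> bool:
    """Accept the report only if it is fresh, authentic and matches trusted code."""
    expected_signature = hmac.new(
        HARDWARE_KEY, report["measurement"].encode() + nonce, hashlib.sha256
    ).hexdigest()
    return (
        report["nonce"] == nonce
        and hmac.compare_digest(report["signature"], expected_signature)
        and hmac.compare_digest(report["measurement"], expected_measurement)
    )


trusted_code = b"def handler(x): return x + 1"
expected = measure(trusted_code)
nonce = b"fresh-challenge"

# Unmodified code passes; altered code produces a different measurement.
print(verify_report(generate_report(trusted_code, nonce), expected, nonce))
print(verify_report(generate_report(trusted_code + b"# tampered", nonce), expected, nonce))
```

The nonce is included so that an old report cannot simply be replayed; the verifier checks a report only against the challenge it just issued.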
Developers can also choose among several paths to TEE-enhanced security, depending on whether they want to deploy it quickly or custom-design their protection. Some may opt for the fast track by creating confidential containers: they take an existing, unmodified Docker container application written in a higher-level programming language, like Python or Java, and use a partner like Scone, Fortanix or Anjuna, or open-source software like Graphene or Occlum, to "lift and shift" the application into a container backed by confidential computing infrastructure. Other developers choose the path that puts them in full control of the code in the enclave (private regions of protected memory) by developing enclave-aware containers with the Open Enclave SDK, the Intel SGX SDK or a framework such as the Confidential Consortium Framework.
With new demands being placed on data security constantly, confidential computing is sure to be a critical part of a cybersecurity plan. "We're seeing a paradigm shift with confidential computing," Perez said.