Understanding Fast Homomorphic Encryption from Scratch Part 1
This is a bit of a different blog than I typically do as the primary focus is not AI but cryptography. As such this blog will be pretty math-heavy and more of a work in progress. Also, I am less experienced in this field than my other blogs so this blog might be good for beginners like me! And for experts, feel free to correct any mistakes!
This blog was inspired by a company called Zama ai and in particular this blog. The papers we reference and the initial introduction I had to holomorphic encryption came from this blog.
Introduction
The main problem we currently have with ChatGPT or when we do pretty much anything online is that our data is just out there out for the grabbing. This is especially a problem when some people are giving ChatGPT their private information, for example, bank information, when having ChatGPT help with filling out Bureaucratic information or having businesses ask for confidential information.
The solution is to encrypt our prompt with a key and then have our model work on that prompt and then decrypt the output after the model finished working on it. In traditional encryptions, this is not possible because if you do the addition of two encrypted numbers you can’t say the result will mean anything when decrypted. However, in a special class of encryptions called Homomorphic encryptions, this is possible.
The basic idea behind holomorphic encryption is to make it so that you can do addition and multiplication on encrypted data and have the output when decrypted, become the result of that computation!
I wanted to build a solid background to approach the paper on TFHE so we will start with reading “Homomorphic Encryption Standard”
Homomorphic Encryption Standard
This paper was written in 2018 and the goal of this paper is to standardize security levels of various holomorphic encryption schemes and give concrete numbers on what parameters are good for deployment/how long it will take to attack. This will not go so much into the math of each technique but more into what operations can be done.
Notation
- λ is the security level in bits!
- PT is the plaintext type. Currently, it’s either
- MI(Modular Integers) which means all integers mod p for some number p. For example, p=5 is {0, 1, 2, 3, 4}. This maps to a problem called linear learning with errors(LWE).
- EX(Extension ring/fields) This is a bit similar to the above but with polynomials! So we take all polynomials with degrees lower than some degree and also have all the coefficients be modular integers. So if we choose x⁵ and 5, then, we get polynomials like 1+4x+2x²+3x³+x⁴. This maps to a problem called ring learning with errors(RLWE).
- K is the dimension of the vectors to be encrypted. This means that the number of vectors we want to encrypt for our project! The reason this is important is because the more we encrypt with the same key, the more vulnerable we will be. So knowing the number of times we will use this key becomes very important
- B is how complex the program/math we will use. One thing I remember from the blog is the more multiplication we do the more errors are accumulated so essentially, you will need to increase the precision and also the level of security.
Given the parameters we generated above that suit our security, we generate
- SK our secret key which we will keep private
- PK is our public key which will be used by other people to send us messages
- EK is called the evaluation key and is needed for doing the holomorphic computations. The reason this is separate from the public key is that in future operations, just the evaluation key is used. The evaluation key is public.
For this I do not quite understand the purpose as above, when we generate a public key, we get both the secret key and evaluation key already. My guess is the main purpose of this is for testing. One other interesting point here is that the secret key is enough to encrypt and decrypt which you can’t do in RSA encryption!
- M is the message
- C is the encrypted message
this algorithm encrypts the message into cipher text space.
Same as above but uses a secret key
For decryption, you can only do it with the secret key which does make sense
Here, we add 2 cipher texts C1 and C2. The main interesting part is we need an evaluation key and the parameters for this. We do not, however, need the public key.
Now, since this is holomorphic encryption, if M1 was the message for C1 and M2 for C2, M1+M2 will be the message for C3
Here, the main interesting part is M2 is not encrypted and we don’t need to use the public key to encrypt to add! I’m not sure how this works but if true, we can add plain text/messages to cipher texts!
This is the same as an addition but for multiplication. This is the main computational bottleneck for holomorphic encryption and if it can be made more efficient, will be very valuable.
Same as EvalAddConst but for multiplication
Now this is the main interesting part of holomorphic encryption. When doing a lot of computation, we accumulate errors in the cipher text. To correct this we do something called key switching and I always thought we update the evaluation key as well but in fact, we just generate a new cipher text C2 which is also an encoding of M1 which removes all the errors!
While this seems amazing, it takes the most amount of time in computation.
Given params, the evaluation key, an array of cipher texts, and the computation we are about to do, output a boolean flag that says whether the output cipher produced is correct which I think means whether the errors we accumulate are low enough for the cipher text output to decrypt properly
How to say we are secure?
The paper defines security as given 2 cipher texts of 2 random messages, opponents will have no advantage in guessing which is which.
The adversary will have the public key, the evaluation key, and the params. One interesting part here is that it seems as though in some holomorphic encryption schemes there is no public key. But overall, the cipher text does not provide any information about the original message.
Compactness
The above operations do not expand the length of the cipher texts.
Efficient decryption
Decryption does not depend on what operations were done on the cipher texts
BGV
For integer modulus, p, in PT the only restriction seems like it must be greater than 1 which is interesting as typically it has to be prime too.
For B, it will be the maximum number of multiplication that will be done. This is called the multiplicative depth.
The polynomials generated in EX is called a ring and can be written as
R=Z[x]/f(x)
where Z[x] is all possible polynomials with powers of x and f(x) is the biggest possible polynomial.
The plain text ring is defined as R/pR from the integer modulus which sets the upper bound of coefficients. and the cipher text ring is R/qR for some integer q. The main interesting part is we have a key distribution D1 from which the keys are chosen and an error distribution D2 from which the error is chosen. This distribution is the distribution of polynomials so I think we have different sets of coefficients between normal keys and the error keys.
SK is chosen from the ring R. While there doesn’t seem to be a restriction explicitly mentioned in paper, I think the secret key is from R/qR as “Fully Homomorphic Encryption without Bootstrapping” follows that formulation
this op first runs the SecKeygen to get SK and EK. We will denote SK as s.
Then, we take a random element from the ring R/qR which we will call a! Then the public key is (-a, as+pe). e is taken from the error distribution. Here, note that the public key is a pair. For now, this might not make sense but it will when we try out encryption!
Firstly, the message M, which is a list of integers, gets converted to M’ which is a ring R/pR. For how this is done, the best resource I found was here. But it seems like there is a matrix that can map any polynomial in our ring into a vector, so we just take an inverse of this and multiply by our vectors! Then we have a one-to-one connection between our plaintext and vectors.
One important thing to note here is that this process of converting plaintext into polynomials is very similar to doing Fourier transforms on the inputs. So essentially the polynomials are like the frequency domain while the input vectors are the time domain if you are familiar with that! Overall, I just wanted to say that for operations like say multiplications, the meaning becomes slightly different between the input vectors and the encrypted vectors.
Now, that was the first part. The second part is simply to add the encoded M’ to our public key to “encrypt” it
(-a, as+pe+M’)
as so. One note here is that the above makes it trivial to find the plaintext M’ as you can just subtract the public key from this encryption to recover M’. This is the reason this is a secret encryption and not a public one!
Now to avoid the above issue, for the public encrypt scheme, we do
(-au+pe1, (as+pe)u+pe2+M’)
so essentially, we get new parameters u, e1, and e2 and do some operations with them to make M’ less obvious which is an interesting idea. I’m sure the u, e1, and e2 get destroyed after each use so the decryption must have a way to recover M’ without them.
The secret key SK or s is an element of ring R but the cipher text, (c0, c1), is made so that they are elements of ring R/qR.
Interestingly, in this scheme, the cipher text can grow in size from 2 to even 3 or 4.
Now, how do we decrypt our messages? It’s a simple operation of c0s+c1!
Let’s test. For
(-a, as+pe+M’)
as+as+pe+M’=pe+M’
and since the plain text is taken mod p, the pe disappears! And this is just M’
Now, for the public encrypt,
(-au+pe1, (as+pe)u+pe2+M’)
aus+pe1s+aus+peu+pe2+M’=pe1s+peu+pe2+M’
Now, while this is larger, all the error terms go to 0 as we take mod p so M’ is decrypted!
So the main idea here is that it seems as though there is absolutely no error in encryption/decryption which is not always the case in holomorphic encryption.
In R/qR, we just add the 2 cipher texts element-wise so
Let’s try it out to see if it decrypts properly in the secret encrypt side.
C1=(-a, as+pe1+M1’)
C2=(-a, as+pe2+M2’)
so when we add,
C3=(-2a, 2as+pe1+pe2+M1’+M2’)
which when we decrypt, is
p(e1+e2)+M1’+M2’
which mod p is M1'+M2'!
Now this is where things get a bit more complicated and also where holomorphic encryption will vastly benefit from if improved. So initially, we have a problem. We have C1 and C2 which are
C1=(-a, as+pe1+M1’)
C2=(-a, as+pe2+M2’)
Now, let’s say we do element-wise multiplication. Then we get
(a², a²s²+p²e1e2+M1'M2'+p(….)+asM1'+asM2')
so essentially we can’t just multiply by s or s² and subtract because even if all the p terms and a²s² term disappear, we are still left with asM1'+asM2' in addition to the term we want which is M1'M2'
So what the authors thought was then why don’t we just make the cipher text longer? Which they did like so
Now, let’s try calculating. We get
(a², -a²s-ape2-aM2'-a²s-ape1-aM1', a²s²+p²e1e2+M1'M2'+p(….)+asM1'+asM2')
Now for this decryption function, we multiply the first term by s², the second term by s, and the third term by 1 and add them together. Now, we get
M1'M2'+p²e²+p(….)
and with mod p, M1'M2'!
Now, while this is cool, it’s not especially scalable. Let’s say we have 100 multiplications. Then we need to increase the size of the multiplication tuple by 100. To solve this, the idea of linearization was done
Relinearization
The idea for this. Given we have
The reason we are expanding/need to add s² to decryption was the c1,0*c2,0 term, why don’t we have a separate term given by our evaluation key which, when added, becomes that c1,0*c2,0 term times s². In particular, let our evaluation key be
evaluation = (a, -as+pe+s²)
then when decrypting we get s² + pe when when we do mod p is s²! Then, with
P = c1,0*c2,0*evaluation
if we add this to C3 without the first term we are done with multiplication!
Now, while this is interesting, it is specific to multiplication. In further research, a technique called key switching came up which basically allows us to have our “evaluation key” act as something like the key above for a lot of operations like rotation which we will discuss in a bit! Also, we grossed over this but we are accumulating errors with every multiplication/addition which needs to somehow be reset!
The paper “Fully Homomorphic Encryption without Bootstrapping” here introduced the idea of key switching and modulus switching that does both make a general procedure for evaluation keys and reducing noise!
Bootstrapping
The paper “Fully Homomorphic Encryption without Bootstrapping” says “without bootstrapping”. Then what is bootstrapping? My understanding of bootstrapping is as follows
Given cipher text c that is close to going over the limit of error
- Decrypt c
- Get message m
- Encrypt message m again with public key
Thus the error is reset.
The main issue with this approach
- You need to know the secret key to reduce the error
- Decrypting c takes up too much computation.
The alternative was discovered below
Fully Homomorphic Encryption without Bootstrapping
Key Switching
The idea of key switching is exactly that, you switch the secret key and also the cipher texts so that when you decrypt you get the same result! To do this first the paper introduces two functions
This decomposes x into bits like so
which becomes binary vectors like so
I understand it as x is an n-dimensional vector that represents coefficients and the above is the binary representation of each of those coefficients.!
This just raises the input(-a) to powers of 2
Now, here if we do bit decomposition of c and decrypt by the power of 2 of s let’s see what happens! Now the one issue is how to do with the log q entries. The answer seems like to just sum it all like so!
Now one thing to note here is <> operation. In this paper this is called the noise associated with cipher text c under key s. In other words, this is the decrypt operation of the public key but given the current error e. So given <(-a, as+pe), s> it will do -as+as+pe=pe so essentially, we are getting the error given the public key input! Now, here, one thing to note is (-a, as+pe), the public key, is not a vector or a polynomial but a tuple with 2 polynomial entries.
In this paper, this vector is made into one vector by flattening like so (as+pe, -a), and for the secret key it will be (1, s). So by doing a polynomial dot product, we get as+pe-as=pe which is a clever way to reformulate! So essentially, <> operation is usually on tuples to get the error/decrypting but in this paper, it’s changed into the dot product. However, since it measures the error, my interpretation is that for all cipher texts, we assume the message is set to 0.
Then key switching is done in two procedures
The first is
Where you get a function on how to switch from s1 to s2. And I’m pretty sure they need to be the same dimension of n1 but let me know if I’m wrong.
To do this, the first idea seems to generate a public key from the secret key and dimension N. So, we generate log q numbers of (as2+pe, -a). Now in this paper, it’s formulated so that we start off with log q columns of as2+pe and another log q columns of -a. The reason why we have log q number of them is the second step!
Since s1 is (1, s1), we get ((1, s1), (2, 2s1)….). We add this to the first half of the columns to generate a matrix B.So the B matrix is (as2+pe+Powersof2(s1), log q columns of -a). While this doesn’t make sense yet, we still just generated the matrix to convert the cipher text from one secret key to another. Let’s see how this matrix is used.
So interestingly we do bit decomposition of the original c1 message and multiply by the matrix to get the new cipher text encrypted by s2! So to say we are correct we have to show
Here, I think the operation of As2 is the same as getting the last log q columns of -a, multiplying that by s2 and adding to as2+pe to get pe with <>. Another thing to note is <BitDecomp(c1), e2> is small since all of BitDecomp(c1) is binary and we know e2, since it’s the error, is small. Then, the idea for this is pretty simple, it’s saying that given the new key we got, we are able to get the new secret key and cipher within a very small error bound.
The proof is
The first line makes sense since c2 is just BitDecomp(c1) times B.
Then, B*s2 = (A+Powersof2(s1))*s2. And A*s2 is 2e2 from definition. Now, if we remember s2 is just (1, s2) so given B=(as2+pe+Powersof2(s1), -a), it’s made so this becomes
as2+pe+Powersof2(s1)-as2=pe+Powersof2(s1)
which is exactly what we got! Where pe=2e2. And the rest of the proof seems simple. Now, why even bother with Powersof2 and BitDecomp since if we just made
A= (as2+pe, -a)
B=(as2+pe+s1, -a)
and switch key just c1 * B, A*s2=2e2, we get
Now, the main problem here is that the 2c1e2 term is not guaranteed to be small! So the whole idea behind the BitDecomp is to keep the error of switching cipher texts and secret keys low.
How can key switching be used to replace Relinearization?
Recall that we had
And in linearization, we added an evaluation key that, when decrypted becomes
c1,0*c2,0*s²
Now, in key switching why don’t we consider switching the key of (c1,0*c2,0, 0) to s² from s! Then, we get
c1,0*c2,0*s²
when decrypted and it can act as an alternative to relinearization! Now, I need to confirm but I think this turns out slightly more efficient!
Discussion on Errors
Now, given a cipher text c which is a ring encoding vector of n dimensions, to recover the message, we need to
- Decrypt with secret key s which is a polynomial of same max degree
- Reduce into the range of decryption to (0, q)
Now, this paper says that decryption succeeds as long as the error is less than q/2. The idea here is very simple. Let’s say we decrypted an addition and we got
p(e1+e2)+M1’+M2’
Now we are about to do mod q here. If p(e1+e2) is larger than q, then we won’t necessarily be able to do mod p on top of the mod q and guarantee we get the right answer! Take for example p(e1+e2)=6, q=5 and p=3. Then while
p(e1+e2) mod p = 0
(p(e1+e2) mod q) mod p = 1
Now, let’s say p(e1+e2)=-3, then
p(e1+e2) mod p = 0
(p(e1+e2) mod q) mod p = 2
so I think it’s intuitively clear that we want the error within (-q/2, q/2). Now, one idea is why not just increase q as much as possible? One thing to note here is that it only makes sense to increase q if p/the error size is kept constant as if we increase p as well the error will grow at the same rate.
As you may have guessed this makes it more insecure. Here, let’s remember what our public key was. It is (-a, as+pe). Now, what we do not want is to be able to find s from a. Or in other words, to find the secret key!
Formally, we want to find s such that given public key (pk[0], pk[1]),
pk[0]*s+pk[1]=-as+as+pe=pe is as close to 0 as possible. Here, my interpretation was as we increase q, as e or pe is relatively constant the error is smaller. Thus as+pe becomes closer and closer to just as. So in a way, this problem becomes more and more similar to just dividing as+pe by a.
Now, let’s look at our operations and see how the error increases! For the decrypted addition, the size of the error becomes roughly 2 times like so.
p(e1+e2)+M1’+M2’
For the decrypted multiplication, the size of the error becomes roughly squared so
M1'M2'+p²e1e2+p(….)
so now let us see how we reduce this error!
Modulus Switching
Now, given we don’t know the secret key, this is a technique to switch our cipher text mod q, c, to a cipher text mod q’, c’, while keeping it correct. In that
where p was the mod of our plaintext! Here, the only restriction is q mod p=q’ mod p.
Interestingly, if p is significantly smaller than q and the secret key polynomial is short, this also decreases the error. The way to do this is just do c’=(q’/q)c and just round! Now, the formal lemma for this is
So when substituting the max bound of | <c, s>_q|, we find that the new max bound is q’/2, which if L1(s) is small, is quite the improvement! The proof is the following
The main mind-boggling part of this proof for me was the following sentence.
My interpretation is this. For c’, it’s chosen as the closest possible integer vector to (q’/q)c. So essentially, since we round up and down this will essentially just be a binary array of the same dimension as c. On the other hand, the secret key has the same dimension but we get it from a ring mod q like c. So essentially, it’s very unlikely that c’-(q’/q)c will ever exceed the norm of the secret key. At least that’s what I think happened.
So essentially, the interesting idea here is we can reduce error efficiently(just dividing by q’/q) without knowing the secret key!
Now, how can this all be brought together to refresh/lower the error? The paper writes this as
now, we have a new secret key and q! I think we reset the secret key for security reasons to not give the attacker too much information. Now, one final tidbit is that this paper argues(and I’ll come back to this in due time) that the above method is way more efficient than bootstrapping.
Back to the Homomorphic Encryption Standard
BFV
Like the BGV, we have the key and error distribution D1 and D2. We have a ring of R/qR. We have R/pR for plain text. In addition, we have integer T and L=log q of base T. And integer W = q/p
The secret key SK seems the same as BGV and is taken R/qR or D1. For the evaluation key, interestingly, we have L samples of ai from R/qR with corresponding random errors, ei, from D2 to get
This reminds me a bit of the relinearization key.
The order is slightly different but it seems we just get the public key in a similar way to BGV where we get random a, e, and do
First, the message, M, is brought into the ring R/pR, same as BGV. Then, the one interesting difference here is we multiply this message by W we had above! Then, we do
which is pretty much the exact same thing we did in BGV with a slightly different order except for W. Now, the next command is the main interesting part of this scheme.
While in BGV, we were able to remove the errors because they were all multiplied by p, in the BFV scheme, when we decrypt, we are left with WM+e for some small error e. Then when dividing this all with W, the idea is e will become so small that it’ll become 0(because q/p will be that large) and when rounding to the nearest integer mod p, we will recover the message M!
This is pretty much equivalent to BGV
Now up to computing C3, this is exactly the same
However, next, we round like so
Why is this? The simple answer is since each cipher text has W times their message, there will be a factor of W²M1M2 as well as W times the errors which will make the errors not disappear when we divide by W! So we are multiplying by the inverse of W to get rid of that excess W factor.
BGV or BFV?
I like BGV more as it feels more exact but in my benchmarks, I notice BFV tends to be faster to evaluate. However, this paper says, at least at the point of this publication, which scheme is better is an open research question as with different implementations/architectures the speed significantly changes.
Learning with Errors(LWE) Problem
I, and this paper, started off with polynomials but let’s think of holomorphic encryption in terms of matrices. Then, instead of just a polynomial a, we can have a matrix A which is, say, m by n. A secret key, s, of dimension n. And the error, e, of dimension m. Then, we can set our public key as
(As+e, -A)
This formulation is called the learning with error problem and is where all the hype surrounding holomorphic encryption started. In particular, the argument here is as follows:
So essentially, you can’t figure out whether (As+e, -A) is any different from any random vector!
This was kind of guaranteed by the fact that this becomes roughly equivalent to one class of problems called the shortest vector problems that everyone has trouble solving. People are even thinking, at the moment, that this scheme is quantum computer proof in that while RSA encryption can be broken, this encryption is safe.
Now, while I say this, why did we use polynomials instead?
The polynomial variant of this is called the “Ring” learning with errors problem and was introduced by the authors of LWE in the same paper. The reason for this is the computation of As, is O(mn). But if we make A a polynomial a, because polynomial multiplication can be done with fast Fourier transforms, the complexity can be lowered to O(n log(n)) so it’s faster and also argued to be the same security!
Fundamental way to attack Homomorphic Encryption
We want to recover a secret vector Zq of n dimension given matrix A of size m by n mod q and vector c of size m mod q such that
As+e=c mod q.
The first method of finding such a secret key came from trying to find the closest vector to c in all integer linear combinations (span) of A! This is called a lattice problem.
As+e-c-kq=0 so the first n coordinates in Z is s. Then, the next m-n coordinate is -k and for the final, I think we do -1. However, I don’t think I quite get how this lattice works to make the smallest vector so I will come back to this part later!
But overall, with a lot of As and output cs there is a way to crack the code using the shortest vector problem! Although it’s very computationally expensive.
Now, while this is all good and well, can we actually run this homomorphic encryption? Or is it at all practical to use? Currently, the answer is no unless you are doing a very small computation. For example, in the paper “Faster Fully Homomorphic Encryption: Bootstrapping in less than 0.1 Seconds” here. The authors found that in these schemes the bootstrapping size can go to upwards of 1GB which they subsequently reduced to 24 MB and seems like currently could be reduced below a Mb here.
Conclusion
Here, we went over the basics of holomorphic encryption introduced by the “Homomorphic standard” and “Fully Homomorphic Encryption without Bootstrapping”. Next time, I plan to go over TFHE and how zoma ai turned this scheme to work with integers/their speed. And also, the below 1 Mb bootstrapping scheme if it looks interesting!