2023.10.23: Reducing “gate” counts for Kyber-512: Two algorithm analyses, from first principles, contradicting NIST’s calculation. #xor #popcount #gates #memory #clumping
Tung Chou and I have a new software framework called
CryptAttackTester
for high-assurance quantification of the costs of cryptographic attack algorithms.
So far the framework includes two case studies:
brute-force AES-128 key search,
and, as a deeper case study,
“information-set decoding” attacks against the McEliece cryptosystem.
The accompanying
paper
also looks at other attacks,
covering various old and new examples of how attack analyses have gone wrong.
One of the appendices in the paper, Appendix D,
chops almost 10 bits out of the “gate” count
for “primal” attacks against Kyber-512.
This reduction uses a technique that in this blog post I’ll call “clumping”.
Clumping should also reduce the “gate” counts for “dual” attacks,
but for this blog post it suffices to consider primal attacks.
The reason I’m putting “gate” in quotes here
is that this is using the concept of “gates” in the Kyber documentation.
As we’ll see below, this concept doesn’t match what hardware designers call gates,
in particular in its handling of memory access.
This blog post has four goals:

- Explain how “31-bit clumping” reduces the “gate” count for these attacks by several bits. In particular, I’ll explain what the Kyber documentation says is the central bottleneck in the attacks, and what clumping does.

- Show that 31-bit clumping marginally increases costs when “gate” counts are replaced with a cost metric that accounts in a realistic way for the costs of memory access. This cost metric comes from the NTRU Prime documentation.

- Show that 31-bit clumping reduces costs by several bits according to a calculation that NIST misattributed to the NTRU Prime documentation. Here are NIST’s words: “40 bits of security more than would be suggested by the RAM model … approximately 40 bits of additional security quoted as the ‘real cost of memory access’ by the NTRUprime submission”.

- In case you’re thinking “Maybe NIST got the calculation right in most cases except for some mistake that somehow underestimates the cost of clumping”: Show that the gap between NIST’s misattributed calculation and the correct calculation comes from (1) NIST inflating the correct calculation and (2) this inflation factor being reduced by 31-bit clumping.
The formulas in question are easy to summarize:

- If you correctly tally not just the cost of bit operations in an attack but also the cost of memory access in the attack, you end up with memcost + bitopscost. Typically attacks are structured as a series of separate iterations, and the algorithm analysis decomposes into analyses of memory access per iteration, bit operations per iteration, and the number of iterations. The total cost is then (memcost/iter + bitopscost/iter) · iter.

- NIST’s “40 bits of security more than would be suggested by the RAM model” calculation, where NIST says this 2^40 factor is the “real cost of memory access”, is memcost/iter · bitopscost, which is equivalent to (memcost/iter · bitopscost/iter) · iter.

NIST’s calculation is incorrectly replacing addition of memcost/iter and bitopscost/iter with multiplication of memcost/iter and bitopscost/iter. The multiplication is nonsense: it doesn’t even pass basic type-checking. Numerically, if memcost/iter is larger than bitopscost/iter, then NIST’s calculation ends up multiplying the correct result by a fake bitopscost/iter factor. Clumping reduces that factor, as we’ll see below.
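The gap between the two formulas is easy to check numerically. Here is a small sketch using made-up illustrative numbers (not taken from any attack analysis), showing how the correct sum and the incorrect product diverge, and how shrinking bitopscost/iter while leaving memcost/iter alone barely changes the correct total but visibly shrinks the product:

```python
import math

# Purely illustrative numbers, not from any attack analysis.
mem_per_iter = 2.0**30       # memcost/iter: memory-access cost per iteration
bitops_per_iter = 2.0**20    # bitopscost/iter: bit operations per iteration
iters = 2.0**100             # number of iterations

# Correct calculation: per-iteration costs are added, then scaled by iters.
correct = (mem_per_iter + bitops_per_iter) * iters

# Incorrect calculation: the per-iteration costs are multiplied instead.
wrong = (mem_per_iter * bitops_per_iter) * iters

print(math.log2(correct))    # about 130: dominated by the memory term
print(math.log2(wrong))      # 150: inflated by a fake bitopscost/iter factor

# Reduce bitops_per_iter (as clumping does), leaving mem_per_iter alone.
clumped_bitops = 2.0**15
print(math.log2((mem_per_iter + clumped_bitops) * iters))  # still about 130
print(math.log2((mem_per_iter * clumped_bitops) * iters))  # drops to 145
```

The last two lines show the point of the sanity check: under the correct formula, reducing bit operations per iteration changes almost nothing once memory access dominates, while the product formula rewards the reduction bit for bit.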
Relationship to surrounding events.
Readers who simply want to check what I’m saying above
can skip past this section
and read about the algorithms.
But I think many readers will also be interested in the important procedural question
of what previous review took place for NIST’s calculation.
What’s deeply concerning here is that NIST,
despite claiming
“We operate transparently. We’ve shown all our work”,
has tried very hard to sabotage review
of its calculation of the Kyber-512 security level.
The starting point was
NIST carrying out its analysis in secret,
including consultations with the Kyber team.
The records of those consultations still aren’t public.
NIST illegally stonewalled in response to my subsequent
FOIA request.
I’ve filed a FOIA lawsuit,
but lawsuits take time to resolve.
In November 2022,
NIST announced its conclusion that Kyber-512 was
“unlikely”
to be less secure than AES-128 when memory-access costs are taken into account.
NIST didn’t explain how it reached this conclusion.
I had to ask repeatedly before NIST
posted
its calculation.
NIST’s posting phrased the calculation in an unnecessarily obfuscated form,
avoiding the normal tools that scientists use to
help reviewers catch any errors that occur:
definitions, formulas, double-checks, etc.
NIST then repeatedly dodged my clarification questions about the calculation.
Eventually I decided that NIST’s stonewalling
couldn’t be allowed to halt security review.
So I wrote my
previous blog post.
That post summarizes NIST’s calculation error,
explains how important the error is
in the context of NIST’s decisions regarding Kyber,
and goes line by line through what NIST had posted.
I also sent NIST’s mailing list a
much more concise
sanity check that’s flunked by NIST’s calculation:
an example of an attack where
NIST’s “40 bits of security more than would be suggested by the RAM model”
is numerically very far above
the estimated memcost and bitopscost in NIST’s alleged sources.
The critical point isn’t the magnitude of this particular gap;
the critical point is that NIST is inflating its security-level claims
by using a cost calculation that’s fundamentally wrong.
To highlight what exactly I was challenging,
I started my mailing-list message
with the critical NIST quote and a summary of my challenge:
‘Perlner, Ray A. (Fed)’ via pqc-forum writes:
40 bits of security more than would be suggested by the RAM model
[ … ]
approximately 40 bits of additional security quoted as the “real cost
of memory access” by the NTRUprime submission

This is (1) not plausible and (2) not what the NTRU Prime submission
says. It is, in context, NIST exaggerating the Kyber-512 security level.
Some of the followups to my message were obviously off-topic,
such as ad-hominem attacks,
praise for Kyber-512’s efficiency,
and praise for NIST manipulating its minimum criteria to allow Kyber-512
(seriously: “adjust the definition of security category 1 if needed,
but make sure to standardize ML-KEM-512”).
Four replies sounded on-topic but were actually dodging.
One structural aspect of the dodging is easy to see:
the replies being portrayed as defenses of the challenged text
don’t quote the challenged text.
Regarding content details,
the most important dodge is as follows.
The dispute at hand is between the following two calculations:

- Correct calculation: memcost + bitopscost = (memcost/iter + bitopscost/iter) · iter.

- NIST’s incorrect calculation: memcost/iter · bitopscost = (memcost/iter · bitopscost/iter) · iter.

The dodge is to hype the undisputed part—the multiplication by iter or, in more detail, various factors inside iter—while ignoring the disputed part—the multiplication of memcost/iter by bitopscost/iter.
My
previous blog post
had already emphasized the distinction between these two parts
(“The research that would be needed for a correct calculation.
To fix NIST’s calculation,
one needs to carefully distinguish two different effects: …”).
As part of describing this distinction,
my previous blog post had emphasized that the
memcost/iter · bitopscost/iter multiplication is wrong
(“exactly the central mistake highlighted in this blog post”)
while the other multiplication is fine
(“Multiplying the new iteration count
by the cost of memory access per iteration
makes perfect sense”).
But it takes time for readers to read through everything
and see that the replies to my message are misrepresenting the topic of dispute.
A further review difficulty is that
the numbers showing up in the sanity check,
such as 2^151 from the Kyber documentation,
are coming from very complicated algorithm analyses in the literature
with very large uncertainties.
It’s terribly time-consuming for readers
to go through those analyses.
I’ve realized that
there’s a much easier way
to see that NIST’s calculation is wrong:
evaluate, from first principles, the cost of
(1) a simple algorithm for carrying out a simple operation and
(2) the same algorithm plus clumping.
Again, what’s important about clumping here
is that it reduces bitopscost/iter without reducing memcost/iter.
The xor-and-popcount operation.
The simple operation analyzed in this blog post is the operation stated on
page 11 of an Asiacrypt 2020 paper:
“load h(v) from memory, compute the Hamming weight of h(u) ⊕ h(v), and
check whether the Hamming weight is less than or equal to k”.
This blog post focuses on the cost of
carrying out many iterations, let’s say P iterations,
of this xor-and-popcount operation.
I’ll explain what the 2020 paper says about the cost,
explain how clumping does better,
look at what the NTRU Prime cost metric says
about the costs of the non-clumped and clumped algorithms,
and look at what NIST’s calculation says about these algorithms.
This xor-and-popcount operation is directly on point:
it’s the core of the primal attack considered in the
latest Kyber documentation.
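As a concrete illustration of the quoted operation, here is a minimal Python sketch of one iteration. The function name and the variable names hu, hv, k are my own, chosen to echo h(u), h(v), and the threshold k in the quote; a real attack implements this with hardware bit operations, not Python integers:

```python
# Minimal sketch of one xor-and-popcount iteration:
# XOR the two fingerprints, take the Hamming weight, compare to k.
def xor_popcount_accepts(hu: int, hv: int, k: int) -> bool:
    """Return True if the Hamming weight of hu XOR hv is at most k."""
    weight = bin(hu ^ hv).count("1")   # popcount of the XOR
    return weight <= k

# Example: 0b1011 XOR 0b0010 = 0b1001, which has Hamming weight 2.
print(xor_popcount_accepts(0b1011, 0b0010, 2))  # True
print(xor_popcount_accepts(0b1011, 0b0010, 1))  # False
```

The cost question in the rest of this post is about repeating this operation P times: how many bit operations the XOR, popcount, and comparison take per iteration, and how much the load of h(v) from memory costs per iteration.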
(Readers who simply want to see the algorithm a