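Subject: [Digest] Professor Wang Xiaoyun of Shandong University successfully breaks MD5 -- 懒厨
I read their paper, and it doesn't address this question. If it is on the order of 2 to the power of a few dozen, I'd guess the probability is smaller than that of choking to death on a sip of water, heh.
Right, you can't work backward from the signature to the original text, so actually breaking it is still hard. Just producing two inputs with the same hash value really is meaningless.
On the other hand, with so many people having failed to find two such inputs, this is certainly something for us to be proud of, heh.
So their being able to find this collision is very valuable,
but it is still an entirely different matter from "working backward from the signature to the original text".
What they have achieved so far is: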
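1. Finding two distinct messages A and B such that h(A)=h(B).
Whether they can also achieve:
2. For any given A and h(A), finding a B different from A such that h(B)=h(A)
is still unknown.
As for knowing K=h(A) and deriving the value of A from K, that is a different matter altogether.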
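To make the difference between case 1 and case 2 concrete, here is a minimal Python sketch. It assumes a toy hash `h` that keeps only the first 2 bytes of MD5, so both searches finish quickly; the function names and the truncation are illustrative assumptions and have nothing to do with the method in the paper.

```python
import hashlib
from itertools import count

def h(data: bytes, nbytes: int = 2) -> bytes:
    """Toy hash: the first nbytes of MD5, truncated so the searches are fast."""
    return hashlib.md5(data).digest()[:nbytes]

def find_collision():
    """Case 1: find ANY two distinct inputs A, B with h(A) == h(B)."""
    seen = {}  # digest -> first message that produced it
    for i in count():
        msg = str(i).encode()
        d = h(msg)
        if d in seen:
            return seen[d], msg
        seen[d] = msg

def find_second_preimage(a: bytes) -> bytes:
    """Case 2: for a GIVEN A, find B != A with h(B) == h(A)."""
    target = h(a)
    for i in count():
        msg = b"x" + str(i).encode()
        if msg != a and h(msg) == target:
            return msg

if __name__ == "__main__":
    a, b = find_collision()
    print("collision:", a, b, h(a).hex())
    b2 = find_second_preimage(b"hello")
    print("second preimage of b'hello':", b2, h(b2).hex())
```

With a 16-bit toy hash the first search typically needs a few hundred tries while the second needs tens of thousands, which is the same gap, at a much smaller scale, that separates a collision from a second preimage on the full hash.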
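This result is certainly a big step for its discoverers, but only a small step toward breaking today's security systems; in other words, it is still very, very far from being usable in practice to break them.
What this result achieves is only a fairly efficient way to find two different messages that produce the same hash value under MD5. It does not, for a specified message and its MD5 hash value, make it easy to obtain another message whose MD5 hash value matches. Substituting a message therefore remains essentially infeasible, and the damage to data integrity remains theoretical rather than real.
Therefore, at most this result can endanger data that relies completely and solely on MD5 and is stored directly rather than signed; data handled in other ways is still safe. This is because the mapping from data to hash value is one-way, not two-way; going in the reverse direction is not feasible.
Conclusion: MD5's security does appear to have developed a gap, but it can still be trusted for use, and the current security system can still be trusted as well; it is not on the verge of collapse.
Is a Verisign certificate sent to the client as a message body or as a signature block? If the former, there should be no problem; otherwise it really is a serious problem.
As far as I know, a Verisign certificate includes the user's public key and private key plus some other information, so it is unlikely to be sent to the client as a signature block, because the data is too long.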
A certificate, more precisely an X.509 certificate, is defined by the ITU-T X.509 standard and is used throughout the PKCS family of standards.
The standard specifies everything about the certificate, including how the information in the certificate envelope is digested; MD5 and SHA-1 are the common specs. SHA-1 takes more computing power, I think, though I am not an expert on this.
Public key cryptography
Public-key cryptography allows one to digitally sign and encrypt information transacted between parties. Public Key Infrastructure (PKI) uses this technology and adds authentication and non-repudiation of the information regarding the parties concerned. Public Key Cryptography Standards (PKCS) is a suite of protocols and algorithms that are used as an industry standard when implementing public-key cryptography and infrastructure. The fundamentals are based on Key Pairs, Message Digests and Certification. These are described below.
A key pair consists of a private key and a public key. The private key is never revealed to any party. The public key is made available to the world, or at least the parties concerned with receiving or sending information. In public key algorithms like those from RSA, any data encrypted with the private key can be decrypted only with the public key, and data encrypted with the public key can be decrypted only with the private key. Stronger encryption uses longer keys. For strong encryption, it is “computationally infeasible” to derive the private key given the public key, or vice versa.
Message Digests are hash functions that take in data and generate a statistically unique digest, like a 20-byte number, such that even a one-bit change in the input data results in a totally different digest. Thus these digests serve as fingerprints of a document. Given a digest and a document, and knowing the hash algorithm, it is easy to verify whether the digest is derived from the document.
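The one-bit-change property can be checked directly. The sketch below uses Python's `hashlib` with SHA-1, whose digest is 20 bytes; the sample message and helper names are made up for illustration.

```python
import hashlib

def flip_lowest_bit(data: bytes) -> bytes:
    """Return a copy of data with the lowest bit of its last byte flipped."""
    return data[:-1] + bytes([data[-1] ^ 0x01])

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

doc = b"Pay Alice 100 dollars"
tampered = flip_lowest_bit(doc)           # differs from doc in exactly one bit

d1 = hashlib.sha1(doc).digest()           # 20-byte digest
d2 = hashlib.sha1(tampered).digest()

if __name__ == "__main__":
    print("input bits changed: ", differing_bits(doc, tampered))
    print("digest length:      ", len(d1), "bytes")
    print("digest bits changed:", differing_bits(d1, d2), "of", len(d1) * 8)
```

For a well-behaved hash, roughly half of the digest bits are expected to change, although the exact count depends on the input.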
Certification is the mechanism by which authenticity is established. A party generates a key pair consisting of the private and public keys. The public key is placed into a certificate request and sent to a certifying authority (CA) like Thawte, IDCertify, VeriSign and so on. The certifying authority (CA) verifies the party’s credentials and the purpose of using the keys, through a vetting process, and then certifies the public key they received. That is, the authority issues a certificate, typically called an X.509 digital certificate that contains the details of the party, the intended use of the certificate and most importantly, the party’s public key. This information is then digitally signed by the CA using the CA’s private key. The authenticity of the certificate itself can be verified by using the CA’s public key, which is made available from the CA’s web site, or comes embedded in a browser by default.
In essence, if you trust the CA, then you can trust that the public key in the verified certificate indeed belongs to whoever the CA says it belongs to, and therefore if a digital signature on a document is verified using that public key, the information therein was indeed signed by the party mentioned in the certificate. This establishes authenticity, since only the holder of the corresponding private key could have created that digital signature. Trust in the CA is at the core of this process. If a CA is granted a notary or equivalent status, then the certificate and the information signed or encrypted cannot be repudiated and is valid in many courts of law.
Digital signatures & Data encryption
A digital signature is a digital attestation of a document by a party. This is to establish authenticity. A digital signature is an encrypted digest (hash) of the data to be signed.
One essentially creates a digest or hash (using an algorithm like MD5 or SHA1) from the document data and then encrypts this hash with one’s private key. The encrypted hash thus becomes a digitally signed finger-print for that document, called a digital signature. This signature can now optionally be attached to the document, along with one’s certificate. Anyone intent on verifying the digital signature would verify the certificate for authenticity first, then take the public key from the certificate and then verify the digital signature. The latter part involves decrypting the digital signature with the public key to reveal the digest or hash value. The document is then hashed using the same algorithm to check whether the digest values match.
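The hash-then-encrypt flow just described can be sketched with textbook RSA. This is only a toy under loudly stated assumptions: the key is built from two known Mersenne primes, there is no padding, and the names `sign` and `verify` are illustrative. Real signatures use large random keys and a padding scheme, so treat this as a picture of the data flow rather than a usable implementation.

```python
import hashlib

# Toy key pair (assumption for illustration only): two known Mersenne primes.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus, about 150 bits (> 128-bit MD5 digest)
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent; pow(x, -1, m) needs Python 3.8+

def digest_int(document: bytes) -> int:
    """MD5 digest of the document, interpreted as an integer smaller than N."""
    return int.from_bytes(hashlib.md5(document).digest(), "big")

def sign(document: bytes) -> int:
    """'Encrypt' the digest with the private key (textbook RSA, no padding)."""
    return pow(digest_int(document), D, N)

def verify(document: bytes, signature: int) -> bool:
    """Recover the digest with the public key and compare it to a fresh hash."""
    return pow(signature, E, N) == digest_int(document)

if __name__ == "__main__":
    sig = sign(b"contract v1")
    print(verify(b"contract v1", sig))   # the signed document verifies
    print(verify(b"contract v2", sig))   # an altered document does not
```

Note that the signature covers only the digest, which is why two documents with the same digest would share a valid signature.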
A digital signature is typically attached to a document. This can be difficult for certain document types. It is required to embed the signature into the document without changing the document (!), which is contradictory. Therefore a signing process only works on the information portion of a document, and uses other sections of the format to embed the signature. For example it is possible to embed signatures into a Word document treating the latter as an OLE compound document. One may also store signatures as attributes of such a document. PDF is another format that is amenable to embedding using the DIGSIG API. Another technique is to create a container document (having a different naming extension) that includes the source document and the signature, and from which either can be extracted. Multiple signatures may be created and attached to a document. The signatures may be peer level or hierarchical level. Peer level signatures imply that one or more parties have endorsed the document by applying their signatures. Hierarchical signatures imply a work-flow and counter-signing process.
Creation of a digital signature involves using one’s private key. In contrast, encryption of information meant for another party uses the other party’s public key. Anyone, knowing that party’s public key can send encrypted information. Only that party can decrypt the information, using his/her own private key.
Anywhere you read "digest" or "hash", you can substitute MD5/SHA-1; they are the major specs.
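I don't understand cryptography, but it sounds very impressive, so I'll cheer along.
She shouldn't have published it :)
This is a super weapon. It wouldn't have been too late to publish after hacking all the American websites first. :)
For example, the password digest that a browser sends is currently hashed with MD5 (the default) before being sent to the server, and an attacker has no need at all to tamper with the password itself.
Also, passwords are most commonly protected with a hash, and MD5 is the default spec with relatively weak protection; of course there is the even weaker base64, which is merely an encoding.
Of course, what I imagine is always a bit too simple.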
MD5 collisions, and a related explanation
Excerpted from [坐看云起-Brian的博客]
http://www.kantianxia.net/blog/briancai/index.php?subaction=showfull&id=1094839036&archive=&cnshow=news&start_from=&ucat=1&
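On August 17, 2004, at the international cryptology conference (Crypto'2004) held in Santa Barbara, California, USA, Professor Wang Xiaoyun of Shandong University presented a report on collisions for MD5, HAVAL-128, MD4 and RIPEMD.
The paper immediately caused a huge stir in the cryptography community. One expert even called it a "Bad day at the hash function factory". (Of course, that remark also covered two other collision reports given the same day.)
I. What is the work of Wang Xiaoyun et al.?
Typically, we take a piece of plaintext, run it through MD5 (a hash function), and get a so-called ciphertext. So how is this used in practice?
Here is a simple illustration: password authentication on a typical website works like this.
A user registers a username and password through a web page. After receiving the password, the server runs the user's chosen password through the MD5 function and stores the result in the database.
The next time the user logs in, the user submits the username and password through the web page. The server program runs this password through MD5 and then looks in the database for this username together with this MD5-hashed password. If the pair exists, the user is considered authenticated.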
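The registration and login steps above can be sketched in a few lines of Python. The in-memory dictionary `users` stands in for the database, and the function names are hypothetical; the sketch mirrors the scheme exactly as described (plain, unsalted MD5) and is not meant as a recommendation.

```python
import hashlib

# Hypothetical in-memory "database": username -> MD5 hex digest of the password.
users = {}

def md5_hex(password: str) -> str:
    """Hash the password with MD5 and return the hex string that gets stored."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    """Store only the MD5 of the password, never the password itself."""
    users[username] = md5_hex(password)

def login(username: str, password: str) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    stored = users.get(username)
    return stored is not None and stored == md5_hex(password)

if __name__ == "__main__":
    register("alice", "s3cret")
    print(login("alice", "s3cret"))   # correct password
    print(login("alice", "guess"))    # wrong password
```

Because the server only compares hash values, any input that hashes to the stored value would be accepted, which is the point the article goes on to make.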
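As we all know, MD5 as an "encryption" algorithm is irreversible. That is, there is no way to simply obtain the original plaintext from the ciphertext via some algorithm.
So what does Wang Xiaoyun's paper actually say?
Although MD5 has no corresponding inverse algorithm, there is one situation to consider: two or more different plaintexts may produce the same ciphertext under MD5. Such different plaintexts with the same hash value are called a collision.
Obviously, the existence of collisions greatly increases the possibility of "decryption". (The usual method of decryption is called brute force: using a computer to try every possibility until the password is found.) Note that a collision may not be the original password, but rather something that the authentication program mistakes for the password.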
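As an illustration of the brute-force idea mentioned in the parentheses, here is a minimal Python sketch that tries every lowercase string up to a small length against a stored MD5 value. The alphabet, the length limit, and the function name `brute_force` are assumptions chosen so the search finishes instantly; it says nothing about the cost of attacking real passwords.

```python
import hashlib
import string
from itertools import product

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def brute_force(target_hex: str, max_len: int = 3, alphabet: str = string.ascii_lowercase):
    """Try every candidate up to max_len characters; return the first whose MD5 matches."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if md5_hex(candidate) == target_hex:
                return candidate   # accepted by the check, whether or not it is the original
    return None                    # nothing in the search space matched

if __name__ == "__main__":
    stored = md5_hex("cab")        # what the database would hold
    print(brute_force(stored))
```

The function returns whatever input matches the stored hash, which is exactly why the article notes that a match need not be the original password.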
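Therefore, the cryptography community puts much of its effort into how to find collisions rather than into finding an inverse algorithm.
Wang Xiaoyun's work was to find an algorithm that can find a collision for a given ciphertext in a relatively short time. According to her paper, on an IBM P690 it took about an hour to find the M, and then from a few seconds to 15 minutes to find the N.
Here is the paper; you can download it and study it closely.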
http://eprint.iacr.org/2004/199.pdf
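II. What does this mean for our security?
So let's look at what trouble this brings us, or what effect it has on the security of current MD5-based schemes. Note that I only discuss three cases here; others can be analyzed along the same lines.
1) User password authentication
This is the case mentioned earlier: if an attacker happens to obtain the ciphertext stored in the database, a corresponding collision can be obtained. In other words, it is similar to having found the password.
2) Digital signatures
The usual application is as follows. Say you download a file or a piece of software from some website. Usually the site also provides a corresponding MD5 value, that is, the result of running MD5 over that file or software.
Then, after downloading it locally, we can run MD5 over the file or software ourselves. If the result matches the MD5 value given by the website, the file is considered correct.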
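The download check just described might look like the following in Python. The function names and the chunk size are illustrative assumptions; the only library call relied on is `hashlib.md5`.

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the MD5 of a file, reading it in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path: str, published_md5: str) -> bool:
    """Compare the local file's MD5 with the value published by the website."""
    return md5_of_file(path) == published_md5.strip().lower()
```

A call such as `verify_download("setup.exe", "<value from the site>")` (both arguments hypothetical) returns True only when the two MD5 values agree. The check compares hash values and nothing else, so it cannot tell the original file apart from any other file with the same MD5.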
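Typically, this method is used to guarantee the origin of an article or a piece of software.
So if a collision is found that has the same MD5 value, that means something has been found that can pass for the original file or software.
In fact, as we all know, even if such a thing were found in practice, its real-world significance is probably small. Even with such a collision in hand, whether the information it represents has any actual meaning is still an open question. For example, it is hard to say whether the collision corresponding to a piece of software is still runnable software. Likewise, it is hard to say whether the collision of an article is still a meaningful article.
3) Digital certificates
A digital certificate can be viewed as a special case of the above: a ciphertext, signed by a third-party authority, that contains information about the certified party.
Because it is under a third party's control, the chance of it being broken is smaller than in the case above.
Although analysis suggests that finding collisions may not have a large impact on security, in any case it does make it more likely that MD5 will be broken.
In any case, this series of work by Wang Xiaoyun et al. is of great significance. It lets us once again feel the joy that breakthroughs in Olympic track and field bring, only this time the breakthrough is in information security science.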
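A so-called collision for the MD5 algorithm was already found back in 1996. It was said that one could be found in 10 hours on a Pentium.
For the specific algorithm, see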
Hans Dobbertin. The status of MD5 after a recent attack. 1996
Personally, I don't think "collision" (碰撞) is a very good translation. Wouldn't "value coincidence" (值重合) be easier to understand?