Lab 19: Cracking the Vigenère Cipher ==================================== To refresh your memory of the Vigenère Cipher, go back to :doc:`lab08` and read about it. You will need your ``decrypt_vigenere``, ``num_coincidences``, and ``key_length`` functions from that lab today (tasks 2, 4, and 5 respectively). Your ultimate goal is to determine the plaintext message for the following ciphertext: .. code-block:: console CZVEILTUPRIMCGKBTVSVXVJRTCIBTSQQJHBVHVPKQRJSEUMHMGFBPXWXDOAMECWQTUCZXGIBBVLXGIHBFSDGESBTXLBESFJIWTFDRGTTWKBGQXWXDCBMLTKUWFVIGPCGEGESBPUOWXXMJOQPSEBEHHTIHHTQBPZTKUOGKSCLKBVVECWYVNVMHMJSHUIDYCPBQOIAQITJXPEKQRYMIAQIGRMRMWFRUSGVQBIGVHTVWBPWHHUVRYEHVQBFKHTKKBTKRWXTCJPQXGFOFYIAECGFJIRHWZQHSGMJSUQXSTAANFIWXTTRGPKXTMFNITIAOAFWINRWQYLTMJSEVLTINSNUYGXQTZCOXGIOQCMHREVNKRLHWZQDILHTHUVLTMTCHDPTHHURVXXGIICCRSIKQXKRVMJSQCMHBGGJJICLWRQGRARCKUKXTKCPOKXLBVVCKRZXASFTECVNCFGFNAGFGJIGXYOFPSIAKBTUSKXTMEGQPKMOONIXGVVNVRDKFWQCPXVGHUKRZBVGBXIGROIPJSJMQTGJILTAHBJIPKVVRTEQUKHFCCIHKHFGPUHJRRCVDAFSNTMHACZYDIATVSJJICLJSGJSJZJHVVSKXTOSVIGPCFQUMIHEQHTVTWVCUGVIACHFJIDNIVGVSWTXSJQRSXTSQCXIAKGOWXPMVVRVMBXKHNNPHXGARFUJBVSACXJKCZOWXLAGBGJIGTDPVVERMWOYNCIHQYNYEIVJCHVSUBVGJCMHMECNVTDVMSGCRSEQCXGHPMKHNPHIAGBUWVGBGRBPEABESFVEGMGRGQLTKHSRVJDKKHSNEHAGRNEVDLUVRTQXGFHUCXHAGVNFRTOGFOGJDKGGRGRPKCPOKXLBVVRKXWXTOJCMHMECNVTDVMSGQVPPCHPJXDMCYRQYIHHWGCRSUWFAKRVPKHUEYGBQGVVCHAGFNPERKQGFVLTYKSYFEUMGFVVECWHCEVYCTVSYAAPLLIFVMCMKARVSHXGWGRSEWQKACPPKISECFQBVVBNIJGFSEVLTAGRTGMCTPCGJIGFQARPXSHYBJGRITNWPGEUMGFVVRTOGFBPGTVQBFKHTKKBTJSLBPHUGADKNRFJILTUHBIIIHWHNIEXGVVRTEQUKHUQPTPGBGUXGTKUUVSCEKYRCXJGPSYHSGLQARYENTPRGJICWKDCGHHNFRRPPNWQKAUSHNFRRPPNMJOGCPXVGVNFRDMCABOICMVCGJMCDCPBWXHMQDCKRVAGFFGPUUGTBTIHAGTBWRSAGFFGPUYCZYKRVWQKACZTKARRGTLXNZRKXWXTHUGATENKNUZTKARRGTDKUVRHIAEXSEAWAHYZLHSGLJSUCHEEGBGASUMKARCWHAGKRPXSHYBGQPDHMOOQYIAGFNPHIHYCAFIGPJOGYEHZQWAIXDACDCGRCXZHSKVHMUVRVVXXFHBNSDDFCJPECWOOXGSJMYVNVWWXYOFESBBPUGQFJMKHJCWIHQRNTOIHUSRCRNMJWAIXWXPGUGPDHMSQCXIAGGVFIHHHHUGATENOAFRDMKQRFXWTVHUGCLXTSSKPAXFKVVLRNRPBCVSLCBQDSDDUVRNZTLJSEGECWVVRTIHAGGNYQPIUOAFTXVVIEGWWNPUHRSCIGUFULTMQCXFSLGCXNTJGHOCAGSUMJSFJIAOGGNUWWXROFUISBVKNUPPUGZYGHDKCBTGQPKOOYCHTUWHGQLTKIFRCXSBUOCRSXGVARPXXMYOFGQEMAGUGHXWPCGNMZXVCQTSEMJSWCVUHTTRCVDYMWYNMCZUCZGFDWAGBOECTISQVSENVWGKRIHQBRQJIAGQHRFDTTRFCWHAGTRNPETUHVVATENHUQYVAVOYKGTMQVRTWTEHOSVIGLWQUCJPENOFVLXLKGUCPAMJWAMRDMJWAISUMWAONMCZFCJPWITKFFJSLUTOIGXWXAZYCPAMJWAMQTTVVBOILAAWJQYAWPHFCCPGAHUKRVTDCHVMIXXSAKJXYGZYQJUMJSGQTDYVVRJSJLGKUKGWPCGIGVNEKYRNCIKWSQQACWQKAFSLGYCHNHIAGTNNPCXXSEESBXVCNPICWKKBPHTKJCJOECROWYGWXOGTNNPTGDMGJMHMKARULTLCWQCPDNFWZWWIUGURVXXGIGBOILAGFRPIPKVVREICMTSBHXWXGOEVLAXVARUITMJOGYSJEFPRHSJKVVBWWPGFAVNIHWQKAKXWBPYSQVNHWGRGEABESUCHAXCFAVWTOGFNNXWBPUFQJIAKGFQVIBPVRTPTLUCAUMCMJSFELDHNFBQQPGFHUQYVAVVVUAPLPCGCZTKAUBQHDIRCEVYCBVMSQVHAQKVPKDYHVRTOCHYZRFKTTUHUGVTPCGAQSCXVCYKWIXPHBJIGLVWYNMIPCGTQSSITOPVMRXVCFCCXMQJRTCTLVVNVWPUQIGVLTKKUUVHXLVOAEIQNVHUGRXPQBQGVLACHYCXXMWRRQVAHPUVVYSXKJRISIMQOYKGTACRAQMSXCKUCXATVWGWHTPCGBTPDGIWGWHTXKHUGVQNVHUQYVAVHUGCLXTSAKGTZTOAFADKFGGQWPRRFRUICMNMFJIQXIOACKPBPWJQRSXTWSKWWTNZSCPAKKUUVXWKQITJXWXGOEVLWHYTHPRNBVZYUITFVCPQQTHWHNOSCZVVRRIDINSGJEIPCZXYMIAVVRKVWXCRFFSLGYOEFXWXCBGKTPMJWRUMIAKBXULTPCGECXWXTUYCHIAGFRYEHGQCAGPXLVSAKRVMJWFVMBXCGVVHXWPHFQYCWCHNNPIAGFVILIPQFQDYIBUVNNPWTXSGQEHDVVROAWTVHUGRPFGCSVLTVQIAVVNBUMBWOCHYDYGEHXOONOMHMJWFPILSGOYCRSHTOHUXGTNWNCRSLJSGTMTWVCPWVILGMNUWWXUDBMIUTPQLEYGMUSLKRVTUMBWVTYCZYKRVMJFBWKWMJSNKVSHACHVLXGMMBWGDNNRZCRPZGWGCRSPJOGCRXZPCECRIEKHGNIVBTZFJIAEVVVPOBXHCECWZBPUAQMIENBRXIGWQHBCWZIGFUCTHBUVNNPHXGWGYVXMVSAWTHHOSJJIGXFCJPHDPPRBYRIAGFRYEHGQHUKRVXNGRVSSHUCNNMRXUCBPFTZCBGCPZBPUNIEXGFWACLAEOWFUQTOGFLOYRAVCAKKWMKGUQYAWVVVPOSBPOUYEHMJSPCXXAQDRVLTRNZEGQTFDSEJIGLCIPGVDYOWYMEIMGOGKQTWKBNJQNWGOEKAXLJMBWATKGRBYRWXTSJKXWFGHUGVTTTSAQQXVGWAVLTTKFVOEUKCWQDYIRQIZKKWMEOGELPUCHNPHIACHFXIGRNWXGEBHWGRASJDPCJDYIWQQNVWTTVPNVWXPQBQGVPGFVRTIPEKQRDIVTPHBIIIKCHUGVHEGSCAECWYSAVSCLCMVPKIHJSEUIAYKBNFVTTOMFQVIHHKNAHDVCHFGEIUCHFFSRTVGRCXQTVGNPHHHOSGKQTLFCOCXHXCHPCXHYQFLQYHXGOFULTVQIYFRITPGJGVTBVVRTUJXUHVQRXMFWQPXBNEVZCXIXTKUKGWPCMFJIENVWGULTYGZGVLPMUVRYEHWQNVPKDYHOAFLPWLIFVFTZWBGQHGXCAGJEILJSJCWLTNYVPKWTPRVPLPGFKVVLSBPOUCRSLCMVPKIHJSEXIGRGOEPIHMNMAQASBPOUVIAEOSGJIIKWHUFMSRQIRXIGXCHNDEIPJSAUYSWGBYAXWNODGJYBIFCJPWWXEOZGYEHPOUGEEHHGGKGZLCBQFVNEGOIGWPGFHUGJPENKNUSKXTOYKGTPCGAQXPUKHUWVITPRFJIYNODRFYEHPHBJIGYGSGKRPFQARPXHAGZBQOTWWDOWXXMYOFCPAWCFXQZTKJSNFFTYQFRJIGPCGNPSIAGFYQRVICGFCKTTPRGJILAKHRTEQUKHJCWHMKZYKRHBIVGJYGKAWAIHDPPWGVLTKGKNURDMCABOICMVCOGPDLVOJCCLXPHNNMRXNWXGXWXYWAFECWYOFLYHMKBGKQTMQVRCVXMUOLCWXMVIEPISTECEPIGHJALGEGLCBQYLXLMSEULDPNOGGMILISGVMCZUVRYEHVNCFGFTAKBQKXLAGBFJIINTBRFXWXECEPIGUWHGJIGTDPVVAPLPCYQRVXTHBDIHXGBFJIUHWBQJIGLGZSKRPEQBTNSLACZYYLXVJKNUPXMWDOAEGHYCSNEBIUVNPKXGITEQQIAGFBQJIAGFRYIGXFCBTWPENFBWRSMJSUCPAUWHGJINPGFRCPAEQQXGHPGFKUGRPEKQRJESUGSACPAMJSJCCSHYBBPIHBFSNPHJIVVRQXWXTHEAMCZGJRTCSHQFFJILTNYRFWPWNMQQACMJSZKHSEGKBPHTKKBTJSLLJSJCWTOGFGQKTMQIGCKPBPGHFHTGNMFJIRTOSHRSCTNWGVPTMJFRGPTZISQVEQEGOYNQPWGCSUSABFUYCWHMJSEGAPLPCGJMCZQBVVIMVGDGCXXGAUBNHTGMSLCRSTNWPGWUBTGGVLDNIVGYEHMJOGKXBBIVGDIAHPUGQSCXQTGJISHQFFQJIAGVNNPQNVOYCWTBVVRTXWXNCPMWLXTSGQSATTURQVIAGYRAAPLVCBUQPENPHVEITPMECXTBVKBWPSGQHBRICTPMBHXWXOVBYIKXTCAVLTLGQBPHIBOSEQYCWUVREEBXWDBPEAHYQHTXPBPGUGLPWPCGPSIBESQDIUHTSNPHQXJWAFMIPCGNNMIMNSQQSGTDCHVJXYVSRPMCVJSFJMVAUVRVVXXFHUGPXMVZRISAWGBXGCXGVVRNSRDCBQVSWXTUEGEIWGZVILIBVTVVXTWCZVEIDIGBRFXWXFCBTECWHCHPHIACHVVPTWKBGQEHFCZYREHLCURPSIFWQUNEGZGFGJECTTOGJSAXUVRMRTEVRBYRPGFZBQOTWCZBPKIAGDNUWPZGWAVSIAGZBXIABGGGIEGWGBLQYTOGFFCAWHYGUGPDGISQVSVXVCHVSUMJOGFEGDJOYNECWYOAFIGTDCHVEBHPUGJSHXDSQUSUUTWTJXUEQKRTWPGFHUQWTVQCYHSJGVOVPWQNVGUGGDNNRAQXTOGBTGXWXTVRCHIATCHILIAGRBQVLTAOAFIKXPWSOCWXCRJQYAWICGJVDNIVGJSJZJHCQSGTNWPGMIPQIYFFTHHJRTCABVHYGYHXYWGJSJMOMFJSJEFSEUSWAQKVYMHAKQBWPSLJIGWTABMSNVIAXUQBRIXMJWAMMRHWZQKJXHPZLMRTPJCJVSQXIWAHSGRQIFGIHHOOAASJMQTGJILTAHUKRVLJOQJEEIGBRFPPMGZLVLPMCZVEIWTFPRIYCMQHUKRZMJOGXIGRHSJVLXGIGVPHTXFKRTIGXCZYAMBIQGFKFAXVVRTIHXGARFXDUGBBWWTBPKNKXXGIPLVLTEKHGNISHQFFQWWXYSAVFPVMHBVLTMCPYGLPEHVBRMCZUVROMVAVTVPHPGQHUGVZXACAKXDKCHNPCGTVSNDSDDQTEWPTLHCEULJMVWAITTHRZRWTABMSGGPTLECCGWIAKGGKQTLJSSQYCWCZVVXAXDCGVPTHPWGYLXVJQRTXPBPZLYEHGQHUGVTUGTBTIHTKRNNMRXCBQTSJGFHUGRTVMCSVLTUQHGNILTUOCCTTKNOOGPLBVVGJILHTRFFVXGMARDIPNVWSWPARRFVPXTWQBVVMCECFTGPTMVSEUMIPCGNNPKXTMJGPAMQGNAHGBPYZGFJMVVRYMHXNWGVPTTNWPGAPLPCGISXGIHBFSIACHVPEWNTFLPSXENZBQOUBTGGULTLCWQCRSLGSJJIIAGFVVWBTTYRFTDBUCAQVCHVTBTWWXJOQTIPWUSIGVPEPWPGPXMVZRJMHMQFVGWPUQIGELXEFFRPAWHJOQISIUWFAVECWGOGGRJIDMJKPSUGOFVWPGFCGJIGNPDYGEHTPHGJMCZUOYNFTVCIFGXWXAKBWPSGQHEGQTFDSEVLTLKACNIGNNSFVLTBTTEKICWUVNFXPNIVGVLTFUIPJEHMJOGCVTWJCGRSZXTKVNPQNTBLQYXYACHJSAWKHGQSAHPUNPHIACHVHCDNEIGASJKHWAIIGOGFLFITINMJKXWTMBVHIXMWGHCPARDZRGHHTPRFJIWTFBRXIGYQFTQXIXPHUCXXYACHFVXGMAHELUKQANDSIMNSZCVZXFDBKWDGKHVUEAFQGGEIGMCWAVSSBUOTTITPKHUASJLQCAGVDKNOGGVWHYSIGVIAKGOQXIEGKNURDMOOEMISIQWFQRHHCZVEIKXPHHTISMQHNUXTBVOAFJXGFWAIMIOGFLPMRXKHUCHXGHOPVEHHTHBHQXQGRSNEKHWFBHGWXTFLVEGMEIFVEGWRWAGEEINSEQEHMVIEMINMQTSGIPGFVBVFJMVSEGHIHCGGULTOGFLUSDGHWAKWWXFWGQJU The main fact we will use to break the cipher is that in most English texts the frequencies of letters are not equal (see the table below). For example, the letters ``RSTLNE``, which you may recognize from watching Wheel of Fortune, are among the most frequent. .. list-table:: Frequencies of letters in English :header-rows: 1 * - a - b - c - d - e - f - g - h - i * - .082 - .015 - .028 - .043 - .127 - .022 - .020 - .061 - .070 .. list-table:: :header-rows: 1 * - j - k - l - m - n - o - p - q - r * - .002 - .008 - .040 - .024 - .067 - .075 - .019 - .001 - .060 .. list-table:: :header-rows: 1 * - s - t - u - v - w - x - y - z * - .063 - .091 - .028 - .010 - .023 - .001 - .020 - .001 The following is an outline for how to decrypt a ciphertext encoded with the Vigenère Cipher with an unknown key. (Read until the end of the lab before you start working on any individual part.) 1. Compute the key length `K`. (We already did this in Lab 8 Task 5.) 2. For each ``j`` from ``0`` to ``K-1``: build a string ``S[j]`` comprising the letters of the ciphertext at indices congruent to ``j`` modulo ``K``. For example, if ``ciphertext='ABCDEFGHIJK'`` and the key length is ``K=3``, then .. code-block:: console S[0] = 'ADGJ' # 0, 3, 6, 9 S[1] = 'BEHK' # 1, 4, 7, 10 S[2] = 'CFI' # 2, 5, 8 For each ``j``, every letter of ``S[j]`` has been encrypted with the same letter of the key. 3. For each ``j`` from ``0`` to ``K-1``: a. For each ``m`` from ``0`` to ``25``, decrypt the ``j``-th string using the ``m``-th letter of the alphabet. b. Measure the frequency of each letter in each of the strings in (a), and record these frequencies in a table (list of 26 lists) called ``freq_list``. c. Find the value of ``m`` for which the dot product ``freq_list[m]·F`` is maximized, where ``F`` is the vector of letter frequencies ``F=[.082, .015, .028, .043, .127, .022, .020, .061, .070, .002, .008, .040, .024, .067, .075, .019, .001, .060, .063, .091, .028, .010, .023, .001, .020, .001]`` from the table at the top of the lab. Then the ``j``-th letter of the key is the ``m``-th letter of the alphabet. 4. Decrypt the ciphertext using the key you found. .. admonition:: Understanding the logic The key insight behind this algorithm is that letter frequencies in English text are not uniform—some letters, like ``E``, ``T``, and ``A``, appear much more often than others. The Vigenère cipher hides this fact by "spreading out" the effect of each key letter across different parts of the message. Once we know the key length, we can group the ciphertext into multiple substrings, where each substring was encrypted with the same letter of the key. Within each group, we expect the decrypted letters to roughly follow English letter frequency patterns. To find the correct key letter for each group, we try all 26 possible Caesar shifts (one for each letter of the alphabet), decrypt the group with each shift, and calculate the letter frequency of the result. We then compute the **dot product** of that frequency vector with the standard English letter frequency vector. The shift that gives the largest dot product is considered the best match—because it produces a decrypted string whose letter frequencies are closest to normal English. Repeating this process for every group (i.e., for every position in the key) gives us the full key. With that, we can finally decrypt the entire ciphertext. Task 1: Split the ciphertext ---------------------------- Write a function ``str_split(s,j,k)`` that takes in a string ``s`` and two integers ``j,k``, where ``k >= 1``, and returns a string consisting of each letter of ``s`` whose index is congruent to ``j`` modulo ``k``. >>> str_split('ABCDEFGHIJK',1,3) 'BEHK' Task 2: Measure letter frequencies ---------------------------------- Write a function ``letter_freq(s)`` that takes as input a string ``s`` and outputs a vector of length 26 whose ``i``-th element is the frequency of the ``i``-th letter of the alphabet in the string ``s``. The built-in string function ``count`` may be helpful here. >>> s = 'AAAAAAABZZ' >>> s.count('A') 7 >>> letter_freq('AAAAAAABZZ') [0.7,0.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.2] Task 3: Maximize the dot product -------------------------------- Write a function ``maximize_dot(string)`` that takes as input a string ``string`` and outputs the integer ``m`` for which the dot product ``letter_freq(string-decrypted-by-mth-letter)·F`` is maximized. Here is an outline for this function. (You should copy over your ``vignere_decrypt`` function from lab 8 task 2 in Codebuddy.) .. code-block:: python def maximize_dot(string): F = [.082, .015, .028, .043, .127, .022, .020, .061, .070, .002, .008, .040, .024, .067, .075, .019, .001, .060, .063, .091, .028, .010, .023, .001, .020, .001] alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' freq_list = #list of 26 lists, where the m-th list is letter_freq(string after it has been decrypted using the m-th letter of the alphabet) #(hint: use a list comprehension) for m in range(26): #compute the dot product of freq_list[m] with F #keep the largest dot product you compute return #the m for which the dot product was largest >>> maximize_dot('KYVHLZTBSIFNEWFOALDGJFMVIKYVCRQPUFX') 17 Task 4: Decrypt the ciphertext ------------------------------ Write a function ``vigenere_crack(message)`` that takes in a string ``message`` and outputs a list of two strings: the most likely key and the most likely plaintext. (You should copy over your ``num_coincidences`` function from lab 8 task 4 and your ``key_length`` function from lab 9 task 5 in Codebuddy.) As a test input, use the ciphertext at the top of this page. It will be very clear if you have computed the correct key and plaintext. (Follow the outline at the top of the lab to get the pseudocode for this function.) You can get other test input strings at ``_