Wednesday, December 10, 2014

WPS - The Secret Numbers in Letters

We all have secrets. Some we keep and some we share. The secrets we keep are generally easily managed. Our brain is an excellent safe that holds numerous secrets that no one will ever know. The secrets we share are harder to keep. If we want to send them to others then we need to encrypt them.

However, sometimes we don't want anyone to know that we share a secret. When the very existence of the secret must itself stay secret, we need more than cryptography to send it. We need steganography, the art of hiding messages.

By using steganography (lit. hidden writing) we can send a message through any open insecure channel without others even knowing that a message was sent. It doesn't draw attention or suspicion, as an encrypted e-mail or letter would, and the hidden message is deniable.

In this age of non-existent digital privacy there is still a method of processing and sending messages that resists even the best hackers and MIB companies: pen and paper. Just as there is unbreakable pen-and-paper encryption, there is also fully deniable steganography.

Many steganographic techniques were invented in past centuries: drawings with embedded codes or signs, invisible ink, and harmless-looking text with minuscule typographic differences or grammatical alterations controlled by some algorithm. Most of them, however, fail when it comes to hiding the fact that steganography has been used.

Typographic changes, however small they may be, are visible, since the receiver must be able to see them. Obviously, unusual font changes or extra spaces in digital text files are easily detected. Secret words, embedded at certain places, might be out of context. The required grammatical changes or rules, applied to the cover text, often don't stand up to the scrutiny of a human reader, who can easily spot subtle but suspicious changes in natural language that don't fit the content or style of the cover text.

Fully deniable steganography has some important requirements. It should be impossible to detect the use of steganography, since detection would in essence be a failure: after all, the goal is to hide the fact that a message was sent. Also, any attempt to extract the hidden message should reveal neither the message nor the use of steganography, even when the method is known. Therefore, the message should always be encrypted prior to hiding. Otherwise, any eavesdropper who knows the steganographic method could extract the plain message.

One method that meets these conditions is the Words-Per-Sentence system or WPS. It's a simple yet effective text-based method to conceal a message without the use of complex mathematical or grammatical tricks, and it offers complete freedom of writing style and content. The system consists of three steps: converting text into digits, encrypting those digits and hiding them in an innocent cover text.

Step 1 - Convert text into digits

This can be done with a straddling checkerboard. Such a table converts the high-frequency letters into one-digit values and the other letters into two-digit values, producing a relatively economical conversion. Optionally, you can use three- or four-digit codes (preceded by 0 - CODE) that represent common expressions or phrases, listed in a small code sheet or book, to compress the message considerably (more about code books in section VI of this paper (pdf)).


Let's convert the phrase "meeting at 14 PM in NY." Note that each figure is repeated three times to guard against errors.

M  E E T I N G     A T     1   4     P  M     I N    N Y  (.)
79 2 2 6 3 4 74 99 1 6 90 111 444 90 80 79 99 3 4 99 4 88 91
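
The conversion above can be sketched in a few lines of Python. This is only an illustration, not the full checkerboard: the table holds just the entries that can be read off the example, and the names CHECKERBOARD, FIG, to_digits and group are made up for this sketch. Padding the last group with extra full stops (91) is an assumption, inferred from the final plaintext group used in Step 2.

```python
import re

# Partial table: only the entries that occur in the example above.
CHECKERBOARD = {
    "A": "1", "E": "2", "I": "3", "N": "4", "T": "6",
    "G": "74", "M": "79", "P": "80", "Y": "88",
    ".": "91", " ": "99",
}
FIG = "90"  # marks the start and the end of a run of figures


def to_digits(text):
    """Convert text to checkerboard digits, tripling each figure."""
    # Odd-numbered parts are the runs of figures captured by the group;
    # a space next to a run is dropped, as in the example.
    parts = re.split(r" ?(\d+) ?", text.upper())
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(FIG + "".join(d * 3 for d in part) + FIG)
        else:
            out.append("".join(CHECKERBOARD[ch] for ch in part))
    return "".join(out)


def group(digits, pad="91"):
    """Split into five-digit groups, padding the last one (assumed rule)."""
    while len(digits) % 5:
        digits += pad
    return " ".join(digits[i:i + 5] for i in range(0, len(digits), 5))


print(group(to_digits("meeting at 14 PM in NY.")))
```

For the example phrase this prints 79226 34749 91690 11144 49080 79993 49948 89191. A letter that is missing from the partial table raises a KeyError, which is intended here: the sketch does not guess the rest of the checkerboard.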

Step 2 - Encrypt the digits

The letter-to-digit conversion is no protection whatsoever! We could scramble the letters of the checkerboard, but this provides only very limited protection. So, we must encrypt the digits. There are various manual cipher systems, but the most secure one is the unbreakable one-time pad. More detailed info in this paper.

Suppose our one-time pad key starts with the following groups:

68496 47757 10126 36660 25066 07418 79781 48209 28600

The one-time pad key is written out underneath the plaintext digits. The first group of the pad serves as key indicator for the receiver and must be skipped in the encryption process. The key is subtracted from left to right from the plaintext without borrowing (a so-called modulo 10 subtraction):

Plain : KEYID 79226 34749 91690 11144 49080 79993 49948 89191
OTP(-): 68496 47757 10126 36660 25066 07418 79781 48209 28600
        -----------------------------------------------------
Cipher: 68496 32579 24623 65030 96188 42672 00212 01749 61591
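
The subtraction above can be checked with a short Python sketch. The function names sub_mod10 and encrypt are illustrative only, and the code assumes that both the plaintext and the pad are already split into five-digit groups.

```python
def sub_mod10(plain, key):
    """Subtract key from plain digit by digit, without borrowing."""
    return "".join(str((int(p) - int(k)) % 10) for p, k in zip(plain, key))


def encrypt(plain_groups, pad_groups):
    """Return the key indicator followed by the enciphered groups."""
    # The first pad group is sent in clear as key indicator and is
    # never used for encryption.
    indicator, key = pad_groups[0], pad_groups[1:]
    if len(key) < len(plain_groups):
        raise ValueError("one-time pad is too short for this message")
    cipher = [sub_mod10(p, k) for p, k in zip(plain_groups, key)]
    return [indicator] + cipher


plain = "79226 34749 91690 11144 49080 79993 49948 89191".split()
pad = "68496 47757 10126 36660 25066 07418 79781 48209 28600".split()
print(" ".join(encrypt(plain, pad)))
```

This prints 68496 32579 24623 65030 96188 42672 00212 01749 61591, the ciphertext shown above. A computer is used here only to verify the arithmetic; the system itself is meant to be worked with pen and paper.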

Step 3 - Hide the encrypted digits

Now that we have a secure message we must hide the ciphertext digits in a text. For each digit, a sentence is composed with as many words as the digit + 5 (or any other pre-arranged value). Adding 5 to the total ensures that all sentences have at least five words. Words like “it’s”, “you’re” or “set-up” are regarded as one word. To avoid statistical bias, some sentences with fewer than 5 or more than 14 words should be added (these are later simply ignored). The first ciphertext group 68496 from our example message is hidden in the first part of a letter, shown below:

Dear John,

I hope everything is going well with you and the family. If possible, Katherine and I would love to visit you somewhere next month. We could make it a weekend at the lake. The next few weeks are rather quiet so any date is fine for us. What do you think? If you’re interested, just pick a date and I’ll arrange everything.

To retrieve the original digits, the receiver simply subtracts 5 from the total number of words in each sentence, ignoring sentences with fewer than 5 or more than 14 words. He counts 11 words in the first sentence and thus knows that the first digit is 11 – 5 = 6, and so on. He writes the proper one-time pad key underneath the extracted digits (skipping the key indicator) and adds ciphertext and key together without carry (modulo 10 addition). Finally, he converts the plaintext digits back into readable text with his own checkerboard.
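
The receiver's side can be sketched as follows. This is a rough illustration with assumed names (extract_digits, add_mod10), and it splits sentences naively on full stops, question marks and exclamation marks, so abbreviations or decimal numbers in a cover text would be miscounted. The salutation is left out of the input.

```python
import re


def extract_digits(cover_text, offset=5):
    """Read one digit per sentence; sentences outside the range are nulls."""
    digits = []
    for sentence in re.split(r"[.!?]+", cover_text):
        # Whitespace splitting keeps "you're" and "set-up" as one word.
        count = len(sentence.split())
        if offset <= count <= offset + 9:
            digits.append(str(count - offset))
    return "".join(digits)


def add_mod10(cipher, key):
    """Add key to cipher digit by digit, without carry."""
    return "".join(str((int(c) + int(k)) % 10) for c, k in zip(cipher, key))


letter = (
    "I hope everything is going well with you and the family. "
    "If possible, Katherine and I would love to visit you somewhere "
    "next month. We could make it a weekend at the lake. "
    "The next few weeks are rather quiet so any date is fine for us. "
    "What do you think? "
    "If you're interested, just pick a date and I arrange everything."
)
print(extract_digits(letter))
print(add_mod10("32579", "47757"))
```

The first line printed is 68496, the key indicator hidden in the letter; the four-word question is skipped as a null. The second line is 79226, the first plaintext group recovered from the first ciphertext group and its key group.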

The advantages of WPS are excellent literary freedom and the lack of complex calculations or algorithms. Always start by writing a meaningful text and then play with the words to obtain the required sentence length. Exclude the salutation in a letter from the system, as a nine-word salutation would obviously arouse suspicion.
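
While drafting, a small helper can show how far each sentence is from its required length. The sketch below is only a convenience, not part of the system; the name length_report is made up, and it assumes that the draft contains no null sentences yet, so every sentence carries one digit.

```python
import re


def length_report(draft, cipher_digits, offset=5):
    """Compare each sentence of a draft with the length its digit requires."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", draft) if s.strip()]
    report = []
    for sentence, digit in zip(sentences, cipher_digits):
        target = int(digit) + offset
        actual = len(sentence.split())
        # A positive difference means words to add, a negative one words to cut.
        report.append((sentence, actual, target, target - actual))
    return report


draft = "We could spend a weekend at the lake. The next few weeks are rather quiet."
for sentence, actual, target, diff in length_report(draft, "49"):
    print(f"{actual:2d} -> {target:2d} ({diff:+d})  {sentence}")
```

For the digits 4 and 9, the first draft sentence has 8 words and needs 9, and the second has 7 and needs 14. The writer then reworks each sentence by hand until the difference is zero.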

Thanks to WPS, the hidden message is fully deniable. There is no way to ever prove the existence of a message inside the innocent-looking letter without having the proper one-time pad key. Even when the eavesdropper knows the method used, he can merely extract some meaningless digits, just as he would from any other "clean" text. We now have a safe method to send encrypted messages openly by postal mail, e-mail or Internet forums.

Or how you can hide numbers in letters ;-)

This pen-and-paper WPS system is an important advantage in today's digital world, where secure personal computers, smartphones or tablets are a fairytale and virtually all means of communication are prone to eavesdropping. Of course, the cover text itself can be read by anyone and you will need a good excuse for the nonsense you wrote and to whom you wrote it.

5 comments:

Anonymous said...

WPS does not seem very efficient. You need to write a long message to encode one sentence. Maybe more practical in combination with a codebook?

It might help if you change this "prove you're not a robot" verification system in the "comments". There are more user friendly systems (like the house number).

Dirk Rijmenants said...

@ Anon,

WPS is actually pretty efficient. The checkerboard also contains a code prefix (0). Also, additional extensive information about using code sheets/books to compress the inserted message is found in the links I provided. I might indeed state this more clearly. I will add a reference to this in the post.

Steganographic methods to achieve a higher payload for a given carrier text do exist but these all have a common flaw: the higher the payload, the more obvious it gets that the carrier text is manipulated. You can only achieve a denser payload by modifying adjectives, conjugations, adverbs, changing the syntax or replacing words with prearranged synonyms. Software exists that does a nice job of manipulating text to insert more information than WPS, but this inevitably results in curious phrases, up to plainly ridiculous nonsense. Virtually none of these will survive the scrutiny of a human reading the carrier text, and they will at the least arouse serious suspicion. WPS gives complete linguistic freedom to write in the words and style of that specific person about a subject that makes sense. Forcing a higher payload produces the weirdest pieces of text.

Inevitably, steganography will enable you to insert a short message in an essay, but never an essay in a short message.

TheJH said...

Can't an attacker detect the use of steganography because the sentence lengths have higher entropy than normal?

Dirk Rijmenants said...

@TheJH, entropy is a wonderful statistical tool in cryptography that can provide various clues about a set of letters or digits in a ciphertext. The entropy of a sufficient number of letters can give a pretty accurate indication of the nature of a language. Letters, their combinations and position in words are never random but follow strict linguistic rules that define each specific natural language. This is where entropy is at its best.

Calculating the entropy of the lengths of sentences is a whole other thing. The length of a given sentence does not relate to the lengths of the sentences before or after it, nor does it follow strict linguistic rules. The length of a sentence isn’t determined by linguistic rules and their typical statistical properties, but rather by the linguistic skill of the writer and the complexity of the subject (small talk, literature, technical). Also, less adept writers and readers find sentences with fewer than 10 words easy to write/read and sentences with more than 15 words more complex, something for the more skilled. WPS with its (random) ciphertext digits will produce random sentence lengths between 5 and 14 words, with an optionally added variable number of longer or shorter non-digit-hiding sentences (call them nulls). Therefore, entropy calculation of these lengths will not provide any conclusive results, let alone determine whether they indicate steganography.

As the writer will start by composing a cover text before adapting it for hiding the digits, he will quickly notice whether the to-be-adjusted sentence lengths suit his style. If he is used to writing longer sentences, he can simply write more long null sentences. If he really hates or avoids shorter sentences and wants to raise the whole spectrum of lengths, he changes the digit + 5 rule into, for instance, digit + 9, or any other value, while still being able to write some short null sentences if required.

Anonymous said...

Seems like a pseudosteganographic "noise" of "traffic" with zero actual meaning would tend to occupy a disproportionate effort by those actors surveying the stream and thus make the statistical probability of the examination of an Nth discrete real communication take longer to achieve - thus a metasteganographic technique that creates a crypto-crypto is a logically implied strategy. One might name this strategy "LBCW" (little boy cries wolf).