Proceedings Article | 27 February 2007
KEYWORDS: Digital watermarking, Data hiding, Computer programming, Steganography, Distortion, Computed tomography, Matrices, Associative arrays, Error analysis, Telecommunications
A substantial portion of the text available online is of a kind that tends to contain many typos and ungrammatical
abbreviations, e.g., emails, blogs, forums. It is therefore not surprising that, in such texts, one can carry out
information-hiding by the judicious injection of typos (broadly construed to include abbreviations and acronyms).
What is surprising is that, as this paper demonstrates, this form of embedding can be made quite resilient.
The resilience is achieved through computationally asymmetric transformations (CATs for short):
Transformations that are inexpensive to carry out but whose reversal requires much more extensive semantic
analysis (easy for humans to carry out, but hard to automate). One example of a CAT is the introduction of
typos that are ambiguous, in that they have many possible corrections, which makes them harder
to restore automatically to their original form: When choosing among alternative typos, we prefer ones that are also
close to other vocabulary words. Such encodings do not materially degrade the text's meaning because, compared
to machines, humans are very good at disambiguation. We use typo confusion matrices and word level ambiguity
to carry out this kind of encoding. Unlike robust synonym substitution, which also exploits ambiguity, the
task here is harder because typos are very conspicuous and thus an obvious target for the adversary (synonyms are
stealthy, typos are not). Our resilience does not depend on preventing the adversary from correcting without
damage: It only depends on a multiplicity of alternative corrections. In fact, even an adversary who has boldly
"corrected" all the typos by randomly choosing from the ambiguous alternatives has, on average, destroyed
around w/4 of our w-bit mark (and incurred a high cost in terms of the damage done to the meaning of the
text).
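
The preference for ambiguous typos described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a toy vocabulary, takes "close" to mean within one character edit (deletion, substitution, or insertion), and omits the typo confusion matrices entirely. The names `edits1`, `ambiguity`, and `most_ambiguous_typo` are hypothetical.

```python
import string


def edits1(word):
    """All strings one edit (delete, substitute, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return (deletes | substitutes | inserts) - {word}


def ambiguity(typo, vocab):
    """Number of vocabulary words within one edit of the typo,
    i.e. how many plausible corrections an automatic fixer must choose from."""
    return len(edits1(typo) & vocab)


def most_ambiguous_typo(word, vocab):
    """Pick a non-word typo of `word` with the most candidate corrections.
    Assumption: ties are broken alphabetically to keep the choice deterministic."""
    candidates = sorted(t for t in edits1(word) if t not in vocab)
    return max(candidates, key=lambda t: ambiguity(t, vocab))


# Toy vocabulary, purely illustrative.
vocab = {"cat", "car", "can", "cut", "bat", "hat", "cot"}
typo = most_ambiguous_typo("cat", vocab)
print(typo, ambiguity(typo, vocab))
```

In this sketch the original word is always among the candidate corrections of the chosen typo, so a human reader can still recover it from context, while an automatic corrector faces several equally close vocabulary words.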