Codementor Events

How to effectively obfuscate your python code

Published Jun 10, 2018
How to effectively obfuscate your python code

Hey, I'm David and today I would like to talk about different techniques you can use to obfuscate your python code. Obfuscation is an incredibly complex and amazing subject and I hope you will enjoy it.

The basics: What is obfuscation

Formally, code obfuscation is defined as the deliberate act of creating source or machine code that is difficult for humans to understand. It is generally used
because the author wants to:

  1. Deter reverse engineering
  2. Prevent tampering with the program
  3. Make a challenge for like-minded geeks

Although obfuscation often gets a bad reputation due to malware authors and bad actors in general, it has a few legitimate purposes. Additionally, the theory behind it is quite nice and the possibilities are endless.

How you should think about obfuscation schemes

You can abstract any obfuscation scheme such that the scheme takes an input x in a specific language and creates an equivalent array of instructions that either in the end has the same end result, although the way to get to that end result differs, or creates the same array of instructions, but removes non-vital information that is only there to help humans. Most good obfuscation schemes combine these two.

Basic technique #1: Removing all information that is useless for the interpreter / parser

A basic way to obfuscate your code is to simply rename all your named parameters (variable names, dictionary key names, all names that have a meaningless assocation with the data they hold; be creative).

Let's take a look at the following code. It is a simple piece of code to find the median of an array of integers, as well as a testing class for said piece of code.

We will use this code throughout the article.

(credits to the python wiki; https://wiki.python.org/moin/SimplePrograms)

import unittest
def median(pool):
    copy = sorted(pool)
    size = len(copy)
    if size % 2 == 1:
        return copy[(size - 1) / 2]
    else:
        return (copy[size/2 - 1] + copy[size/2]) / 2
class TestMedian(unittest.TestCase):
    def testMedian(self):
        self.failUnlessEqual(median([2, 9, 9, 7, 9, 2, 4, 5, 8]), 7)
if __name__ == '__main__':
    unittest.main()

Let's rename all variables to meaningless strings:

import unittest
def a(b):
    c = sorted(b)
    d = len(c)
    if d % 2 == 1:
        return c[(d - 1) / 2]
    else:
        return (c[d/2 - 1] + c[d/2]) / 2
        
class e(unittest.TestCase):
    def e(self):
        self.failUnlessEqual(a([2, 9, 9, 7, 9, 2, 4, 5, 8]), 7)
if __name__ == '__main__':
    unittest.main()

That's harder to read, right? You can even take this concept further by renaming all identifiers to something very similar, like a pattern of '0' and 'O''s, or a pattern of 'l' and '1''s.

Basic technique #2: Syntactical obfuscation

An even more basic way to obfuscate python is to throw away all formatting and things that only exist for your eyes in the code. This is less effective than the previous example, since you are not trashing any vital information, but it still can't hurt to do it, right?

Let's take our obfuscated code again and let's kill the formatting:

import unittest
def a(b):
    c=sorted(b);d = len(c);
    if d%2==1:
        return c[(d-1)/2]
    else:
        return (c[d/2-1]+c[d/2])/2
class e(unittest.TestCase):
    def e(self):
        self.failUnlessEqual(a([2,9,9,7,9,2,4,5,8]),7)
if __name__=='__main__':
    unittest.main()

Although not a big change, it does not hurt to do it.

Advanced technique #1: Expression rewriting

Rewriting arithmetic expressions, int literals, and string literals is a good way to obfuscate because it can hide the true meaning of a number on first glance. It makes it harder to just grep for magic numbers, strings, or expressions in general. For example we can rewrite the expression '1' as '6+97+45-991+844*23+ 54/8/18575.75', the expression 'x = 1; x += 1' as

import math;a=0;c=0;x=0;x|=0b0000001;y=0;y|=0b0000001
for i in range(max(int(math.log(y,2))+1, int(math.log(x,2))+1)+1):
    x1=int(x&(1<<i)!=0);y1=int(y&(1<<i)!=0);z=0;z2=x1+y1+c;z=1 if z2>=2 else 0;c=z;a^=(-(z2%2)^a)&(1<<i)
x=a

(https://en.wikipedia.org/wiki/Binary_number#Addition),

and the expression "Hello World!" as

''.join([chr((ord(__l11111) ^ ord(__111111))) for (__l11111, __111111) in zip('g-I@a¯£²9\nÖÄ', '/H%,\x0e\x8fôÝKf²å')])

(https://en.wikipedia.org/wiki/Exclusive_or)

Appendix A

Rewriting your program as a lambda expression

This is fairly advanced and I considered not including this in this article, but almost all python code can be rewritten as a lambda expression. Let's rewrite our sample code:

(lambda __g: [[[(lambda __after: (unittest.main(), __after())[1] if (__name__ == '__main__') else __after())(lambda: None) for __g['e'] in [((lambda b, d: d.get('__metaclass__', getattr(b[0], '__class__', type(b[0])))('e', b, d))((unittest.TestCase,), (lambda __l: [__l for __l['e'], __l['e'].__name__ in [(lambda self: (lambda __l: [(__l['self'].failUnlessEqual(a([2, 9, 9, 7, 9, 2, 4, 5, 8]), 7), None)[1] for __l['self'] in [(self)]][0])({}), 'e')]][0])({'__module__': __name__})))]][0] for __g['a'], a.__name__ in [(lambda b: (lambda __l: [[[(lambda __after: __l['c'][((__l['d'] - 1) / 2)] if ((__l['d'] % 2) == 1) else ((__l['c'][((__l['d'] / 2) - 1)] + __l['c'][(__l['d'] / 2)]) / 2))(lambda: None) for __l['d'] in [(len(__l['c']))]][0] for __l['c'] in [(sorted(__l['b']))]][0] for __l['b'] in [(b)]][0])({}), 'a')]][0] for __g['unittest'] in [(__import__('unittest', __g, __g))]][0])(globals())

Now look at that! Good luck reversing this!

Cool general obfuscation projects:

(https://github.com/xoreaxeaxeax/reductio): An exploration of code homeomorphism.
(https://github.com/xoreaxeaxeax/movfuscator): A C compiler capable of rewriting every x86 program as an array of mov instructions.
(https://github.com/csvoss/onelinerizer): An obfuscator capable of rewriting every almost python program as a single line

I hope you all enjoyed this article.

David

Discover and read more posts from David
get started