
MalDoc in PDF

Polyglot PDFs Embedded with MalDoc/ActiveMime

Maldoc

Recently, JPCERT/CC reported a new technique for creating polyglot PDFs with an embedded maldoc. Such a file can be opened in Microsoft Word even though it has the magic header and structure of a PDF file. If the embedded document contains VBA macros, they are executed when the file is opened in Word. Didier Stevens released a new version of emldump to help parse this kind of PDF file.
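
To make the polyglot idea a bit more concrete, here is a minimal Python sketch of a triage heuristic: it checks for the PDF magic near the start of the file and then for MIME-style headers and a base64 marker later on. The function name `looks_like_maldoc_in_pdf` and the marker list are my own assumptions for illustration; this is not the detection rule published by JPCERT/CC, and a hit only means the file deserves a closer look.

```python
PDF_MAGIC = b"%PDF-"

# Hypothetical marker list: headers that usually show up in MIME content.
MIME_MARKERS = (b"mime-version", b"content-type", b"content-transfer-encoding")


def looks_like_maldoc_in_pdf(data: bytes) -> bool:
    """Heuristic: PDF magic near the start plus MIME headers later in the file."""
    # PDF readers tolerate some junk before the magic, so search the first 1 KiB.
    pos = data[:1024].find(PDF_MAGIC)
    if pos == -1:
        return False
    body = data[pos:].lower()
    has_mime_headers = all(marker in body for marker in MIME_MARKERS)
    has_base64_part = b"base64" in body
    return has_mime_headers and has_base64_part
```

In practice you would read the sample with `open(path, "rb").read()` and pass the bytes to the function. Plain PDFs that happen to embed an email would also match, so treat this only as a first filter.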

Luckily, I was able to get my hands on the samples released by JPCERT/CC, and I set out to analyze them. The samples that were analyzed are:

PDF Identification

Analysis Steps:

  1. Identification: Use pdfid to gain information about the samples.

    PDFID All Samples

  2. Entropy Check: Inspect the entropy of the structural components with pdfid. As shown, an unusually high count of bytes outside the PDF streams may indicate a suspicious file.

    PDFID Entropy

  3. Extraction: Investigate further using the extra flag of emldump to view the initial characters of each part. The ActiveMime component will have a header labeled ActiveMime.

    Head ASCII Identify

  4. Dumping: Use emldump to extract the components. The output file will contain the ActiveMime component.

    Dump Activemime

  5. OLE Investigation: With the dumped ActiveMime files in hand, use oledump to dive deeper. You will find that the ActiveMime files have a macro stream, which can be identified by the M beside the stream.

    OLE Dump VBA Stream

  6. VBA Extraction: Extract the VBA streams from the ActiveMime files using oledump. As shown in the screenshots below, you can now view any malicious actions that the maldoc aims to execute.

    5b Macro

    01 Macro
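
The entropy check in step 2 can be approximated in a few lines of Python. This is only a rough sketch of the idea: it strips everything between `stream` and `endstream` with a regular expression and measures what is left. pdfid parses the file properly, so its numbers may differ from this, and the helper names below are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Tuple

# Non-greedy match from a 'stream' keyword (not part of 'endstream')
# up to the next 'endstream' keyword.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n.*?endstream", re.DOTALL)


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def bytes_outside_streams(pdf: bytes) -> bytes:
    """Return the file content with all stream bodies removed."""
    return STREAM_RE.sub(b"", pdf)


def outside_stream_report(pdf: bytes) -> Tuple[int, float]:
    """Byte count and entropy of everything outside the PDF streams."""
    outside = bytes_outside_streams(pdf)
    return len(outside), shannon_entropy(outside)
```

A normal PDF keeps most of its bulk inside streams, so a large byte count outside of them (here, the embedded MIME data) is what stands out.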

Alternative Analysis:

  1. OLE Parsing: Use oleid to treat the PDF files as if they were OLE files.

    5b Oleid

    01 Oleid

  2. VBA Analysis: With olevba, examine the VBA content. This might include resources like PNGs and JPEGs encoded in base64. The ActiveMime component appears like any other embedded resource: as base64-encoded data. As we observed earlier, the ActiveMime component was labeled as a JPEG. We can use this information to narrow down which base64 data to decode.

    Activemime Named as JPEG

    Here is a screenshot showing how olevba parses the extra bytes as macro content:

    Parsed_vba_as_macro

  3. In our sample, the base64 data of the ActiveMime component appears jumbled, with long runs of whitespace between several chunks of the base64.

    Jumbled_base64

  4. Decoding: Decode the base64-encoded data using CyberChef to retrieve the ActiveMime component. Proceed with further analysis of the ActiveMime file using olevba.

    Cyberchef Activemime
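
If you prefer a script over CyberChef, the cleanup and decoding from steps 3 and 4 can be sketched in Python. The first helper drops the whitespace (and anything else outside the base64 alphabet) before decoding. The second one goes a step further and tries to unwrap the ActiveMime container, assuming the zlib-compressed OLE data starts at one of the offsets commonly seen for this format (0x32 or 0x22A). Those offsets and both function names are assumptions on my side, so verify the result against your own sample.

```python
import base64
import re
import zlib
from typing import Optional

ACTIVEMIME_MAGIC = b"ActiveMime"
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
# Assumption: offsets where the zlib stream commonly starts in ActiveMime blobs.
CANDIDATE_OFFSETS = (0x32, 0x22A)


def decode_jumbled_base64(text: str) -> bytes:
    """Remove everything outside the base64 alphabet, re-pad, then decode."""
    cleaned = re.sub(r"[^A-Za-z0-9+/]", "", text)
    cleaned += "=" * (-len(cleaned) % 4)
    return base64.b64decode(cleaned)


def unwrap_activemime(blob: bytes) -> Optional[bytes]:
    """Try to decompress the payload of an ActiveMime blob; None on failure."""
    if not blob.startswith(ACTIVEMIME_MAGIC):
        return None
    for offset in CANDIDATE_OFFSETS:
        try:
            # decompressobj tolerates trailing bytes after the zlib stream.
            data = zlib.decompressobj().decompress(blob[offset:])
        except zlib.error:
            continue
        if data:
            return data
    return None
```

If the unwrapped data starts with `OLE_MAGIC`, you can write it to disk and continue with oledump or olevba as in the steps above.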

Cheers :)


© 2023 ~Mystik