Thursday, March 17, 2011

File Carving Tutorial and Challenge

The next few posts are going to be in the form of a series of challenges. In each challenge, I'll post a tutorial to get you started on the topic, then I'll post a few questions that you need to solve. Today's topic is file carving.

The Competition:

  1. The first person to solve each category earns:
    1. Easy: 1 point
    2. Medium: 2 points
    3. Hard: 3 points
  2. Each user can answer only ONE (1) difficulty level per challenge.
  3. Each challenge closes in 7 days if there isn't a winner.
  4. The competition will take place over several weeks, and will consist of several challenges, like the one you are about to undertake.
  5. At the end of the competition, scores for each user will be counted up, and the user with the most points wins a prize!
Good luck!

File Carving Basics


1. JPEG

So what exactly is file carving? File carving is the process of finding corrupted or hidden files within a larger file. This larger file could be something as simple as a text document, or as complex (or easy, depending on how you look at it) as a disk image. Our goal is to recover a file, whether it is contiguous or in pieces (fragmented). There are a few techniques we can try, but today I'm going to stick with what I consider to be the easiest: header/footer analysis.

Figure 1
Figure 2
Many file types have a header and a footer: special sequences of bytes that mark the beginning and end of the file. For a JPEG image, the header is FF D8, and the footer is FF D9. How can we check this? Try opening a JPEG image in a hex editor. For this, I suggest HxD on Windows or GHex on Ubuntu. If you look at the hex pane, you should see something like Figure 1. As you can see, the first two bytes are FF D8, which marks the beginning of a JPEG image. At the other end, if you scroll all the way down, you should see the footer, FF D9. You can see this in Figure 2. The data between the header (FF D8) and the footer (FF D9) is the image itself. Once this image is mixed in with other data, as it would be in the real world, you'll see why knowing the header and footer is so important.
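If you'd rather check this from a script than a hex editor, here is a minimal Python sketch of the same test. The function names are my own, made up for illustration, and the check assumes the file ends exactly at FF D9, which may not hold for every JPEG (some have extra bytes tacked on after the footer).

```python
JPEG_HEADER = b"\xff\xd8"  # start-of-image marker
JPEG_FOOTER = b"\xff\xd9"  # end-of-image marker


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if data starts with FF D8 and ends with FF D9."""
    return data.startswith(JPEG_HEADER) and data.endswith(JPEG_FOOTER)


def hex_preview(data: bytes, count: int = 16) -> str:
    """Format the first `count` bytes the way a hex editor shows them."""
    return " ".join(f"{byte:02X}" for byte in data[:count])
```

To try it on a real picture, read the file in binary mode (open(path, "rb").read()) and pass the bytes to both functions.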

2. Carving Files


When carving a file, you're usually trying to recover it from a disk image, that is, a bit-by-bit copy of some type of media. This media can range anywhere from a few megabytes on a floppy disk to a few terabytes on a modern hard drive. Obviously, the smaller the disk image, the easier it will be to carve out files. In file carving, you're basically "undeleting" a file. When you delete a file, it usually isn't overwritten immediately. Rather, its space is given the label "overwrite me if you need to", so the data is really still there. Sometimes bits and pieces have been overwritten, so you need to carve out the pieces that remain.

Figure 3
Let's say you have a disk image with a deleted JPEG on it. To get the JPEG back, you would first look for the header and footer of the image, just like we did above. Using this method, I could easily search a virtual 1.44 MB floppy disk (I know, ancient, right?) for a JPEG. In Figure 3, I take the image file (floppy.img) and use grep (a search tool) on it to look for our JPEG header. As you can see, 3 instances of FF D8 were found. The last one looks promising, since it includes "JFIF". The offset is 00022000, so we'll remember that for later. Next, we do the same for the footer, FF D9. Figure 4 shows the results. In this case, we're interested in the last one, because it is the only instance of FF D9 with an offset GREATER than that of the header. Based on this, the data between offsets 00022000 and 0002b3f0 is the JPEG we're after. If you look in a hex editor, you can see that there are many null bytes (empty space) after the last FF D9, which further supports the idea that this is the footer.
Figure 4
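The same search can be sketched in Python. This is not the grep command from the figures, just an illustration of the idea: list every offset where a byte pattern occurs, then print the offsets as eight hex digits so they are easy to compare with a hex editor. The function names are hypothetical.

```python
from typing import List


def find_all(data: bytes, pattern: bytes) -> List[int]:
    """Return the offset of every occurrence of pattern in data."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        # Resume one byte later so overlapping matches are not skipped.
        start = data.find(pattern, start + 1)
    return offsets


def format_offset(offset: int) -> str:
    """Render an offset as eight hex digits, e.g. 00022000."""
    return f"{offset:08x}"
```

With the disk image read into memory as bytes, find_all(data, b"\xff\xd8") lists header candidates and find_all(data, b"\xff\xd9") lists footer candidates. Expect false positives: two-byte patterns can easily turn up by chance inside unrelated data.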

The next part is to reconstruct the image file. Open the disk image in a hex editor and go to offset 00022000, the header. Delete all data prior to FF D8. Then go to the last FF D9 (you can just search for "ffd9" in the hex editor at this point) and delete all data after it. Save the new file as a .jpg, and BAM! It's the picture we were after!
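If you don't want to delete bytes by hand, the cut can be sketched in a few lines of Python. This assumes you already know the exact byte offsets of the header and footer; carve_jpeg is a name I made up for this example. Keep in mind that a hex dump often shows the offset of the row containing a match, so the exact position of FF D9 may need to be pinned down with a search first.

```python
JPEG_HEADER = b"\xff\xd8"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpeg(data: bytes, header_offset: int, footer_offset: int) -> bytes:
    """Return the bytes from the header through the end of the footer."""
    if footer_offset < header_offset:
        raise ValueError("footer must come after the header")
    if data[header_offset:header_offset + 2] != JPEG_HEADER:
        raise ValueError("no FF D8 at header_offset")
    if data[footer_offset:footer_offset + 2] != JPEG_FOOTER:
        raise ValueError("no FF D9 at footer_offset")
    # Include both footer bytes in the carved output.
    return data[header_offset:footer_offset + len(JPEG_FOOTER)]
```

Read the disk image with open("floppy.img", "rb"), pass the bytes and the two offsets to carve_jpeg, and write the result to a new file opened with "wb". The original image is never modified, which is a nice side effect of slicing a copy instead of editing in place.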


Now how on Earth is this useful at all? Well, it's funny you ask that. What happens if you're working late into the night on a research paper, only to delete it by accident the next day? What do you do? Call in a professional data retriever? Not anymore you don't! Basically, just scale up what we just did to a hard drive of, say, 500 GB and do the same thing. Use a LiveCD to copy the hard drive image onto other media. You can mix it up a bit, too, since some of the information will be in plain text. You could search the disk image for, say, "The effect of Gatorade on lilies" if you know that piece of text exists in your file. It's that easy, and data retrieval companies charge hundreds of dollars to do it.
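A 500 GB image won't fit in memory, so a text search like this has to read the image in pieces. Below is a rough Python sketch of that idea, with a made-up function name (find_text) and an arbitrary default chunk size. It keeps a small overlap between chunks so a phrase that straddles a chunk boundary is still found. It assumes the text is stored as plain bytes; documents saved in a compressed format (a .docx file is a ZIP archive, for example) or in a different text encoding won't match this way.

```python
from typing import BinaryIO, Iterator


def find_text(stream: BinaryIO, needle: bytes,
              chunk_size: int = 1 << 20) -> Iterator[int]:
    """Yield the absolute offset of every occurrence of needle in stream."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1  # bytes carried over between chunks
    tail = b""
    base = 0  # absolute offset of the first byte of the current window
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        pos = window.find(needle)
        while pos != -1:
            yield base + pos
            pos = window.find(needle, pos + 1)
        # Keep the last few bytes so boundary-straddling matches survive.
        tail = window[-overlap:] if overlap else b""
        base += len(window) - len(tail)
```

Open the disk image in binary mode and loop over find_text(image_file, b"The effect of Gatorade on lilies"). Each offset it yields is a place worth inspecting in a hex editor.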

Competition:


Now for the fun part. I'm going to provide 3 disk images, each 500 KB in size. As per the above rules, try your best to discover the hidden JPEG in each. Each JPEG will be a picture of a phrase. If you find this phrase and post it in the comment section, you win that difficulty level! Feel free to try them all, but remember, you can only post an answer (or attempted answer) for ONE difficulty level. Good luck, and make sure to use the above techniques!

Hint for Hard: What if there's more than one image in a disk image?
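One way to think about that hint: instead of picking a single header and footer, collect every header/footer pair as a candidate. The Python sketch below pairs each FF D8 with the next FF D9 that follows it. It is only a starting point under simple assumptions (contiguous files, no stray FF D8 or FF D9 bytes inside the data), and candidate_spans is a hypothetical name, not a tool from this post.

```python
from typing import List, Tuple


def candidate_spans(data: bytes,
                    header: bytes = b"\xff\xd8",
                    footer: bytes = b"\xff\xd9") -> List[Tuple[int, int]]:
    """Return (start, end) slices pairing each header with the next footer."""
    spans = []
    start = data.find(header)
    while start != -1:
        end = data.find(footer, start + len(header))
        if end == -1:
            break  # header with no footer after it: nothing more to pair
        stop = end + len(footer)
        spans.append((start, stop))
        # Continue searching after the span we just recorded.
        start = data.find(header, stop)
    return spans
```

Each (start, end) pair can be sliced out as data[start:end] and saved to its own .jpg file to see which candidates actually open as pictures.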

Download Files Here

15 comments:

  1. Good idea KB, I don't have time to participate myself right now. I will have to do it later.

  2. This is the first tech advise Ive understood and can use. Thanks for sharing.

  3. WOw this is pretty cool information.

  4. This is interesting stuff KB.

  5. Sorry, I'm same as Andriod... on this. : I

  6. Too bad I'm on a mac, wanted to try it :(
    good blog one, follower +

  7. This seems like something spies would do.

  8. John A.S: that alright, you can still do it on a Mac. Just download a hex editor that works on OSX (there's a lot of them) and you're all set

  9. man this looks hard.... I'll give it a go though

  10. I might be too late, but I'm going to try. Wish me luck!

  11. UPDATE: the competition is now ongoing! That means you have as much time as you need to complete it. The redesigned site will have a designated score page.

