Smartphone User Dictionary Files – My Favorite Artifacts, Part One

“Words – so innocent and powerless as they are, as standing in a dictionary, how potent for good and evil they become in the hands of one who knows how to combine them.”
~Nathaniel Hawthorne

Last time on “My Favorite Artifacts,”  I gave a brief overview of what forensic artifacts are and which of my personal favorites I’d be covering in the months to come. In this month’s installment, I’m tackling user dictionary files and sharing a method I call “user dictionary date bracketing” to learn more from these files than you might think possible at first glance.

The Power of a User Dictionary File

I first learned the power of smartphone user dictionary files several years ago, during an investigation into the sexual assault of a child. Tragically, the 11-year-old child’s parents were killed in a car crash. She was sent to live with her aunt, who resided with a convicted sex offender. Within a few months of moving into the home, the child was victimized by her new guardian.

The perpetrator would use the Notes app on an iPad to pass messages to the child about what he intended to do to her. The child would pass messages back to the perpetrator on the same iPad.  After the iPad was passed back to the perpetrator, he would erase the notes.

The child was able to provide details about the assaults in a Safe Harbor interview, but was unsure of the dates when they occurred. The prosecutor needed to know the date range of the assaults in order for the case to be charged, so not having this information proved to be a frustrating roadblock. The child was able to provide some details about the contents of the messages passed back and forth on the iPad, though, which helped focus the forensic examination and ultimately led to the solution to finding the needed date ranges.

While I found many fragments of deleted notes in free pages of the notes.db SQLite database file, I was unable to find the dates corresponding to when the entries were made. Frustrated, I resorted to keyword-searching individual words within the recovered fragments. I soon found that nearly all of the words I searched for had matches in the user dictionary file.

What Is a User Dictionary File?

On smart devices with touchscreen keypads, such as iPhones and iPads, user dictionary files are indispensable. Dictionary files help the device user spell things correctly and make the device easier to use, for example through predictive suggestions (though plenty of people have found autocorrect to be a bugbear in its own way). All sorts of devices use them, regardless of the mobile operating system involved. The dictionary file may be populated by some applications and not others, depending on whether the app has permission to write to the file.

Most default applications on the device can make use of the user dictionary. Its content may even include words synced from other associated devices. As a result, the dictionary file captures portions of the content that gets typed on the keyboard of the smartphone. In this way, the dictionary file is somewhat like a keystroke logger, although it only captures some typed content instead of everything.

A typical user dictionary file reads a bit like strange spoken word poetry and looks like this when opened in Notepad:

A glimpse of a smartphone user dictionary file.

It’s easy to see how much information we can glean about the phone’s user just from reading the contents of the file. Usernames, app names, some context about what the user is typing about, places of employment, and potentially even passwords can be found in the dictionary file.

(This particular dictionary file might look like a passage from a twenty-first-century rewrite of James Joyce’s Ulysses, but it actually comes from the SANS FOR585 Advanced Smartphone Forensics course.  Many thanks to my course co-authors Heather Mahalik and Lee Crognale for letting me use this dictionary from one of our iOS labs! By the way – they are aware of the potential OSINT factor – this is test data.)

Introducing the User Dictionary Date Bracketing Technique

Dictionary file content generated by user interaction with the device is generally (but not always) populated sequentially as the words are typed into the keyboard. This means not only that you can find snippets of conversations or typed strings in the dictionary, but also that they are laid down roughly in the order they were typed. There aren’t any conveniently placed notations about the date and time at which the content was typed, but the sequential order of the words themselves can be of great investigative value.

User Dictionary Date Bracketing in Practice

For example, imagine that the yellow highlighted string “backup sight for delta point reflex sight” is important to an investigation, but we don’t know when the user typed it. We can run keyword searches across all of the data extracted from the device, using unique terms and words positioned around the phrase, until we find matching hits.

In this case, the words “dude” and “what the” were created in the dynamic-text dictionary before the phrase we searched for. They were created when the user, Gus Thomas, sent text messages on May 9 at 1:45 and 1:46 (UTC+0). The term “the purge” was then searched for on May 22 at 1:51 (UTC+0), resulting in these words being populated to the user dictionary. Because the phrase sits between these two sets of entries, it was most likely typed between May 9 and May 22. We could potentially narrow the time frame further with additional keyword searches, but already we know more about the timeline of events than we did before.
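If it helps to see the logic laid out, here is a minimal Python sketch of the bracketing idea. It is only an illustration under stated assumptions, not the output of any forensic tool: the function names (find_phrase, bracket) are hypothetical, the word list is a simplified stand-in for the real dictionary contents, and the timestamps are kept as plain labels taken from the example above.

```python
def find_phrase(words, phrase):
    """Return (first_index, last_index) of the first match of phrase, or None."""
    target = phrase.lower().split()
    lowered = [w.lower() for w in words]
    for i in range(len(lowered) - len(target) + 1):
        if lowered[i:i + len(target)] == target:
            return i, i + len(target) - 1
    return None


def bracket(words, phrase, dated_artifacts):
    """Return the closest dated anchors before and after phrase.

    words: dictionary entries in the order they appear in the raw file.
    dated_artifacts: (text, timestamp_label) pairs from records that
    still carry dates, such as SMS messages or browser searches.
    """
    span = find_phrase(words, phrase)
    if span is None:
        return None, None
    start, end = span
    before = after = None
    for text, when in dated_artifacts:
        pos = find_phrase(words, text)
        if pos is None:
            continue  # this artifact left no trace in the dictionary
        if pos[1] < start and (before is None or pos[1] > before[0]):
            before = (pos[1], text, when)
        elif pos[0] > end and (after is None or pos[0] < after[0]):
            after = (pos[0], text, when)
    return before, after


# Simplified, hypothetical word order based on the example above.
words = ("dude what the backup sight for delta point reflex sight "
         "the purge").split()
dated = [
    ("dude", "May 9, 1:45 UTC"),
    ("what the", "May 9, 1:46 UTC"),
    ("the purge", "May 22, 1:51 UTC"),
]
before, after = bracket(words, "backup sight for delta point reflex sight", dated)
print("typed after: ", before[1], "-", before[2])
print("typed before:", after[1], "-", after[2])
```

The sketch only matches the first occurrence of each phrase. Real dictionary files contain repeats and out-of-order entries, so treat anything it returns as a lead to verify by hand rather than a finding.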

A few cautions and notes on user dictionary files:

  • Many commercial forensic tools present the parsed content of the user dictionary file in alphabetical order. That just isn’t helpful for this technique at all. If you see that your tool has presented the data this way, be sure to look at the raw data from the file itself for the sequential version.
  • The example above is somewhat dated, but dictionary files still work the same way, so this technique will work on more modern dictionary files.
  • There is sometimes more than one dictionary file in a given device – be sure to search for and examine all of them!
  • You may find that the dictionary file you’re looking at is not quite sequential, contains repeated entries, or seems incomplete. Or, the dictionary file might be encoded. All of this is normal. Just be patient and keep digging.
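
For the first caution above, one way to get at the sequential version is to pull the readable strings out of the raw file while keeping their byte offsets. The following is a rough Python sketch, assuming the words are stored as runs of printable ASCII; the function name ordered_strings is my own, and the sample bytes are made up for illustration. It will miss non-ASCII words and anything that is encoded.

```python
import re


def ordered_strings(raw, min_len=3):
    """Yield (offset, text) for printable ASCII runs, in file order."""
    # Build a pattern such as [\x20-\x7e]{3,} for the chosen minimum length.
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    for match in pattern.finditer(raw):
        yield match.start(), match.group().decode("ascii")


# Made-up sample bytes standing in for the contents of a dictionary file.
sample = b"\x00dude\x00\x01what the\x00ab\x00purge"
for offset, text in ordered_strings(sample):
    print(f"{offset:>8}  {text}")
```

On a real case you would read a working copy of the file in binary mode and pass its bytes to the function. The offsets give you a stable way to say which entry comes before which, independent of how a tool chooses to sort its report.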

How User Dictionary Date Bracketing Helps

Now that I’ve described user dictionary date bracketing in more detail, it’s time to revisit the child abuse case in which I discovered and used this technique.

The user dictionary on the iPad proved to be instrumental in the investigation. I painstakingly worked through the dictionary file by hand, referencing typed word combinations back to the non-deleted entries in SMS messages, browser searches, and other database files that still had active content with intact dates and times.

With persistence, I was able to bracket the content from the multiple deleted, undated notes entries between user dictionary word entries with known dates. That gave me a fairly close date and time range for the various notes entries, and therefore for the associated assaults. The information I gleaned was exactly what we needed in order to establish a date range for the abuse and bring the perpetrator to justice.

All of this work ended in a guilty plea by the suspect and, ultimately, a long prison sentence. That’s the power of words, even words deleted and forgotten in an electronic device. Since then, I’ve used this same bracketing technique on all sorts of different cases. While this example was for an iOS device, the technique works on both Android and iOS devices, and for third-party apps as well.

Common Dictionary File Locations

There are numerous third-party keyboard apps on the market, each with its own user dictionary, and you may need to do some digging to locate their files. The following is a list of common locations for the dictionary file on various devices.

iOS dictionary file locations – iTunes Backup or data extraction:

Look for a backup file named:

0b68edc697a550c9b977b77cd012fa9a0557dfcb

From a file system extraction look for:

/private/var/mobile/Library/Keyboard/dynamic-text.dat

/private/var/mobile/Library/Keyboard/en_US-dynamic-text.dat  (or whatever language the user chose.)
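
If you are working from a backup rather than a file system extraction, it can help to know where the 40-character names come from. iTunes backups generally name each file with the SHA-1 hash of its domain and relative path joined by a hyphen. The Python sketch below computes that hash for a few candidate paths so you can compare the results against the names in your backup. The helper name backup_file_id is hypothetical, and the domain names I try here are assumptions on my part, so confirm them against the backup's manifest.

```python
import hashlib


def backup_file_id(domain, relative_path):
    """Return the SHA-1 hex digest of "<domain>-<relative_path>"."""
    return hashlib.sha1(f"{domain}-{relative_path}".encode("utf-8")).hexdigest()


# The backup file name listed above.
KNOWN_NAME = "0b68edc697a550c9b977b77cd012fa9a0557dfcb"

# Assumed candidate domains; check the manifest for the real one.
for domain in ("HomeDomain", "KeyboardDomain"):
    for name in ("dynamic-text.dat", "en_US-dynamic-text.dat"):
        file_id = backup_file_id(domain, f"Library/Keyboard/{name}")
        marker = "  <- matches the name listed above" if file_id == KNOWN_NAME else ""
        print(domain, name, file_id + marker)
```

The same approach works for language-specific variants: swap in the locale prefix the user chose and look for the resulting name in the backup folder.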

Android dictionary file locations:

Look in the data/data directory for:

com.android.providers.userdictionary/databases/user_dict.db  
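
Since user_dict.db is a SQLite database, you can review its live rows in insertion order without relying on how a tool sorts them. Here is a small Python sketch that opens a working copy read-only and dumps every table by rowid. It deliberately makes no assumptions about table or column names and simply reports whatever it finds; the function name dump_tables is my own. It only shows live records, so deleted entries in free pages still need separate recovery.

```python
import sqlite3
from pathlib import Path


def dump_tables(db_path):
    """Return {table_name: (column_names, rows)} with rows in rowid order."""
    # Open read-only so the working copy of the evidence is not modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        result = {}
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for table in tables:
            # rowid order approximates insertion order for ordinary tables;
            # this query fails on WITHOUT ROWID tables.
            cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
            columns = [desc[0] for desc in cur.description]
            result[table] = (columns, cur.fetchall())
        return result
    finally:
        conn.close()
```

Calling dump_tables on your copy of the database returns a dictionary you can loop over and print, one table at a time, with the rows in the order they were added.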

Samsung uses SwiftKey as a default, and the user dictionary can be found here:

com.sec.android.inputmethod/app_SwiftKey/user/dynamic.lm


Happy hunting, and stay tuned for next month’s installment of “My Favorite Artifacts!”