Journalism Tip: Searching for Words & Phrases in a Data Dump

I’ve written before about a few tech tools that journalists can use, like Slack, which you can use in your newsroom to talk to each other, or secure instant messengers for journalists. Some of the best journalism colleges in India also teach a lot of new media tools.

I recently came across a simple problem.

Imagine you’ve just got a big dump of text that you need to search for occurrences of a particular word or phrase, like the file that hackers dumped on the Internet listing all the files on Sony’s servers.

Searching for a string in a large file using Grep.

One such text file that the hackers posted contained thousands of file names. Sitting in India, you probably want to see whether any files have India in their name. It could hold clues to your next scoop. But how do you go through the whole thing? It’s overwhelming.

I took a shortcut and emailed Thejesh, the founder of Datameet. He suggested that I use Grep. It’s used to search plain-text data sets for lines matching an expression. Grep is usually built into Mac OS, and if you are running Windows, you can install it here.

Using Grep to search for text.

It’s really simple to use. If you are on a Mac, open Terminal and type grep <string you want to search> <file name you want to search> (the command is lowercase, and the string comes before the file name). That’s it: the output will print every line from your file that contains the string you gave as input.
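Here is a minimal sketch of what that looks like in practice. The file name filelist.txt and the four file names inside it are made up for illustration; they are not taken from the actual dump. The -i, -n and -c options are standard grep options that ignore upper/lower case, show line numbers, and count matching lines.

```shell
# Build a tiny stand-in for the dumped list of files.
# (These file names are hypothetical, for illustration only.)
printf '%s\n' \
  'finance/budget_2014.xls' \
  'marketing/India_release_plan.doc' \
  'legal/contracts_us.pdf' \
  'distribution/india_box_office.csv' > filelist.txt

# Search string first, file name second.
# -i ignores case, so both "India" and "india" match.
# -n prefixes each matching line with its line number in the file.
matches=$(grep -i -n 'india' filelist.txt)
echo "$matches"

# -c prints only the number of matching lines instead of the lines themselves.
count=$(grep -i -c 'india' filelist.txt)
echo "$count"
```

Without -i, grep is case-sensitive, so searching for india would miss a file named India_release_plan.doc. On a real dump, you would swap filelist.txt for the name of the file you downloaded.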
