Journalism Tip: Searching for Words & Phrases in a Data Dump

I’ve written before about a few tech tools that journalists can use, like Slack, which you can use in your newsroom to talk to each other, or secure instant messengers for journalists. Some of the best journalism colleges in India also teach a lot of new media tools.

I recently came across a simple problem.

Imagine you’ve just got a big dump of text that you need to search through for the occurrence of a particular word or phrase. Like the file that hackers dumped on the Internet containing a list of all files that are on Sony’s servers.

Searching for a string in a large file using Grep.

One such text file, which the hackers posted, contained thousands of file names. Sitting in India, you probably want to see if there are any files that have India in their name. It could hold clues to your next scoop. But how do you go through the whole thing? It’s overwhelming.

I took a shortcut and emailed Thejesh, the founder of Datameet. He suggested that I use grep, a command-line tool that searches plain-text files for lines matching an expression. Grep is usually built into Mac OS, and if you are running Windows, you can install it here.

Using Grep to search for text.

It’s really simple to use. If you are on a Mac, go to Terminal and type grep <string you want to search for> <name of the file you want to search>, and that’s it. The search string comes first and the file name second. The output prints every line from your file that contains the string you gave as input.
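Here is a minimal sketch of what that looks like in practice. The sample file and its contents are made up for illustration and are not from the actual dump. The flags used (-i, -n, -c, -F) are standard grep options.

```shell
# Hypothetical sample standing in for a real dump; these file names are invented.
sample="$(mktemp)"
printf '%s\n' \
  'finance/india_budget_2014.xlsx' \
  'marketing/us_release_plan.docx' \
  'legal/India Distribution Agreement.pdf' \
  'hr/payroll_november.csv' > "$sample"

# Search string first, file second. -i ignores case, so "india" and "India" both match.
grep -i 'india' "$sample"

# -n prefixes each matching line with its line number in the file.
grep -n -i 'india' "$sample"

# -c prints only the number of matching lines instead of the lines themselves.
matches="$(grep -c -i 'india' "$sample")"
echo "$matches matching lines"

# -F treats the search text as a literal phrase rather than a regular expression.
grep -i -F 'distribution agreement' "$sample"
```

Put the search string in quotes if it contains spaces, otherwise grep will treat the second word as a file name.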


There’s So Much You Can do With Data! [Notes From Data Journalism Workshop]

Happy to report that the data journalism workshop we held with the good folks from Datameet went well last month. The biggest learning for me was that there is so much more I can do with data.

Nisha from Datameet has posted notes from the workshop on the Datameet blog, along with links to some of the resources we discussed.

Thejesh & Nisha at the Workshop

The agenda with notes is here, and the resources have been shared on the data journalism resource wiki page. Datameet has also been putting together a data catalog that you might want to check out.

Josephine & Chris From Citizen Matters

Knolby Media hosted us, and Nisha is a fellow at School of Data. Vikas Mishra volunteered to take notes, pictures, and video. Thanks to all.

Now You Can Create Amazing Data Stories, Even If You Suck at Coding

I don’t have to tell you that data journalism is important. It has become a part of mainstream journalism now. Indian publications like The Economic Times (where I work) and of course foreign publications have been making extensive use of new data tools to produce great stories.

This New York Times feature on Reshaping New York is a great example. Of course, that’s a complex project and tough to pull off.

The Economic Times has a brilliant data blog run by Avinash. The Hindu also has one going. Papers like The Indian Express have done a few data projects as well.

TLDR: Data is becoming really important. And data skills are going to be necessary.

One Day Workshop for Data Journalism

With the good folks at Datameet, we are conducting a day-long workshop for journalists, designers or anyone who is interested in learning how to use data to tell great stories. The idea is to find a data project, deconstruct it, learn how it is done and attempt to do one ourselves. It will cost about Rs 700 (including lunch & chai).

Only 15 spots are available. So hurry up & book your spot now!

Venue: Near M G Road, Bangalore (To be shared). 

Dates: 31 Aug 2014.

Click Here to Buy a Ticket.

The Problem With Data Journalism: Numbers Only Tell Half the Story

Data is great. If it comes with context. Without that, it’s piffle. No matter how well it has been visualized.

Case in point is this data story published by The Indian Express last week. It talks about MPs with criminal charges against them. Besides listing the parties with MPs facing criminal charges, the top five states with MPs facing criminal charges and the top parties with MPs facing criminal charges, it shows the top MPs with criminal charges against them. It paints a very ominous picture of these MPs.

M B Rajesh

What’s wrong with this? Everything.

These politicians may have criminal cases against them. But it is not all black and white. Politicians often go to protests, get involved in local tussles and rally against the ruling party.

Sometimes they are slapped with criminal cases. That doesn’t make them criminals. They could be dirty. But the fact remains that these are charges, not convictions. And data, because it looks pretty and comes with gravitas, glosses over that.

With the general elections around the corner, we are seeing many interesting data journalism projects. When you see or make some of these beautiful visualizations, don’t forget your pinch of salt.

The Daksh – India Together Election 2014 Data Journalism Fellowship

Oorvani Foundation, a non-profit media foundation that supports independent public affairs journalism, in collaboration with Daksh, a non-profit focused on improving accountability in politics and governance, is inviting applications for the Daksh-IT Election 2014 Data Journalism Fellowship. Two applicants will be selected to research, analyse data and write on select themes related to the 2014 General Elections.

Fellowship Amount

Rs.75,000 to be paid in two instalments, one at the beginning of the Fellowship and one on completion of publishing the required number of stories.


Professional journalists, including freelancers, in Print and New Media in English, with at least one year of demonstrated experience in in-depth long form journalism. Researchers or Data experts with a demonstrated track record of writing in a lucid and articulate manner will also be considered.

Duration of Fellowship

From 20 Feb 2014 to two weeks post elections, by which time Fellows must complete all their submissions.

Fellowship Criteria

Fellows must coordinate with the India Together and Daksh teams and produce at least FIVE stories. Each story will be 800-1500 words long and supplemented with data and visualisations. A full report, including all findings and data, is also to be filed at the end of the period.

Fellows are encouraged to use technology to tell stories in innovative and dynamic ways. They are expected to share the data provided in open source formats.

Articles and pictures will be reviewed and published in India Together and Citizen Matters (as applicable). Articles may also be published in other media with due permission and credit.

Application Requirements

  • Curriculum Vitae
  • Soft copies of three published long form stories, preferably data-based, along with publication title, date of publishing and byline
  • Cover note

Application Deadline

15 February, 2014. Selected applicants will be notified within 5 days.

Send applications to:

Hat tip: Ashwin