Stata

Some useful utility programs for Stata

You’ll probably spend a lot of your time doing data management and statistical analysis (which you are doing in Stata, right?). So, small efficiencies in data-related tasks can really pay off in the long run. One way to get those efficiencies is by creating small utility programs that automate tasks you perform many, many times. It’s very easy to write short programs for Stata. Below, I offer a few programs, each only several lines long, that I find really useful.
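To give a sense of the scale involved, here is a minimal sketch of what such a utility might look like. It is not one of the programs from the post itself: the name misscount and its output layout are hypothetical, chosen only for illustration. It reports how many observations are missing for each variable you pass to it.

```stata
* Hypothetical utility: count missing values for each variable in a list.
* The program name and display format are illustrative assumptions.
capture program drop misscount
program define misscount
    version 13
    syntax varlist
    foreach v of local varlist {
        * missing() works for both numeric and string variables
        quietly count if missing(`v')
        display as text "`v'" _col(25) as result r(N) as text " missing of " as result _N
    }
end

* Example call using a dataset that ships with Stata
sysuse auto, clear
misscount rep78 price
```

Saved as misscount.ado somewhere on your adopath, a program like this is available in every session without being redefined.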

Stata Project Feature

One of the features of Stata 13 and later is “Projects”, which are meant to provide easier access to multiple files related to a, well, project you are working on. The files can be do files, data, logs, graphs, etc. In fact, they don’t even need to be Stata files. One advantage I have found is that they let you keep a strict organization, with each type of file in its own directory, while still having access to all of those files from one pane within Stata.

Naming variables, especially in Stata

A consistent scheme for naming your variables is very helpful. It makes coming back to a project after it’s been under review for 3 months much easier and is especially valuable when collaborating with someone else. This is one of those points where there are bad practices and good practices, but no “right” practice. More important is consistency within (ideally across) projects. So, as a starting point for your consideration, here is what I have developed over time, through lots of trial and error. I think this approach makes it easy to find variables and understand their provenance.

UPDATE OF “What statistical package should I use?”

Technological progress continues. In an older posting, I mentioned the role of specialized packages that address models not available in general-purpose software, such as LISREL for structural equation modeling (SEM). That example is now somewhat moot, as Stata 12 has an extensive SEM capability and new add-ons for R allow modeling of SEMs. I suspect that if I were a power user, I would find limitations in Stata/R relative to the dedicated packages, but at my level, I haven’t found them.

A template for Stata .do files

There are many different approaches to writing and documenting the many steps that go into an empirical project. J. Scott Long has a great book, The Workflow of Data Analysis Using Stata, which I strongly recommend. He recommends developing a series of small, highly focused do files, which are run in sequence as needed. I take a different approach, which is to keep all of a project’s code in one honking large do file, divided into sections.
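As a rough illustration of the single-file idea, a sectioned do file could be laid out along these lines. This is only a sketch under my own assumptions, not the template from the post: the file name, log name, section titles, and the use of the built-in auto dataset are all placeholders.

```stata
* project.do : hypothetical single-file layout (all names are placeholders)
version 13
clear all
set more off
capture log close
log using project.log, replace text

* ---- 1. Load data ----
sysuse auto, clear

* ---- 2. Clean and construct variables ----
generate ln_price = ln(price)
label variable ln_price "Log of price"

* ---- 3. Descriptive statistics ----
summarize ln_price mpg weight

* ---- 4. Models ----
regress ln_price mpg weight

log close
```

Because every step lives in one file and writes to one log, rerunning the whole project from raw data is a single do command.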

Excel Is evil

Excel has caused more trouble for more doctoral students than I care to think about. Doctoral students can hurt themselves with Excel in at least two ways (there may be more):
1. Using it to clean, combine, and otherwise manage data
2. Cutting and pasting results into Excel (or worse yet, Word) and then formatting them for presentation
Both of these are very inefficient uses of time. The first is a disaster for data integrity, because it is hard to document, almost impossible to revise, and very easy to mess up (sort only half the variables, be one row off when pasting, etc.).

What statistical package should I use?

This is an amazingly contentious question. My first answer is “If you are comfortable with a package and it is serving your needs, keep using it.” That can be complicated, of course, if you have a co-author dedicated to a given statistics package. If your only need is to pass data back and forth with that co-author, I strongly recommend Stat Transfer, which can convert from pretty much any statistical format to any other.