To evaluate the validity of, characterize the usage of, and propose potential research applications for International Classification of Diseases, Ninth Revision (ICD-9) tobacco codes in clinical populations.

Using data on cancer cases and cancer-free controls from Vanderbilt’s biorepository, BioVU, we evaluated the utility of ICD-9 tobacco use codes to identify ever-smokers in general and high smoking prevalence (lung cancer) clinic populations. We assessed potential biases in documentation, and performed temporal analysis relating transitions between smoking codes to smoking cessation attempts. We also examined the suitability of these codes for use in genetic association analyses.

ICD-9 tobacco use codes can identify smokers in a general clinic population (specificity of 1, sensitivity of 0.32), and there is little evidence of documentation bias. Frequency of code transitions between ‘current’ and ‘former’ tobacco use was significantly correlated with initial success at smoking cessation (p<0.0001). Finally, code-based smoking status assignment is a comparable covariate to text-based smoking status for genetic association studies.

Our results support the use of ICD-9 tobacco use codes for identifying smokers in a clinical population. Furthermore, with some limitations, these codes are suitable for adjustment of smoking status in genetic studies utilizing electronic health records.

Researchers should not be deterred by the unavailability of full-text records to determine smoking status if they have ICD-9 code histories.
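As an illustration of how sensitivity and specificity figures like those reported above could be computed, the following is a minimal sketch, not the study's actual pipeline. It assumes that each patient has a list of ICD-9 codes and a reference ever-smoker label (for example, from chart review). The code set (305.1, tobacco use disorder; V15.82, history of tobacco use), the function names, and the data layout are all assumptions made for this example; the text above does not list the codes that were used.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed code set for illustration only: 305.1 (tobacco use disorder)
# and V15.82 (personal history of tobacco use).
TOBACCO_CODES: Set[str] = {"305.1", "V15.82"}


def is_ever_smoker_by_code(codes: Iterable[str]) -> bool:
    """Flag a patient as an ever-smoker if any tobacco code appears in their history."""
    return any(code in TOBACCO_CODES for code in codes)


def sensitivity_specificity(
    code_histories: Dict[str, Iterable[str]],
    reference: Dict[str, bool],
) -> Tuple[float, float]:
    """Compare code-based status with a reference label for each patient.

    code_histories maps a hypothetical patient ID to that patient's ICD-9 codes.
    reference maps the same IDs to True (ever-smoker) or False (never-smoker).
    """
    tp = fp = tn = fn = 0
    for patient_id, truth in reference.items():
        predicted = is_ever_smoker_by_code(code_histories.get(patient_id, []))
        if truth and predicted:
            tp += 1
        elif truth and not predicted:
            fn += 1
        elif not truth and predicted:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data, invented for this example.
    histories = {
        "p1": ["305.1", "401.9"],
        "p2": ["V15.82"],
        "p3": ["250.00"],
        "p4": ["401.9"],
        "p5": [],
    }
    labels = {"p1": True, "p2": True, "p3": True, "p4": False, "p5": False}
    sens, spec = sensitivity_specificity(histories, labels)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A pattern of high specificity with low sensitivity, as in the results above, corresponds to few or no false positives (patients with a tobacco code are very likely to be smokers) alongside many false negatives (many smokers never receive a code).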