CUSIP codes: how to avoid common problems when opening CSV files

Problems that can occur with CUSIP codes and how to prevent them in Excel and SPSS when importing CSV files

MLE Blog
4 min readNov 29, 2022
Graphic of the issuer, issue and checksum digits of a CUSIP code

Researchers who need U.S. and Canadian company financial information need a unique identifier one of which is CUSIP codes.

CUSIP stands for Committee on Uniform Securities Identification Procedures. A CUSIP number identifies most financial instruments, including: stocks of all registered U.S. and Canadian companies, commercial papers, and U.S. government and municipal bonds.

This post highlights possible areas CUSIP codes may lead to hiccups during data analysis, and how to prevent these problems from occurring.

This post outlines:

  1. What problems can occur with CUSIP codes?
  2. How to prevent these problems

What problems can occur with CUSIP codes?

Different lengths of CUSIP codes

CUSIP codes can either be 6, 8 or 9 digits in length. The first six digits are required. For a 9-digit CUSIP code:

  • The first six digits are the issuer code (the company).
  • The next two digits are the issue code (the stock within the company, where the primary stock is usually ‘10’).
  • The final digit is a check code (read more on Investopedia).

Different databases use CUSIP codes of different lengths. For example, in Eventus, you must provide 8 or 9-digit versions. Also, there may be need to link two databases that use different lengths. In both cases, it is necessary to convert to a shorter or longer length. You can get a longer CUSIP code by using the WRDS CUSIP Converter; you can make a shorter one by trimming digits from the right.

Lost “leading zeroes” or accidental scientific notation

Illustration of CUSIP codes which have become corrupt in Excel
What went wrong with some of these CUSIP codes?

Given the format CUSIP codes may come as; a mixture of numbers letters (such as Dr Pepper ‘26138E109’) or a leading zero (such as Apple Inc ‘037833100’). Inputting these into Excel could lead to it been converted to ‘2.61E+113’ or automatic removal of the leading zero. It is therefore important that care is taken when opening a .csv file in Excel.

^back to contents

How to prevent these problems

We will suggest some good practice for opening files containing CUSIP and GVKEY codes in Excel and SPSS.

If you download data from Compustat, for example, and wish to analyse it in Excel, choose to download to Excel .xlsx. This format should correctly read any CUSIP codes. If you have CSV data instead, do not open it in the usual way as it will corrupt the CUSIP codes. Instead, you can carefully import the file with Excel. There are two approaches depending on which version of Excel you have.

  • In Excel 365, use the modern ‘Get Data From Text/CSV’ command. (There is also a legacy text import wizard if you enable it in the Excel options menu, under ‘Data’.)
  • In Excel 2016 or earlier, use the ‘Get Data From Text’ text import wizard. (There is also the modern command, called a power query, in the ‘Get & Transform’ menu group.)
Excel 365 Data menu, Get Data From Text/CSV
Excel 2016 Data menu, Get Text From Text
The Data menu is different in Excel 365 (left) and Excel 2010–2016 (right). The menus may be more stretched or compacted for you.

Import ‘From Text/CSV’ in Excel 365 (Power Query)

Steps for Excel 365:

  1. Launch Excel, create a new workbook.
  2. In the ribbon, click ‘Data’ then ‘Get & Transform Data’ then ‘From Text/CSV’. Select the .csv file of interest then click ‘Import’.
  3. The text import window launches. Click ‘Transform Data’.
  4. The Power Query Editor launches. Select each CUSIP and GVKEY column and change the ‘Data Type’ to ‘Text’. If you cannot read the warning message, just press the Enter key to accept.
  5. Click ‘Close & Load’ then save the workbook.

Import ‘From Text’ in Excel 2010–2016 (Text Import Wizard)

Steps for Excel 2016 and older:

  1. Launch Excel, create a new workbook.
  2. In the ribbon, click ‘Data’ then ‘Get External Data’ then ‘From Text’. Select the .csv file of interest then click ‘Import’.
  3. The Text Import Wizard launches. In Step 2, under ‘Delimiters’, tick ‘Comma’ only.
  4. In Step 3, hold the Control key and select every CUSIP and GVKEY column. Change the column data format from ‘General’ to ‘Text’.
  5. Click ‘Finish’ and save the workbook.

Import via SPSS

If you download data from Compustat, for example, and wish to analyse it in SPSS, choose to download to Excel or SPSS format. These formats should correctly read any CUSIP codes. If you have CSV data instead, you can carefully import the file to SPSS as described below.

Step 1: Import to SPSS

  1. Launch IBM SPSS Statistics.
  2. In the menu, click ‘File’ then ‘Open’ then ‘Data…’.
  3. Go to the file location, ensure the file type is ‘CSV (*.csv)’.
  4. Follow the instructions in the Text Import Wizard. When you reach Step 5, select each CUSIP and GKVEY column and change the ‘Data format’ from ‘Automatic’ to ‘String’.
  5. Finish the wizard and check your data has imported correctly.

Step 2: Export to Excel

  1. In the menu, click ‘File’ then ‘Save As’.
  2. Change ‘Save as type’ to ‘Excel’.

^back to contents

Further support

You can find further support from the Specialist Library Support Business Data pages.

--

--