The Excel 2010 help file provides more information about the CLEAN function than earlier versions:
“Removes all nonprintable characters from text. Use CLEAN on text imported from other applications that contains characters that may not print with your operating system. For example, you can use CLEAN to remove some low-level computer code that is frequently at the beginning and end of data files and cannot be printed.
“Important: The CLEAN function was designed to remove the first 32 nonprinting characters in the 7-bit ASCII code (values 0 through 31) from text. In the Unicode character set … there are additional nonprinting characters (values 127, 129, 141, 143, 144, and 157). By itself, the CLEAN function does not remove these additional nonprinting characters.”
Actually, CLEAN is more powerful than this information describes.
To prove this for yourself, first use Excel’s Data Series command to create a column of values from 1 to 255. To do so, enter the value 1 in cell A1; hold down Ctrl and Shift and then press the down arrow to select the rest of column A with cell A1 as the active cell; press Alt, E, I, S to launch the Series dialog; specify a step value of 1 and a stop value of 255; then press OK.
Enter a formula to return the character associated with the value in cell A1:
Now enter a formula to clean the text in cell B1 and to return the length of the cleaned text. That is, if CLEAN removes the character, the value in column C will be zero; otherwise it will be 1.
Now copy the range B1:C1 down the column to row 255. Then sort the results by column C, which will group all the cleaned items together.
When you do so, you’ll see that CLEAN certainly does clean characters 1-31. But it also cleans characters 129, 141, 143, 144, and 157…even though Excel’s documentation says that it doesn’t.
If you try this with Excel 2003, you’ll get the same result.
That leaves only two special characters that CLEAN doesn’t handle: CHAR(127) and CHAR(160).
To remove CHAR(127), you can use this formula (assuming your text is in cell B127):
CHAR(160) is the non-breaking-space character, which is used extensively on the Web. To handle this character, we typically need to replace it with a space, and then replace multiple spaces with a single space. Assuming the original text is in cell D5, we can use this formula:
Also, of course, you can use Excel’s Find and Replace command to replace either of these characters. To do so, you need to generate the character so you can use it in the dialog. One way to do that is first to type…
…in any cell; and then press F9 without pressing Enter.
When you do so, Excel changes the formula to a value in your formula bar and selects that value. Now press Ctrl+C to copy that selection. Then press ESC to return to the Ready mode.
Launch Excel’s Find and Replace dialog; paste your copied character into the Find What edit box; enter a space in the Replace With edit box; then choose Replace All.
Finally, to replace multiple spaces with one space, use Find and Replace to replace two spaces with one space. Run this command several times until Excel tells you that it can’t find multiple spaces.