Marie Herman shares her four tips for data manipulation in Microsoft Excel
Excel is a great tool when you need to take data in one format, manipulate it into another format, and push the results along to another process (e.g. a database) or work with it further in Excel. I’m going to share four data manipulation tips with you today:
1. Combine Columns Using the CONCATENATE Function
While you can do this with the flash fill feature of Excel, there are times when you may need to combine multiple columns. You may want to combine two columns into one, such as when you have a first and last name field. You can do this easily with CONCATENATE. For example, if cell A1 has first name and cell B1 has last name, you can make a formula in cell C1 that would be =CONCATENATE(A1,“ “,B1).
What’s the point of the “ “ (double quotation, space, double quotation) in the middle? Double quotes are a way of specifying text to include (in this case a blank space).
Use the Function Wizard to walk you through setting up the formula and get a preview of the finished product. This is especially useful if you are combining multiple fields. How many can you combine? How about 255 items, up to 8,102 characters!
If you want the formula results to be values instead, copy the formula and then click on Home, Paste – Paste Special – Values. This will convert the results of the formula into values, just as if you had typed them directly in the cell.
2. Separate Columns Using Text to Columns Feature
When data is exported from other programs, it doesn’t always come through as neatly separated as we might like. For instance, you might receive a spreadsheet with the city, state and postal code all contained within one column. Excel can correct that.
First ensure you have enough blank columns to the right of the current column to hold all the data; any existing data will be overwritten when you separate the columns. Second, click in the data range of the text you want to separate. Then click on Data, Text to Columns. This will bring up the import text wizard. It will want to know if you want to separate on delimited (some kind of character like a tab or comma or space) or fixed (every “x” number of characters).
Using our city, state, postal code example, you would separate on the delimited comma between city and postal code first. Doing this would create one column with the city and a second column with the postal code. Then you would do Data – Text to Columns a second time and separate by the fixed width (if two letter states) or by a space (if states are spelled out). Again, make sure you have a blank column to do this or the postal codes will overwrite the column to the right of where you are when it separates.
Be careful when using this feature with what you choose to separate the fields on. For example, if you chose a space and you have a city with a two-word name, those two words would end up in separate columns, thereby throwing off your column alignments as the second word of the city name would move into the column that the rest of the rows have for their state field.
3. Consolidation – Combining Two Lists into One
Microsoft Excel can combine two lists into one and add up numeric values and eliminate duplicates in the process.
Move the cell pointer to a blank area of the worksheet with enough blank rows and columns for the data you will end up with. Click Data – Consolidate.
Function allows you to select a mathematical function. Sum is on by default, but you could also do Count, Average, Max, Min, Product, Count Numbers, Std Dev, Std Devp, Var, and Varp.
Put the cursor in the Reference field. Use the collapse button at the right end of the field to click and drag to select the first range you want to consolidate. Press enter when complete. Then click Add to add this range to the All References section. If you use the consolidate function with just one list, it will consolidate your list down to unique entries.
Once the first range has been added to the All References box, repeat the process to add additional ranges to the All References field.
Click the Browse button if you want to select data ranges from other workbooks.
To add labels from your data at the top or left, click the appropriate check boxes.
You can choose to have your consolidated data link to the original source data. If you don’t click that box, the new list is static (as if you had typed in the data manually). When you update the original lists, they would not be reflected in the new consolidated list by default. You could therefore delete your original lists, if you wished, and just keep the new consolidated list. If you link to the source data, the list becomes dynamic and deleting them will cause issues with the consolidated list.
Important Note:
You can only select this check box if the worksheet is in another workbook. Once you select this check box, you won’t be able to change which cells and ranges are included in the consolidation.
Click OK. Excel will return a brand-new list that extends down and to the right from the blank worksheet cell you first clicked. The list will contain one instance of each entry in the left-hand column along with the appropriate entries in the other columns (including summed totals where appropriate).
You can sort and edit the list at this point. Excel does not fill in the label in the upper left corner of the consolidation, so you will need to type that label manually before you sort.
Be aware that fields like dates will be consolidated down to the first entry, so this technique isn’t necessarily appropriate for all data ranges.
4. Remove Duplicate Rows
If you have ever had to combine multiple spreadsheets into one spreadsheet, you know how tedious it can be to eliminate duplicate entries. Excel can remove them for you, though it should be used with caution as it truly is deleting your data. Using this data as a sample:
Click Data, Remove Duplicates. Excel automatically selects your cell range. A window pops up and shows you a list of your columns with checkmarks in front of all the column names.
If you leave all three columns selected, Excel would check for duplicates in all cells. That means that ALL THREE CELLS have to match ALL THREE CELLS in another row in order to be considered a duplicate. Thus, you would find NO duplicates if you ran it right now.
If you uncheck Column C but leave columns A and B selected, Excel would remove the duplicate rows where there were exact matches in BOTH CELLS on each row and leave you with one row each of John Smith, Mary Smith, and Allan Smith.
You don’t get to pick which rows stay or go. Excel goes in order from top to bottom, keeping the first occurrence and deleting any duplicate rows that follow. Excel also doesn’t delete the full rows in your spreadsheet when it removes duplicates. It deletes the cells in the row within the selected data. Data outside your range is not affected. (So, if you had data in cells E1-E6, it wouldn’t be affected by the remove duplicates process taking place in cells A1-C6.)
If you were to uncheck Columns C and A, but leave Column B selected, Excel would remove ALL duplicates of Smith, which means you end up with one entry: John Smith.
If you preselect cells and you choose “continue with that selection” when the window pops up after you click Remove Duplicates, it will delete the duplicate cells from what you have selected and may leave blank cells in your data.
Final Thought
Excel is a powerful tool, but you can’t use it to its full potential unless you know what functionality it contains. I hope these four tricks will give you some insight into ways that you can have Excel modify your data to improve the final results.
Thank you
Thank you for this piece, its really helpful though no 3 is kind of difficult for me to get.
What was your struggle with it? Happy to help if I can. 🙂