Saturday, November 29, 2014

Merge multiple lines into one based on the first field: AWK on Windows

When parsing a text file, you may occasionally need to merge different lines that share a field. In this example the first field (column) is used as the key. You can run:

sort input.txt | awk -F";" "NR!=1 && p1!=$1{print anterior;anterior=\"\"}{p1=$1;anterior=(anterior\"\")?anterior FS substr($0,length($1)+length(FS)+1):$0}END{if(anterior\"\") print anterior}"

Assume we have the following text file, input.txt:
F1;String1;String2;String3;String4
F1;String5;String6;String7;String8
F2;Text1;Text2;Text3;Text4;Text5;Text6
F2;Somethingelse
F1;Anotherthing
F2;Anotherone

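In this example we want to merge lines based on the first field of each line.
The file has to be sorted first so that all the F1 lines, and then all the F2 lines, are consecutive; the awk program only compares each line's key with the key of the line before it.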
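
The one-liner is easier to follow when it is spread over several lines. The sketch below is an approximation of the same logic written for a POSIX shell, so it uses single quotes instead of the escaped double quotes that the Windows command prompt needs. The function name merge_by_key and the variable names prev and merged are made up for illustration. The rest of each line is taken by skipping the key and one separator, which assumes the separator is a literal string rather than a regular expression.

```shell
# merge_by_key is a hypothetical helper name, not part of awk or of the original command.
merge_by_key() {
    # LC_ALL=C makes sort compare bytes, so the order does not depend on the locale.
    LC_ALL=C sort | awk -F';' '
        # The key changed (and this is not the first line): print the finished group.
        NR != 1 && prev != $1 { print merged; merged = "" }
        {
            prev = $1
            if (merged != "")
                # Same key as the previous line: append everything after "key;".
                merged = merged FS substr($0, length($1) + length(FS) + 1)
            else
                # First line of a new group: keep the whole line.
                merged = $0
        }
        # Print the last group, which no later key change would flush.
        END { if (merged != "") print merged }
    '
}

merge_by_key <<'EOF'
F1;String1;String2;String3;String4
F1;String5;String6;String7;String8
F2;Text1;Text2;Text3;Text4;Text5;Text6
F2;Somethingelse
F1;Anotherthing
F2;Anotherone
EOF
# prints:
# F1;Anotherthing;String1;String2;String3;String4;String5;String6;String7;String8
# F2;Anotherone;Somethingelse;Text1;Text2;Text3;Text4;Text5;Text6
```

On Windows you can keep the double-quoted form of the command, or save the awk program text to a file and pass it with awk -f, which avoids the quoting altogether.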


If your text is separated by something other than a semicolon, replace -F";" with your own separator character or string, for example -F"," for comma-separated text.
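
As a rough sketch of how the separator can be swapped, the POSIX shell function below takes it as an argument and hands it to awk through -F. The name merge_with_sep is hypothetical, and the sketch assumes the separator is a plain string such as "," or "|", not a regular expression, because it uses length(FS) to skip over it.

```shell
# merge_with_sep is a hypothetical helper; its first argument is the separator.
merge_with_sep() {
    # "$1" here is the shell argument; inside the quoted program $1 is the awk field.
    LC_ALL=C sort | awk -F"$1" '
        # The key changed: print the group collected so far.
        NR != 1 && prev != $1 { print merged; merged = "" }
        {
            prev = $1
            # Append the text after the key and separator, or start a new group.
            merged = (merged != "") ? merged FS substr($0, length($1) + length(FS) + 1) : $0
        }
        END { if (merged != "") print merged }
    '
}

printf 'b,2\na,1\nb,3\n' | merge_with_sep ','
# prints:
# a,1
# b,2,3
```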
The comparison is case sensitive. To make it case insensitive you can set IGNORECASE = 1 (a gawk extension), but this does not work well in some cases; if you need case-insensitive matching, it is better to convert all the text to upper case or lower case first.
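
One possible variation, sketched below for a POSIX shell, lowercases only the key used for the comparison instead of the whole text, so the merged output keeps its original case. It relies on tolower(), which is standard awk, and on sort -f so that keys differing only in case end up next to each other. The name merge_ignore_case is hypothetical, and the semicolon separator is assumed.

```shell
# merge_ignore_case is a hypothetical helper name used only for this illustration.
merge_ignore_case() {
    # sort -f folds case while sorting, so f1 and F1 lines become adjacent.
    LC_ALL=C sort -f | awk -F';' '
        # Compare on a lowercased copy of the key; the line itself is left as typed.
        { key = tolower($1) }
        NR != 1 && prev != key { print merged; merged = "" }
        {
            prev = key
            # length($1) + 2 skips the key and the one-character separator.
            merged = (merged != "") ? merged FS substr($0, length($1) + 2) : $0
        }
        END { if (merged != "") print merged }
    '
}

printf 'F2;z\nf1;x\nF1;y\n' | merge_ignore_case
# prints:
# f1;x;y
# F2;z
```

The merged line keeps the spelling of the key from whichever line sorted first in its group.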


Example of merging lines based on first field, semicolon separated text
