How to remove duplicated lines

Using awk

Sometimes lines are duplicated in your text file. You can easily remove these duplicates using awk:

awk '!seen[$0]++' files.txt
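
The expression !seen[$0]++ counts how often each full line ($0) has been encountered and is true only the first time, so only the first occurrence of every line is printed and the original order is preserved. For example, with a hypothetical files.txt containing:

apple
banana
apple

the command above outputs:

apple
banana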

By adding the -i inplace option (available in GNU Awk 4.1 and later), the original file is modified directly.

awk -i inplace '!seen[$0]++' files.txt
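
If your awk does not provide the GNU-specific inplace extension, a common portable alternative (the temporary file name here is just an example) is to write to a temporary file and then replace the original:

awk '!seen[$0]++' files.txt > files.txt.tmp && mv files.txt.tmp files.txt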

To remove duplicate lines based on a specific column, such as the second column, replace !seen[$0]++ with !seen[$2]++ (by default, awk splits each line into whitespace-separated fields).

awk -i inplace '!seen[$2]++' files.txt
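
For example, with a hypothetical files.txt containing:

1 apple
2 banana
3 apple

the command keeps only the first line for each distinct value in the second field:

1 apple
2 banana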

Using sort

You can also use sort to remove duplicates. The following command sorts the file and keeps one line per unique value in the second column; unlike the awk approach, the output comes back in sorted order rather than the original order. A worked example follows the flag descriptions below.

sort -u -t' ' -k2,2 file

The flags used are:

  • -u: outputs only the first line of each run of lines whose sort key compares equal.
  • -t: specifies the delimiter (in this case, a space).
  • -k: specifies the field(s) to use as the sort key, written as start[,end]. For example:
    • -k2,2: sorts on the second field only (as in the command above).
    • -k1,3: sorts on fields 1 through 3.
    • -k1: sorts from field 1 through the end of the line.
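
For example, with the same hypothetical three-line file as above (fields separated by a single space), the command prints one line per distinct second field, ordered by that field; when several lines share a key, which one is kept is up to sort (here it is the first):

sort -u -t' ' -k2,2 file
1 apple
2 banana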