Published: March 14, 2018

Often, to check the contents of a tab-delimited file, we want to know how many unique values there are in a particular column. Below are instructions for checking this using the command line and the Python package pandas.

 

If you want to count unique values on the command line or in a shell script

How many unique values are in column 1 of a file named input_file?

# outputs a count of the unique things in a column
cut -f 1 input_file | sort | uniq | wc -l

# outputs each unique value and how many times it occurs
cut -f 1 input_file | sort | uniq -c
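
The same two counts can also be sketched in plain Python, without pandas, which may help when the pipeline needs to live inside a larger script. This is only a minimal sketch under some assumptions: the function name count_column_values and the sample data are made up for illustration, and columns are numbered from 0 here, so column 0 corresponds to cut -f 1.

```python
import csv
import io
from collections import Counter


def count_column_values(handle, column=0):
    """Count how often each value appears in one column of a tab-delimited file.

    `column` is 0-based, so column=0 matches `cut -f 1`.
    (Hypothetical helper, not part of any library.)
    """
    reader = csv.reader(handle, delimiter="\t")
    # Skip rows (such as blank lines) too short to have the requested column.
    return Counter(row[column] for row in reader if len(row) > column)


# Hypothetical sample data standing in for input_file.
sample = io.StringIO("chr1\t10\t20\nchr2\t5\t15\nchr1\t30\t40\n")
counts = count_column_values(sample)

# Number of unique values, like: cut -f 1 | sort | uniq | wc -l
print(len(counts))

# Each unique value with its count, like: cut -f 1 | sort | uniq -c
for value, n in sorted(counts.items()):
    print(n, value)
```

To run this on a real file, pass an open file handle instead of the in-memory sample, for example open("input_file", newline="").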

If you want to count unique values using the Python package pandas

# My file's first line is chr, start, stop, name, score. I want to know how many unique chromosomes there are, or how many lines each chromosome has.

# First open Python by typing python (if you are on fiji you must also run: module load python/2.7.3/pandas)

# outputs a count of the unique things in a column

import pandas
df = pandas.read_csv("allmu.bed", sep="\t")
print(df["chr"].nunique())

# outputs each unique value and how many times it occurs

import pandas
df = pandas.read_csv("allmu.bed", sep="\t")
print(df["chr"].value_counts())

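# If your file has no header line, pass the column names yourself with names=

import pandas
df = pandas.read_csv("allmu.bed", names=["chr", "start", "stop", "name", "score"], sep="\t")

# outputs each unique chromosome and its line count, sorted by chromosome name
# (value_counts() on its own already sorts by count, largest first)
print(df["chr"].value_counts().sort_index())

# outputs a count of the unique chromosomes
print(df["chr"].nunique())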
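
To try the pandas calls without a file on disk, something like the following sketch may be useful. It assumes a small, made-up BED-like table with no header line, read from an in-memory string. The sample rows and the variable names bed_text, n_unique and per_chr are illustrative only.

```python
import io

import pandas as pd

# Hypothetical BED-like data with no header line (tab-delimited).
bed_text = "chr1\t10\t20\ta\t0\nchr2\t5\t15\tb\t0\nchr1\t30\t40\tc\t0\n"

# header=None tells pandas the first row is data; names= supplies the column labels.
df = pd.read_csv(
    io.StringIO(bed_text),
    sep="\t",
    header=None,
    names=["chr", "start", "stop", "name", "score"],
)

# Count of unique chromosomes.
n_unique = df["chr"].nunique()

# Lines per chromosome, sorted by chromosome name.
per_chr = df["chr"].value_counts().sort_index()

print(n_unique)
print(per_chr)
```

If the names are left out for a file like this, pandas treats the first data row as the header, so that row is not counted.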