Finding the number of unique items in a column
Often, to check the contents of a tab-delimited file, we want to know how many unique values there are in a particular column. Below are instructions for checking this using the command line and the Python package pandas.
If you want to check the number of unique things on the command line or in a shell script
How many unique things are in column 1 of a file named input_file?
# outputs a count of the unique things in a column
cut -f 1 input_file | sort | uniq | wc -l
# outputs each of the unique things and how many of each there are
cut -f 1 input_file | sort | uniq -c
If you want to check the number of unique things using the Python package pandas
# My file's first line is chr, start, stop, name, score. I want to know how many unique chromosomes there are, or how many lines each chromosome has.
# First open Python by typing python (if you are on fiji you must also module load python/2.7.3/pandas)
# outputs a count of the unique things in a column
import pandas
df = pandas.read_csv("allmu.bed", sep="\t")
print(df["chr"].nunique())
# outputs each of the unique things and how many of each there are
import pandas
df = pandas.read_csv("allmu.bed", sep="\t")
print(df["chr"].value_counts())
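If you want to try the two calls above without a real file, here is a minimal sketch that reads a small made-up table from memory instead of allmu.bed. The sample rows are hypothetical and only there for illustration; the sketch assumes Python 3 and a reasonably recent pandas.

```python
import io
import pandas

# Hypothetical sample data standing in for a tab-delimited file with a header line.
sample = io.StringIO(
    "chr\tstart\tstop\tname\tscore\n"
    "chr1\t100\t200\ta\t5\n"
    "chr1\t300\t400\tb\t7\n"
    "chr2\t150\t250\tc\t2\n"
    "chrX\t10\t90\td\t9\n"
    "chr1\t500\t600\te\t1\n"
)

df = pandas.read_csv(sample, sep="\t")

# Number of distinct values in the "chr" column.
n_unique = df["chr"].nunique()
print(n_unique)

# Number of rows carrying each value, most frequent first.
counts = df["chr"].value_counts()
print(counts)
```

The result of value_counts() is a pandas Series indexed by the unique values, so a single count can be looked up with counts["chr1"].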
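# if the file has no header line, give the column names yourself with names=
import pandas
df = pandas.read_csv("allmu.bed", names=["chr", "start", "stop", "name", "score"], sep="\t")
# counts sorted from smallest to largest (sort_values needs pandas 0.17 or newer; older versions use order())
print(df["chr"].value_counts().sort_values())
print(df["chr"].nunique())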
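If pandas is not available on the machine you are using, the same counts can be produced with only the Python standard library. This is a rough sketch rather than a recommended replacement: count_column is a hypothetical helper name, the sample rows are made up, and the column index is 0-based (unlike cut -f, which counts from 1).

```python
import csv
import io
from collections import Counter


def count_column(handle, column=0, delimiter="\t"):
    """Count how often each value appears in one column (0-based index)."""
    # QUOTE_NONE keeps quote characters as plain text, similar to what cut does.
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    # Skip completely empty lines so they do not raise an IndexError.
    return Counter(row[column] for row in reader if row)


# Hypothetical sample data; for a real file use open("input_file") instead.
sample = io.StringIO("chr1\t100\t200\nchr1\t300\t400\nchr2\t150\t250\n")
counts = count_column(sample)

# Number of unique values in the column.
print(len(counts))

# Each unique value with its count, most frequent first.
for value, n in counts.most_common():
    print(n, value)
```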