Given a CSV file:
id, fruit, binary
1, apple, 1
2, orange, 0
3, pear, 1
4, apple, 0
5, peach, 0
6, apple, 1
How can I calculate, for each unique value in fruit,
the number of times the binary value is 1 divided by the number of occurrences of that fruit in the fruit column?
Another way to get the numerator is to sum the values of the binary column for each unique fruit.
For example:
For the fruit apple, binary = 1 appears two times and apple has a frequency of 3, so I should get 2/3.
How can I write this efficiently in AWK?
I know that I can do this to get the unique values from the second column:
cut -d , -f2 file.csv | sort | uniq
or
awk '{ a[$2]++ } END { for (b in a) { print b } }' file.csv
So my non-working code looks like this:
cat file.csv | awk '{ a[$2]++ } END { for (b in a) if ($3==1) {sum+=$3} END {print $0 sum}'
and
awk '{ a[$2]++ } END { for (b in a) { sum +=1 } }' file.csv
I need help correcting my syntax and merging the two awk commands.
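For reference, here is a minimal sketch of what the merged command might look like. It is not a definitive solution. It assumes the file has a header row and that fields are separated by a comma followed by optional spaces, as in the sample above. The array names count and ones and the shell variable ratios are hypothetical, chosen only for illustration.

```shell
# Recreate the sample data from the question (the file name is an assumption).
cat > file.csv <<'EOF'
id, fruit, binary
1, apple, 1
2, orange, 0
3, pear, 1
4, apple, 0
5, peach, 0
6, apple, 1
EOF

# -F ', *' splits fields on a comma plus optional spaces; NR > 1 skips the header.
# count[] holds the occurrences per fruit, ones[] holds the sum of the binary column.
# awk's "for (f in count)" order is unspecified, so the output is piped through sort.
ratios=$(awk -F ', *' 'NR > 1 { count[$2]++; ones[$2] += $3 }
    END { for (f in count) printf "%s %d/%d\n", f, ones[f], count[f] }' file.csv | sort)
echo "$ratios"
```

On the sample data this should print apple 2/3, orange 0/1, peach 0/1 and pear 1/1, one per line. Both arrays are filled in a single pass over the file, so no separate step is needed to collect the unique fruits first.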