Automatic categorization of human actions in video sequences is of interest for a variety of applications: detecting activities in video surveillance, indexing video sequences, content-based browsing, etc. It remains a challenging problem, however, because of factors such as background clutter, camera motion, occlusion, and other geometric and photometric variations. Some of these difficulties are handled surprisingly well by a bag-of-words representation combined with machine learning techniques such as the Support Vector Machine (SVM).

In this project, we extended techniques used for object recognition in 2D images to video sequences by detecting spatio-temporal interest points (the analogue of corner points in a 2D image) using an extension of the Harris operator into the time dimension. Feature descriptors are computed on cuboids around these interest points, then clustered to build a bag-of-features representation, and an SVM is used to classify each video into its action class. With this pipeline we successfully categorized two classes of human actions, namely handshaking and standing up.
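To make the bag-of-features and classification stages concrete, the following is a minimal sketch, assuming the cuboid descriptors around the spatio-temporal interest points have already been extracted (one array of descriptors per video). The detector itself (the time-extended Harris operator) is not shown, random descriptors stand in for real ones, and names such as build_codebook, n_words, and the descriptor dimension 72 are illustrative assumptions rather than the project's actual parameters.

```python
# Sketch of the bag-of-features + SVM stage (descriptor extraction not shown).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_codebook(descriptor_list, n_words=50, seed=0):
    """Cluster all training descriptors into a visual vocabulary (codebook)."""
    all_desc = np.vstack(descriptor_list)
    kmeans = KMeans(n_clusters=n_words, random_state=seed, n_init=10)
    kmeans.fit(all_desc)
    return kmeans

def bag_of_features(descriptors, kmeans):
    """L1-normalised histogram of codeword assignments for one video."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy usage: random arrays stand in for per-video cuboid descriptors.
rng = np.random.default_rng(0)
train_desc = [rng.normal(size=(rng.integers(50, 150), 72)) for _ in range(20)]
train_labels = [0] * 10 + [1] * 10   # 0 = handshake, 1 = standing up

codebook = build_codebook(train_desc, n_words=50)
X_train = np.array([bag_of_features(d, codebook) for d in train_desc])

clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X_train, train_labels)

# Classify the bag-of-features histogram of one (toy) test video.
test_desc = rng.normal(size=(80, 72))
print(clf.predict([bag_of_features(test_desc, codebook)]))
```

In this sketch the codebook is learned with k-means over the pooled training descriptors, each video is summarized as a normalized histogram over codewords, and an RBF-kernel SVM separates the action classes; the actual project may differ in descriptor type, vocabulary size, and kernel choice.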