Studying a local BDSM community, social determinants of film choice and user activity in social networks: new parsing-based research directions in the Research & Study Group
A regular RSG seminar was held on 24 July, during the summer holidays. It was devoted to the group's future research directions. Members of the group presented the first results of their investigations and discussed the potential and limitations of parsing methods.
At the seminar, junior members of the Research and Study Group presented research projects based on data extracted from the Internet. This will be the RSG's main focus for the coming month. Maria Rodionova plans to continue her work on violence in relationships by studying the local BDSM community. Data on users' gender, age, place of residence and BDSM interests were extracted from a Russian-language online dating service for finding BDSM partners. Maria ran into a problem: the poorly structured code of the site's pages prevented data extraction with the Python "spiders" the group had studied earlier (the WYSIWYM mode, What You See Is What You Mean). She therefore learned an alternative method of data extraction that visualizes the process (the WYSIWYG mode, What You See Is What You Get) and presented it at the seminar. She has collected 98,000 anonymous user profiles from the website for her research.
Dinara Khayrullina presented her project comparing the behavior of Generation Y (people born between 1981 and 1994) and Generation Z (people born between 1995 and 2004). The study uses publicly available indicators extracted from users' personal pages, such as worldview and personal priorities, as well as indicators of users' activity in the social network. Tamara Mkhitaryan plans to work on a related subject: her research examines the life goals of university students who occupy different places in the university rating. She intends to extract indicators of students' interests (for example, community subscriptions and posts on their personal pages), as well as activity indicators and socio-demographic data. Since both projects draw their data from a social network, the researchers decided to learn and use the VKontakte API, an interface designed specifically for developers.
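To make the API route concrete, here is a minimal Python sketch of how such indicators might be requested and flattened into data rows. It is an illustration under stated assumptions, not the researchers' actual code. The URL pattern `https://api.vk.com/method/<method>`, the `users.get` method and its `fields` values `sex`, `bdate`, `personal` and `counters` come from the public VK API. The API version string, the placeholder token, the user ID and the sample response are hypothetical. The sketch only builds the request URL and parses a made-up reply, so it runs without network access or credentials.

```python
from urllib.parse import urlencode

API_BASE = "https://api.vk.com/method/"
API_VERSION = "5.131"  # assumed version string; check the current VK API docs


def build_request_url(method, params, access_token, version=API_VERSION):
    """Build a VK API call URL: /method/<name>?<params>&access_token=...&v=..."""
    query = dict(params)
    query["access_token"] = access_token
    query["v"] = version
    return API_BASE + method + "?" + urlencode(query)


def extract_indicators(user):
    """Flatten one user object from a users.get reply into a data row.

    Fields absent from the reply (e.g. hidden by privacy settings) become None.
    """
    personal = user.get("personal", {})
    counters = user.get("counters", {})
    return {
        "id": user.get("id"),
        "sex": user.get("sex"),
        "bdate": user.get("bdate"),
        "life_main": personal.get("life_main"),  # "main thing in life", integer code
        "political": personal.get("political"),  # political views, integer code
        "groups_count": counters.get("groups"),  # number of communities
    }


# Hypothetical user ID and placeholder token, for illustration only
url = build_request_url(
    "users.get",
    {"user_ids": "12345", "fields": "sex,bdate,personal,counters"},
    access_token="YOUR_TOKEN",
)

# Hypothetical reply body, shaped like a users.get response
sample_response = {
    "response": [
        {"id": 12345, "sex": 1, "bdate": "1.1.1996",
         "personal": {"life_main": 2, "political": 4},
         "counters": {"groups": 35}}
    ]
}
rows = [extract_indicators(u) for u in sample_response["response"]]
print(url)
print(rows)
```

In a real run the URL would be fetched (for example with `urllib.request.urlopen`) and each user object in the JSON reply passed through `extract_indicators`. Community subscriptions and wall posts would need separate API methods such as `groups.get` and `wall.get`.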
Finally, another research area is the social determinants of movie choice. Marya Vorobyeva extracts data from the IMDb (Internet Movie Database) website using Scrapy, a Python framework for writing "web spiders". Despite technical difficulties during extraction, she has collected 17,000 observations for her research: feature films released between 2014 and 2018. Potential determinants of movie choice include ratings from users and film critics and the number of awards and nominations.
Overall, data collection by website parsing was successful, and the difficulties encountered only helped group members understand the method better and become familiar with alternative approaches to collecting information from the Internet automatically. The next stage is the analysis of the collected data; RSG participants will present the first results at the next seminar.