Possibilities and issues of using Big Data in sociological research
On 13 March, Alexey Rotmistrov, Head of the Research & Study Group, spoke as a discussant of the report «Possibilities and issues of working with Big Data: investigation on tenders for exchange of remote work», presented in the seminar series «Sociology of Markets» organized by the HSE Laboratory for Studies in Economic Sociology (LSES).
The seminar discussion was devoted to the possibilities and issues of using Big Data in sociological research, illustrated by an analysis of tenders on a freelance website. Members of the Laboratory for Studies in Economic Sociology and students of the Sociology Department shared their own experience of working with Big Data and pointed out some specific difficulties that can arise. These are not only methodological (data comes before theory, contrary to the classic sociological approach, in which theory provides the basis for data) but also technical (for example, the need to keep the data-collection algorithm continuously up to date, a large number of missing values, etc.).
As one of the discussants of the report, Alexey Rotmistrov, Head of the Research & Study Group, focused on the issues that Big Data raises for data analysis. For example, a large number of observations combined with a heavily skewed distribution of the binary response makes logistic regression almost useless, because the method is of value only when it predicts the «rare» observations correctly and with adequate quality.
Using this method correctly requires additional preparation of the data. For example, artificially «levelling» the distribution by drawing a subsample from the more populous category, along with other techniques used in various machine learning methods, might help to address these issues. However, as Alexey Rotmistrov noted, sociologists who turn to such methods should constantly reflect on what they are doing, because in computer science, unlike sociology, the accuracy of the model always takes priority over the possibilities of interpreting and generalizing the results.
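The «levelling» described above can be illustrated with a minimal sketch of random undersampling. It is not the code used by the authors of the report; the function name `undersample_majority`, the `y` label field and the toy data (1,000 tenders with 20 «rare» outcomes) are assumptions made purely for illustration.

```python
import random


def undersample_majority(rows, label_key="y", seed=0):
    """Balance a binary dataset by randomly subsampling the larger class."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    # The shorter list is the minority class, the longer one the majority.
    minority, majority = sorted((positives, negatives), key=len)
    # Draw, without replacement, as many majority rows as there are minority rows.
    sampled = rng.sample(majority, len(minority))
    balanced = minority + sampled
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: 1,000 tenders, only 20 of which have the "rare" outcome.
rows = [{"id": i, "y": 1 if i < 20 else 0} for i in range(1000)]
balanced = undersample_majority(rows)
```

A model fitted on the balanced sample sees the two outcomes in equal shares, so its predicted probabilities reflect that artificial balance and should not be read directly as rates in the full data set.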
To improve the model and, at the same time, to obtain deeper and more meaningful conclusions, the head of the RSG also recommended including interaction effects in the model, since Big Data makes this possible and since many theoretically important predictors were left out of the authors' final model. Overall, the work under discussion is a first, and good, example of using Web data for the needs of sociological research.
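As a rough illustration of what including an interaction effect means in practice, the sketch below adds a product term of two predictors to each observation before a model is fitted. The predictor names `budget` and `is_urgent` are hypothetical and are not taken from the report.

```python
def add_interaction(rows, a, b):
    """Return copies of the rows with an extra column holding the product a * b."""
    name = f"{a}:{b}"
    return [{**r, name: r[a] * r[b]} for r in rows]


# Hypothetical predictors for two tenders.
rows = [
    {"budget": 1.0, "is_urgent": 0},
    {"budget": 2.5, "is_urgent": 1},
]
extended = add_interaction(rows, "budget", "is_urgent")
```

The coefficient on the new column then shows how the effect of one predictor changes with the level of the other, which is the kind of substantive conclusion the recommendation is aimed at.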