Database formation in the R software environment: experience of economic research at the municipal level

Egor A. Prokopyev - Candidate of Economic Sciences, Leading Researcher at the Laboratory of Digital Technologies for Regional Development of the Department of Complex Scientific Research. Karelian Scientific Center of the Russian Academy of Sciences

Vladislav A. Igolkin - research intern at the Laboratory of Digital Technologies for Regional Development of the Department of Complex Scientific Research. Karelian Scientific Center of the Russian Academy of Sciences

Abstract

The absence or long publication lag of official statistics on the socio-economic development of territories forces researchers to turn to alternative data sources that have emerged from the rapid development of digital technologies. The R programming language is required to work with these datasets. The purpose of the paper is to acquaint readers with the capabilities of the R software environment for building a municipal database for socio-economic research from various sources. The following data sources are considered: the Rosstat database “Indicators of Municipalities”; the 5-NDFL tax report form; web sources of the Federal Tax Service; the website of the Central Electoral Commission of the Russian Federation; and the TargetHunter service. As part of the preparatory stage of database creation, the paper shows which parameters must be taken into account when creating an auxiliary key table. The functions left_join(), pivot_longer(), fill(), group_by(), arrange(), summarize(), and separate() are analyzed using worked data examples. The presented material can be used to develop educational tasks within the disciplines “Basics of Statistics” or “Data Analysis”, as well as to prepare a statistical base for research into socio-economic processes at the municipal level.

Keywords: R software environment; database; municipal statistics; alternative data sources; data collection and processing

For citation: Prokopyev E.A., Igolkin V.A. Database formation in the R software environment: experience of economic research at the municipal level. Digital models and solutions. 2024. Vol. 3, no. 4. Pp. 27–46. DOI: 10.29141/2949-477X-2024-3-4-2. EDN: VISTZA.
