“It’ll be interesting to see what the future holds for this role, and whether or not we end up seeing ‘The Rise of the Data Curator’.”
To start an analytics project, a company needs to know: 1) whether its current data sources will be enough; 2) whether the data it needs is available; 3) how to access this data; 4) what exactly this data contains. A data curator (DC) is a mediator between the data engineer and the data analyst whose job is “finding, surfacing, annotating, even sometimes cleaning and blending data sets and serving them up for broad consumption”. Data repositories such as Data is Plural, Awesome Public DataSets, and the Makeover Monday Project save their users hundreds of working hours by steadily providing data sets on a weekly basis.

Whether a dedicated DC is necessary is still debated. According to Ben Jones’ survey, 70% of the respondents do not have a dedicated DC but need one; 17% consider them useless; 12% have one and appreciate it; 1% need one but don’t have one. Current DCs describe the role this way: “We want LOB analysts to be able to focus on generating insights rather than assembling, maintaining, and finding data” (Kelly Gilbert); “[A curator] treats analytics as a customer” (Wendy Brotherton). Some data analysts, by contrast, put the DC role in the “‘part of the job’ category” (Jim VanSisteen, Jason Forrest), or even prefer “raw data” to “adjusted” data (Daniel Zvinca).

The decision to hire a dedicated DC depends on the following factors: the size of the team, the complexity of the data sets, the role of data in the business, the team members’ level of knowledge, and the maturity of the analytics processes. Whether or not a company has a dedicated role for data curation, the curation tasks themselves still have to be done.