First job as a Data Scientist

Téo Calvo

Even before getting my degree in Statistics, I had the great opportunity to work as a Data Scientist at a gaming company! So I would like to share that experience by adding another chapter to the Livro do Guru.

First question: what does a statistician with a data scientist profile do at a mobile games company? A lot!!

When you are playing a game or using an app installed on your phone, the developer receives every “tap” you make on the screen. This makes it possible to know each step you take during your session in the app. It means that, every day, a flood of data is sent to the app's servers, reporting the activity of all of its users.

Data is the main source of motivation for a professional in statistics. It sparks an intense curiosity to discover what it can tell us about the game, generating insights that inform new decisions.

The analysts responsible for the analyses…

Network theory may explain the vulnerability of medieval human settlements to the Black Death pandemic

Complexity Digest

Epidemics can spread across large regions, becoming pandemics, by flowing along transportation and social networks. Two network attributes, transitivity (when a node is connected to two other nodes that are also directly connected to each other) and centrality (the number and intensity of a node's connections with the other nodes in the network), are widely associated with the dynamics of pathogen transmission. Here we investigate how network centrality and transitivity influence the vulnerability of human populations to disease by examining one of the most devastating pandemics in human history, the fourteenth-century plague pandemic known as the Black Death. We found that, after controlling for each city's spatial location and the disease's arrival time, cities with higher values of both centrality and transitivity were more severely affected by the plague. A simulation study indicates that this association was due to central cities with high transitivity undergoing more exogenous re-infections. Our study provides an easy method…
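To make the two network attributes concrete, here is a minimal sketch on a small, made-up network. The excerpt does not say exactly which formulas the authors used, so this assumes common textbook versions: degree (number of connections) and strength (sum of edge weights) for centrality, and the local clustering coefficient for transitivity. The city names, edge weights, and function names are hypothetical and are not data or code from the study.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weighted, undirected network of cities.
# Weights stand in for the "intensity" of a connection; illustrative only.
EDGES = [
    ("A", "B", 3.0),
    ("A", "C", 2.0),
    ("B", "C", 1.0),
    ("C", "D", 4.0),
    ("D", "E", 1.0),
]


def build_graph(edges):
    """Return an adjacency map: node -> {neighbour: weight}."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def degree(graph, node):
    """Number of direct connections (unweighted centrality)."""
    return len(graph[node])


def strength(graph, node):
    """Sum of edge weights (weighted centrality: intensity of connections)."""
    return sum(graph[node].values())


def local_transitivity(graph, node):
    """Fraction of pairs of a node's neighbours that are directly connected."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0  # no pair of neighbours, so no possible triangle
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return linked / (k * (k - 1) / 2)


if __name__ == "__main__":
    g = build_graph(EDGES)
    for city in sorted(g):
        print(city, degree(g, city), strength(g, city),
              round(local_transitivity(g, city), 3))
```

In this toy network, city C has the highest degree and strength, but only one of its three neighbour pairs (A and B) is linked, so its transitivity is 1/3; cities A and B have full transitivity but fewer connections. The study's finding concerns cities that score high on both attributes at once.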

What Does Your Dataset Contain?

Lost Boy

Having explored some ways that we might find related data and services, as well as different definitions of “dataset”, I wanted to look at the topic of dataset description and analysis. Specifically, how can we answer the following questions:

  • what kinds of information does this dataset contain?
  • what types of entity are described in this dataset?
  • how can I determine if this dataset will fulfil my requirements?
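For RDF data, the first two questions can be approached by profiling the data itself. The sketch below is a minimal illustration, assuming the dataset is already in memory as plain (subject, predicate, object) tuples; the example URIs and the `summarise` function are hypothetical. It counts typed entities per class and triples per property, which loosely corresponds to the statistics VoID can express through `void:triples`, `void:distinctSubjects`, and class and property partitions.

```python
from collections import Counter, defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical triples (subject, predicate, object); illustrative only.
TRIPLES = [
    ("http://example.org/alice", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/bob", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/acme", RDF_TYPE, FOAF + "Organization"),
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/bob", FOAF + "knows", "http://example.org/alice"),
]


def summarise(triples):
    """Profile a list of triples: size, subjects, entity types, properties."""
    classes = defaultdict(set)  # class URI -> set of subjects typed with it
    properties = Counter()      # predicate URI -> number of triples using it
    for s, p, o in triples:
        properties[p] += 1
        if p == RDF_TYPE:
            classes[o].add(s)
    return {
        "triples": len(triples),
        "distinct_subjects": len({s for s, _, _ in triples}),
        "entities_per_class": {c: len(m) for c, m in classes.items()},
        "triples_per_property": dict(properties),
    }


if __name__ == "__main__":
    for key, value in summarise(TRIPLES).items():
        print(key, value)
```

A profile like this answers "what types of entity are described" directly; whether the dataset fulfils a given requirement still depends on comparing these counts against what the consumer needs.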

There’s been plenty of work done around trying to capture dataset metadata, e.g. VoID and DCAT; there’s also the upcoming work on Open Data on the Web. Much of that work has focused on capturing the core metadata about a dataset, e.g. who published it, when it was last updated, where I can find the data files, etc. But there’s still plenty of work to be done here, to encourage broader adoption of best practices, and also to explore ways…

