First Job as a Data Scientist

Téo Calvo

Even before receiving my degree in Statistics, I had the great opportunity to work as a Data Scientist at a games company! So I would like to share that experience by adding one more chapter to the Livro do Guru.

First question: what does a statistician with a data scientist profile do at a mobile games company? A lot!

When you play a game or use an app installed on your phone, the company that developed it receives every tap you make on the screen. That makes it possible to trace every step taken during your session in the app. This means that every day a flood of data is sent to the app's servers, reporting the activity of all its users.

Data is the main source of motivation for a professional in statistics. It sparks an intense curiosity to find out what it can tell us about the game, and so yields insights that inform new decisions.

The analysts responsible for the analyses…


Confidence Trick: The Interpretation of Confidence Intervals: Canadian Journal of Science, Mathematics and Technology Education: Vol 14, No 1

(2014). Confidence Trick: The Interpretation of Confidence Intervals. Canadian Journal of Science, Mathematics and Technology Education: Vol. 14, No. 1, pp. 23-34. doi: 10.1080/14926156.2014.874615

Source: Confidence Trick: The Interpretation of Confidence Intervals: Canadian Journal of Science, Mathematics and Technology Education: Vol 14, No 1

Network theory may explain the vulnerability of medieval human settlements to the Black Death pandemic

Complexity Digest

Epidemics can spread across large regions, becoming pandemics by flowing along transportation and social networks. Two network attributes, transitivity (when a node is connected to two other nodes that are also directly connected between them) and centrality (the number and intensity of connections with the other nodes in the network), are widely associated with the dynamics of transmission of pathogens. Here we investigate how network centrality and transitivity influence vulnerability to diseases of human populations by examining one of the most devastating pandemics in human history, the fourteenth century plague pandemic called Black Death. We found that, after controlling for the city spatial location and the disease arrival time, cities with higher values of both centrality and transitivity were more severely affected by the plague. A simulation study indicates that this association was due to central cities with high transitivity undergoing more exogenous re-infections. Our study provides an easy method…
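
To make the two attributes defined in the abstract concrete, here is a minimal sketch on a toy network. It is not the method used in the paper: the node names and links are invented, centrality is simplified to an unweighted, normalized connection count (the "intensity" of links is ignored), and transitivity is taken as the local clustering coefficient of each node.

```python
from itertools import combinations

# Hypothetical toy network: the nodes and links are invented for illustration
# and do not come from the study.
edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),  # A, B and C form a closed triangle
    ("C", "D"), ("D", "E"),              # a chain hanging off C
]

def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    """Number of connections of each node, divided by the maximum possible."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def local_transitivity(adj):
    """Fraction of a node's neighbour pairs that are directly connected."""
    result = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[node] = 0.0  # no neighbour pairs, so nothing can be closed
            continue
        closed = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
        result[node] = closed / (k * (k - 1) / 2)
    return result

adj = build_adjacency(edges)
centrality = degree_centrality(adj)
transitivity = local_transitivity(adj)

for node in sorted(adj):
    print(node, round(centrality[node], 2), round(transitivity[node], 2))
```

In this toy graph, C has the most connections but only one of its three neighbour pairs is linked, while A and B have fewer connections but fully linked neighbours. The two measures capture different things, which is why the abstract considers them jointly.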


Bringing Evidence-Based Education to Computer Science | Communications of the ACM

Folk pedagogy encourages what is believed to be best practice, but cannot validate best practice. We have evidence computing teachers do not use evidence. Davide Fossati and I studied 14 CS teachers from three institutions (http://bit.ly/1BV9uNo). Fossati asked them about times they made a change in their teaching practice; why did they make the change, and how did they know if it was successful or not. They used intuition, informal discussion with students, and anecdotes. Not a single teacher used evidence such as class performance on a test or homework. Without evidence, teachers rely on intuition informed by experience. Sometimes that intuition may be informed by years of experience. Sometimes that experience is not at all relevant.

Source: Bringing Evidence-Based Education to CS | June 2015 | Communications of the ACM

Canonical Correlation Analysis | R Data Analysis Examples – IDRE Stats

Canonical correlation analysis is used to identify and measure the associations between two sets of variables. Canonical correlation is appropriate in the same situations where multiple regression would be, but where there are multiple intercorrelated outcome variables. Canonical correlation analysis determines a set of canonical variates, orthogonal linear combinations of the variables within each… Read More
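
As a rough illustration of what the canonical correlations measure, the sketch below computes them in Python with NumPy rather than R. It is not the code from the IDRE page: the data are simulated, the function name `canonical_correlations` is made up for this example, and it assumes both data matrices have full column rank. It uses the standard QR-then-SVD route: after centering, the singular values of the product of the two orthonormal bases are the canonical correlations.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and the columns of Y.

    Assumes X and Y have the same number of rows and full column rank.
    """
    # Center each variable so that inner products behave like covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two centered sets.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations (descending).
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against tiny floating-point overshoot

# Simulated data (an assumption for illustration): one shared latent signal
# links the first variable of each set; every other variable is pure noise.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
X = np.column_stack([latent + 0.5 * rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([
    latent + 0.5 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])

rho = canonical_correlations(X, Y)
print(rho)  # one value per canonical variate pair, largest first
```

With two variables in one set and three in the other, there are two pairs of canonical variates. Only the first should show a strong correlation here, since the simulated sets share a single latent signal.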

Source: Canonical Correlation Analysis | R Data Analysis Examples – IDRE Stats

What Does Your Dataset Contain?

Lost Boy

Having explored some ways that we might find related data and services, as well as different definitions of “dataset”, I wanted to look at the topic of dataset description and analysis. Specifically, how can we answer the following questions:

  • what kinds of information does this dataset contain?
  • what types of entity are described in this dataset?
  • how can I determine if this dataset will fulfil my requirements?
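
For a dataset expressed as subject-predicate-object triples, the first two questions can be approximated by simple counting. The sketch below is only an illustration and not the approach taken in the original post: the triples are invented sample data, and plain string tuples stand in for a real RDF library. The resulting counts are similar in spirit to the class and property partitions that VoID can describe.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Hypothetical sample data: these identifiers are invented for illustration.
triples = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:acme", RDF_TYPE, "org:Organization"),
    ("ex:alice", "org:memberOf", "ex:acme"),
]

def summarize(triples):
    """Count entities per declared type and usage of every other predicate."""
    # "What types of entity are described?" -> objects of rdf:type statements.
    class_counts = Counter(o for _, p, o in triples if p == RDF_TYPE)
    # "What kinds of information?" -> which properties are used, and how often.
    predicate_counts = Counter(p for _, p, _ in triples if p != RDF_TYPE)
    return class_counts, predicate_counts

classes, predicates = summarize(triples)
print(classes.most_common())
print(predicates.most_common())
```

A summary like this does not settle the third question on its own, but it gives a quick view of whether the entity types and properties you need are present at all.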

There’s been plenty of work done around trying to capture dataset metadata, e.g. VoiD and DCAT; there’s also the upcoming workshop on Open Data on the Web. Much of that work has focused on capturing the core metadata about a dataset, e.g. who published it, when was it last updated, where can I find the data files, etc. But there’s still plenty of work to be done here, to encourage broader adoption of best practices, and also to explore ways…
