Even before receiving my degree in Statistics, I had the great opportunity to work as a Data Scientist at a games company! I would like to share that experience, adding one more chapter to the Livro do Guru.
First question: what does a statistician with a data scientist profile do at a mobile games company? A lot!
When you are playing a game or using an app installed on your phone, the developer receives every tap you make on the screen. It is therefore possible to trace every step of your session in the app. This means that every day a flood of data is sent to the app's servers, reporting the activities of all its users.
Data are the main source of motivation for a professional in statistics. They spark an intense curiosity to discover what they can tell us about the game, generating insights for new decision-making.
The analysts responsible for the analyses…
"Machine learning components tend to disrupt established software engineering practices." (Léon Bottou)
In statistics and data analysis, a raw score is an original datum that has not been transformed. This may include, for example, the original result obtained by a student on a test (i.e., the number of correctly answered items) as opposed to that score after transformation to a standard score or percentile rank or the like.
Source: Raw score – Wikipedia
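To make the distinction concrete, here is a minimal sketch (not from the original article) of turning a raw test score into a standard score (z-score). The score list and the `z_score` helper are illustrative assumptions, using only the Python standard library:

```python
from statistics import mean, pstdev

def z_score(raw, scores):
    """Standard score: how many standard deviations a raw score
    sits above or below the group mean."""
    return (raw - mean(scores)) / pstdev(scores)

# Hypothetical raw test results: number of correctly answered items per student.
scores = [12, 15, 18, 20, 25]

print(z_score(18, scores))  # 18 equals the mean here, so the z-score is 0.0
```

The raw score (18 items correct) is the untransformed datum; the z-score is one common transformation of it, from which a percentile rank could also be derived.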
This article is very well written!
(2014). Confidence Trick: The Interpretation of Confidence Intervals. Canadian Journal of Science, Mathematics and Technology Education: Vol. 14, No. 1, pp. 23-34. doi: 10.1080/14926156.2014.874615
Epidemics can spread across large regions, becoming pandemics, by flowing along transportation and social networks. Two network attributes, transitivity (when a node is connected to two other nodes that are also directly connected to each other) and centrality (the number and intensity of connections with the other nodes in the network), are widely associated with the dynamics of pathogen transmission. Here we investigate how network centrality and transitivity influence the vulnerability of human populations to disease by examining one of the most devastating pandemics in human history, the fourteenth-century plague pandemic known as the Black Death. We found that, after controlling for a city's spatial location and the disease arrival time, cities with higher values of both centrality and transitivity were more severely affected by the plague. A simulation study indicates that this association arose because central cities with high transitivity undergo more exogenous re-infections. Our study provides an easy method…
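The two network attributes the abstract defines can be sketched in a few lines. This is an illustrative toy example, not the authors' method: the graph, node labels, and helper names are assumptions, and the code uses only the standard library.

```python
from itertools import combinations

# Tiny undirected network as adjacency sets (hypothetical nodes A-D).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def degree_centrality(g, node):
    """Degree centrality: fraction of the other nodes this node connects to."""
    return len(g[node]) / (len(g) - 1)

def transitivity(g, node):
    """Local transitivity (clustering coefficient): fraction of a node's
    neighbour pairs that are themselves directly connected."""
    nbrs = g[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in g[u])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

print(degree_centrality(graph, "A"))  # connected to all 3 others -> 1.0
print(transitivity(graph, "A"))       # only the pair (B, C) is linked -> 1/3
```

Node A is maximally central, but only one of its three neighbour pairs is connected, so its transitivity is 1/3; a node like D with a single neighbour has transitivity 0 by convention.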
Folk pedagogy encourages what is believed to be best practice, but cannot validate best practice. We have evidence computing teachers do not use evidence. Davide Fossati and I studied 14 CS teachers from three institutions (http://bit.ly/1BV9uNo). Fossati asked them about times they made a change in their teaching practice: why did they make the change, and how did they know if it was successful or not? They used intuition, informal discussion with students, and anecdotes. Not a single teacher used evidence such as class performance on a test or homework. Without evidence, teachers rely on intuition informed by experience. Sometimes that intuition may be informed by years of experience. Sometimes that experience is not at all relevant.
Canonical correlation analysis is used to identify and measure the associations between two sets of variables. Canonical correlation is appropriate in the same situations where multiple regression would be, but where there are multiple intercorrelated outcome variables. Canonical correlation analysis determines a set of canonical variates, orthogonal linear combinations of the variables within each…
Having explored some ways that we might find related data and services, as well as different definitions of “dataset”, I wanted to look at the topic of dataset description and analysis. Specifically, how can we answer the following questions:
- what kinds of information does this dataset contain?
- what types of entity are described in this dataset?
- how can I determine if this dataset will fulfil my requirements?
There’s been plenty of work done around trying to capture dataset metadata, e.g. VoiD and DCAT; there’s also the upcoming work on Open Data on the Web. Much of that work has focused on capturing the core metadata about a dataset, e.g. who published it, when was it last updated, where can I find the data files, etc. But there’s still plenty of work to be done here, to encourage broader adoption of best practices, and also to explore ways…
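To make "core metadata" concrete, here is a hypothetical sketch (not from the original post) of a DCAT-style dataset description expressed as a plain Python mapping. The dataset name, values, and helper are assumptions for illustration; the property names loosely follow `dcat:`/`dct:` terms:

```python
# Hypothetical DCAT-style metadata record for an imaginary dataset.
dataset = {
    "dct:title": "Example Gazetteer",
    "dct:publisher": "Example Org",           # who published it
    "dct:modified": "2014-01-15",             # when was it last updated
    "dcat:keyword": ["places", "geography"],  # what kinds of information
    "dcat:distribution": [
        {
            "dcat:downloadURL": "http://example.org/data.csv",  # where the files are
            "dcat:mediaType": "text/csv",
        }
    ],
}

def keywords(ds):
    """Toy helper: pull out the topic keywords a consumer might match
    against their requirements."""
    return sorted(ds.get("dcat:keyword", []))
```

A real deployment would publish this as RDF (e.g. Turtle) against the actual DCAT vocabulary; the dict form here is just to show which questions the core properties answer.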