Modules

Sample Project Structure

os

get current working directory
os.getcwd()
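
A small sketch of how os.getcwd() behaves together with os.chdir(). The temporary directory and the variable names are only for illustration and are not part of the original notes.

```python
import os
import tempfile

# Remember where the process started so we can return there afterwards.
start_dir = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # change the working directory
    # realpath() resolves symlinks before the two paths are compared.
    inside_tmp = os.path.realpath(os.getcwd()) == os.path.realpath(tmp)
    os.chdir(start_dir)  # go back before the temporary directory is removed

print(start_dir)
print(inside_tmp)
```

The working directory is process-wide state, so code that calls os.chdir() should normally restore the previous directory as done here.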

XlsxWriter

install
$ git clone git@github.com:jmcnamara/XlsxWriter.git
$ cd XlsxWriter
$ sudo python setup.py install

coursera-dl

SimpleHTTPServer

serve the current folder over HTTP (Python 2; on Python 3 the module is http.server)
$ python -m SimpleHTTPServer
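
On Python 3 the same kind of server can also be started from code. The following is a minimal sketch, assuming Python 3.7 or later; the helper name serve_folder and the file hello.txt are hypothetical and chosen only for the demonstration.

```python
import functools
import http.server
import os
import tempfile
import threading
import urllib.request


def serve_folder(folder, port=0):
    """Start a background HTTP server for `folder`; port 0 picks a free port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=folder
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


with tempfile.TemporaryDirectory() as folder:
    # Create one file to serve (hypothetical demo content).
    with open(os.path.join(folder, "hello.txt"), "w") as f:
        f.write("hello")

    server = serve_folder(folder)
    url = "http://127.0.0.1:%d/hello.txt" % server.server_address[1]

    # Bypass any configured proxy, since the request goes to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        body = response.read().decode()

    server.shutdown()
    server.server_close()

print(body)
```

Binding to 127.0.0.1 keeps the demo server reachable only from the local machine. The command-line form binds to all interfaces by default.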

IPython / Jupyter

install
$ sudo dnf install python-devel
$ sudo pip install notebook

usage
$ jupyter notebook

JupyterLab

JupyterLab reads the kernels defined in both ~/.local/share/jupyter/kernels/ and ~/.ipython/kernels/

to start, run the following command

$ jupyter lab
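
To see which kernels live in the two directories mentioned above, something like the following sketch can be used. It assumes each kernel is a sub-folder containing a kernel.json file with a display_name field. The function name find_kernels and the rule that the first directory wins on a name clash are choices made for this example, not Jupyter's own lookup logic.

```python
import json
import tempfile
from pathlib import Path

# The two locations named in the notes.
KERNEL_DIRS = [
    Path.home() / ".local/share/jupyter/kernels",
    Path.home() / ".ipython/kernels",
]


def find_kernels(directories):
    """Map kernel name -> display name for every <dir>/<name>/kernel.json."""
    kernels = {}
    for directory in directories:
        if not directory.is_dir():
            continue  # a missing location is simply skipped
        for spec_file in sorted(directory.glob("*/kernel.json")):
            spec = json.loads(spec_file.read_text())
            # Assumption of this sketch: earlier directories win on a clash.
            kernels.setdefault(spec_file.parent.name, spec.get("display_name", ""))
    return kernels


# Demonstration against a throwaway directory, not the real home folder.
with tempfile.TemporaryDirectory() as tmp:
    spec_dir = Path(tmp) / "scala211"
    spec_dir.mkdir()
    (spec_dir / "kernel.json").write_text(json.dumps({"display_name": "Scala 2.11"}))
    demo = find_kernels([Path(tmp), Path(tmp) / "missing"])

print(demo)
```

Calling find_kernels(KERNEL_DIRS) would scan the real locations. The command `jupyter kernelspec list` reports the kernels that Jupyter itself finds.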

R Kernel

install CZMQ (the high-level C binding for ZeroMQ) from the Fedora shell
$ sudo dnf install czmq-devel

in R

install.packages(c('rzmq','repr','IRkernel','IRdisplay'),
                 repos = c('http://irkernel.github.io/', getOption('repos')),
                 type = 'source')
IRkernel::installspec()

Scala Kernel

install
$ cd ~/Downloads
$ wget https://oss.sonatype.org/content/repositories/snapshots/com/github/alexarchambault/jupyter/jupyter-scala-cli_2.11.6/0.2.0-SNAPSHOT/jupyter-scala_2.11.6-0.2.0-SNAPSHOT.tar.xz
$ tar xf jupyter-scala_2.11.6-0.2.0-SNAPSHOT.tar.xz
$ cd ~/Downloads/jupyter-scala_2.11.6-0.2.0-SNAPSHOT
$ ./bin/jupyter-scala
$ jupyter console --kernel scala211

Zen of Python

https://www.python.org/dev/peps/pep-0020/
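
The same text ships with the interpreter in the standard-library module this. A short sketch of reading it from code; the variable names are illustrative only.

```python
import codecs
import this  # importing the module prints the Zen once

# The module stores the text ROT13-encoded in the attribute `this.s`.
zen = codecs.decode(this.s, "rot13")
first_line = zen.splitlines()[0]
print(first_line)
```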

Django

surveys

Official Python Tutorial
Python resources
NumPy
SciPy

Machine Learning

  • AI: building decision rules
  • 80’s machine learning: learn these rules from observations
  • 90’s statistical learning: model the noise in the observations
  • big data: many observations, simple rules

  • hasher.fit_transform: transforms a list of strings into a matrix of word counts
  • estimator.partial_fit: updates the model incrementally, one batch of data at a time
  • www.wendelin.io: Wendelin Industrial Big Data
  • Microsoft Benjamin: benguin@microsoft.com
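
The hasher and partial_fit bullets above describe a common pattern for data that does not fit in memory: hash words into a fixed number of columns, then feed the rows to a model batch by batch. Below is a standard-library sketch of that idea. It is not scikit-learn's implementation, and hash_transform, ColumnTotals and the sample sentences are hypothetical names and data made up for this illustration.

```python
import hashlib


def hash_transform(documents, n_features=8):
    """Turn a list of strings into a list of fixed-width word-count rows."""
    matrix = []
    for text in documents:
        row = [0] * n_features
        for word in text.lower().split():
            # A stable hash (the built-in hash() is salted per run) maps each
            # word to a column without storing a vocabulary.
            digest = hashlib.md5(word.encode("utf-8")).hexdigest()
            row[int(digest, 16) % n_features] += 1
        matrix.append(row)
    return matrix


class ColumnTotals:
    """Toy estimator that keeps running column totals across mini-batches."""

    def __init__(self, n_features=8):
        self.totals = [0] * n_features

    def partial_fit(self, rows):
        # Only the current batch is needed; earlier data is never revisited.
        for row in rows:
            for column, value in enumerate(row):
                self.totals[column] += value
        return self


docs = ["big data simple rules", "simple rules simple models"]
counts = hash_transform(docs)

estimator = ColumnTotals()
for document in docs:  # one document per batch
    estimator.partial_fit(hash_transform([document]))

for row in counts:
    print(row)
print(estimator.totals)
```

Because the number of columns is fixed in advance, different words can collide in the same column. That is the price paid for not having to keep a vocabulary in memory.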

scikit-learn

ENSAE course material

  • sklearn_ensae_course
  • clone the repository, navigate to the “rendered notebooks” folder and run ipython notebook
  • alternatively, open the repository in nbviewer at http://nbviewer.ipython.org/github/[name]/[repo]
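
The nbviewer address from the last bullet can be assembled from the two placeholders. A tiny sketch, where nbviewer_url is a hypothetical helper and "someuser" / "somerepo" are made-up values standing in for [name] and [repo]:

```python
def nbviewer_url(name, repo):
    """Fill the [name] and [repo] placeholders of the nbviewer address."""
    return "http://nbviewer.ipython.org/github/{}/{}".format(name, repo)


# Placeholder values only; substitute the real GitHub account and repository.
example = nbviewer_url("someuser", "somerepo")
print(example)
```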

IPython notebooks

Webscraping

Crawling part

Scrapy

Extraction part

Books

Python for Data Analysis
Author: Wes McKinney
Subtitle: Agile Tools for Real-World Data
Publisher: O’Reilly
ISBN: 978-1-449-31979-3
Year: 2013
Tags: NumPy, pandas, matplotlib, IPython, SciPy
GitHub: git://github.com/pydata/pydata-book.git

Python for Informatics
Author: Charles Severance
Subtitle: Exploring Information

Think Series by Allen B. Downey

Think Bayes
Subtitle: Bayesian Statistics in Python
Publisher: O’Reilly
ISBN: 978-1-449-37078-7
Year: 2013

Think Complexity
Publisher: Green Tea Press
Year: 2012
URL: greenteapress.com/complexity

Think Python (v3)
Subtitle: How to Think Like a Computer Scientist
Publisher: Green Tea Press
Year: 2008
URL: thinkpython.com

Think Stats (2ed)
Subtitle: Exploratory Data Analysis in Python
Publisher: Green Tea Press
Year: 2014
URL: thinkstats2.com

Mailing Lists

  • pydata: a Google Group list for questions related to Python for data analysis and pandas
  • pystatsmodels: for statsmodels or pandas-related questions
  • numpy-discussion: for NumPy-related questions
  • scipy-user: for general SciPy or scientific Python questions

Programming Concepts

IDEs



Published: 13 February 2015
Category: datascience