Python
Modules
Sample Project Structure
os
- get current working directory
os.getcwd()
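A minimal sketch of how os.getcwd() behaves when the working directory changes. The temporary directory and the variable names are illustrative assumptions, not part of the original notes.

```python
import os
import tempfile

# Remember where the process started so it can be restored afterwards.
start_dir = os.getcwd()
print("started in:", start_dir)

# Change into a throwaway directory and confirm that getcwd() follows.
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    inside_tmp = os.getcwd()
    print("now in:", inside_tmp)
    # Move back before the temporary directory is deleted.
    os.chdir(start_dir)
```

os.getcwd() always reports the directory of the running process, so the value depends on where the script was launched, not on where the script file lives.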
XlsxWriter
- install
$ git clone git@github.com:jmcnamara/XlsxWriter.git
$ cd XlsxWriter
$ sudo python setup.py install
coursera-dl
SimpleHTTPServer
- start server in folder
$ python -m SimpleHTTPServer
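The command above is the Python 2 spelling; on Python 3 the module is http.server ($ python3 -m http.server). The same thing can be sketched from code, assuming Python 3.7 or later. The helper name start_server and the choice of 127.0.0.1 are illustrative assumptions.

```python
import functools
import http.server
import threading
import urllib.request


def start_server(directory=".", port=0):
    """Serve `directory` over HTTP in a background thread.

    Port 0 asks the operating system for any free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def fetch(server, path):
    """Fetch `path` from the running server, bypassing any configured proxy."""
    url = "http://127.0.0.1:%d/%s" % (server.server_address[1], path)
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.read()
```

Call server.shutdown() and server.server_close() when done; the command-line form simply runs until interrupted.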
IPython / Jupyter
- install
$ sudo dnf install python-devel
$ sudo pip install notebook
- usage
$ jupyter notebook
JupyterLab
- reads kernels defined in both ~/.local/share/jupyter/kernels/ and ~/.ipython/kernels/
- usage
$ jupyter lab
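To see which kernels are defined in those two locations, a rough sketch follows. It assumes each kernel lives in its own sub-folder containing a kernel.json file; the function name find_kernels is hypothetical. The supported way to get the full list is $ jupyter kernelspec list, which also covers system-wide locations.

```python
import json
from pathlib import Path

# The two per-user kernel locations named in the notes.
KERNEL_DIRS = [
    Path.home() / ".local" / "share" / "jupyter" / "kernels",
    Path.home() / ".ipython" / "kernels",
]


def find_kernels(kernel_dirs=KERNEL_DIRS):
    """Return {kernel_name: display_name} for every kernel.json found."""
    found = {}
    for base in kernel_dirs:
        if not base.is_dir():
            continue  # a missing location is simply skipped
        for spec_file in sorted(base.glob("*/kernel.json")):
            try:
                spec = json.loads(spec_file.read_text(encoding="utf-8"))
            except (OSError, ValueError):
                continue  # unreadable or malformed spec: ignore it
            if not isinstance(spec, dict):
                continue
            # Earlier directories win if the same name appears twice.
            found.setdefault(spec_file.parent.name, spec.get("display_name", ""))
    return found


for name, display in find_kernels().items():
    print(name, "->", display)
```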
R Kernel
- install CZMQ (high-level C binding for ZeroMQ) from a Fedora shell
$ sudo dnf install czmq-devel
- in R
install.packages(c('rzmq','repr','IRkernel','IRdisplay'),
repos = c('http://irkernel.github.io/', getOption('repos')),
type = 'source')
IRkernel::installspec()
Scala Kernel
- install
$ cd ~/Downloads
$ wget https://oss.sonatype.org/content/repositories/snapshots/com/github/alexarchambault/jupyter/jupyter-scala-cli_2.11.6/0.2.0-SNAPSHOT/jupyter-scala_2.11.6-0.2.0-SNAPSHOT.tar.xz
$ tar xf jupyter-scala_2.11.6-0.2.0-SNAPSHOT.tar.xz
$ cd ~/Downloads/jupyter-scala_2.11.6-0.2.0-SNAPSHOT
$ ./bin/jupyter-scala
$ jupyter console --kernel scala211
- github: mattpap: IScala
- github: scala-notebook
- github: tribbloid: ISpark
- github: andypetrella: spark-notebook
- github: hohonuuli: sparknotebook
- Zen of Python
- https://www.python.org/dev/peps/pep-0020/
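The Zen of Python also ships with the interpreter. A small sketch, assuming CPython, where the text is stored ROT13-encoded in the attribute this.s (an implementation detail, not a documented API):

```python
import codecs
import this  # importing the module prints the Zen once

# Decode the stored text so it can be used as an ordinary string.
zen = codecs.decode(this.s, "rot13")
first_line = zen.splitlines()[0]
print(first_line)
```

In an interactive session, typing import this is enough to read it.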
Django
surveys
Links
Machine Learning
- AI: building decision rules
- 80’s machine learning: learn these from observations
- 90’s statistical learning: model the noise in the observations
- big data: many observations, simple rules
hasher.fit_transform
: transform a list of strings into a word-count matrix
estimator.partial_fit
: update the model incrementally, one batch at a time
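The hashing and incremental-fitting notes above can be illustrated without any library. This is a standard-library sketch of the idea only, not scikit-learn's implementation: hash_token, hashed_counts, and RunningColumnTotals are hypothetical names, the SHA-256 hash and whitespace tokenizer are assumptions, and the "estimator" merely accumulates column totals to show how batches are folded in one at a time.

```python
import hashlib


def hash_token(token, n_features):
    """Map a token to a stable column index in [0, n_features)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_features


def hashed_counts(documents, n_features=16):
    """Turn a list of strings into a dense word-count matrix (list of rows)."""
    matrix = []
    for doc in documents:
        row = [0] * n_features
        for token in doc.lower().split():
            # Different tokens may collide in one column; that is the trade-off
            # for not having to store a vocabulary.
            row[hash_token(token, n_features)] += 1
        matrix.append(row)
    return matrix


class RunningColumnTotals:
    """Toy incremental estimator: partial_fit folds in one batch at a time."""

    def __init__(self, n_features):
        self.totals = [0] * n_features

    def partial_fit(self, batch):
        for row in batch:
            for j, value in enumerate(row):
                self.totals[j] += value
        return self


# Feed two small batches, as one would with data too large for memory.
estimator = RunningColumnTotals(n_features=16)
for batch in (["big data simple rules"], ["many observations simple rules"]):
    estimator.partial_fit(hashed_counts(batch, n_features=16))
print(sum(estimator.totals))
```

Because the column index comes from a hash, no fitted vocabulary is needed, which is what makes batch-by-batch processing possible.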
- www.wendelin.io: Wendelin Industrial Big Data
- Microsoft Benjamin: benguin@microsoft.com
scikit-learn
ENSAE course material
- sklearn_ensae_course
- clone the repository, navigate to the “rendered notebooks” folder, and execute
$ jupyter notebook
- alternatively, view the notebooks online by filling in the GitHub user and repository in
http://nbviewer.ipython.org/github/[name]/[repo]
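The link pattern above can be filled in programmatically. A tiny sketch; the function name nbviewer_url and the example arguments are placeholders, not real accounts.

```python
def nbviewer_url(name, repo):
    """Build the nbviewer address for a GitHub user and repository."""
    name, repo = name.strip("/"), repo.strip("/")
    if not name or not repo:
        raise ValueError("both name and repo are required")
    return "http://nbviewer.ipython.org/github/{}/{}".format(name, repo)


# Placeholder values; substitute the actual GitHub user and repository.
print(nbviewer_url("someuser", "somerepo"))
```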
IPython notebooks
Webscraping
Crawling part
Scrapy
Extraction part
Books
- Python for Data Analysis
- Author: McKinney, Wes
Subtitle: Agile Tools for Real-World Data
Publisher: O’Reilly
ISBN: 978-1-449-31979-3
Year: 2013
Tags: NumPy, pandas, matplotlib, IPython, SciPy
GitHub: git://github.com/pydata/pydata-book.git
- Python for Informatics
- Author: Charles Severance
Subtitle: Exploring Information
Think Series by Allen B. Downey
- Think Bayes
- Subtitle: Bayesian Statistics in Python
Publisher: O’Reilly
ISBN: 978-1-449-37078-7
Year: 2013
- Think Complexity
- Publisher: Green Tea Press
Year: 2012
URL: greenteapress.com/complexity
- Think Python (v3)
- Subtitle: How to Think Like a Computer Scientist
Publisher: Green Tea Press
Year: 2008
URL: thinkpython.com
- Think Stats (2ed)
- Subtitle: Exploratory Data Analysis in Python
Publisher: Green Tea Press
Year: 2014
URL: thinkstats2.com
Mailing Lists
- pydata: a Google Group list for questions related to Python for data analysis and pandas
- pystatsmodels: for statsmodels or pandas-related questions
- numpy-discussion: for NumPy-related questions
- scipy-user: for general SciPy or scientific Python questions