Technology Overview

ITEC 4230 Data Science and Analytics Project,
Anca Doloc-Mihu

(License: CC BY-SA 4.0)


Tools for pre-processing data

  • R, Python, bash, Matlab, Mathematica
  • Excel, Google Sheets
  • Spark, Hadoop, MapReduce
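
As a rough illustration of the MapReduce idea named in the list above, here is a minimal single-machine sketch in plain Python. It is not how Hadoop or Spark are actually invoked. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example. Real frameworks distribute these same three steps across many machines.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data big ideas", "Data cleaning"]))
```

The map and reduce steps are written as independent functions because that separation is what lets a cluster run them in parallel on different chunks of the input.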

IDE(s)

  • Visual Studio, Eclipse, RStudio
  • Jupyter Notebook, SQL Developer
  • Sublime Text, IntelliJ, PyCharm, Notepad++

SQL Databases

  1. SQLite – database in a single file, simplified
  2. MySQL/Oracle/PostgreSQL/MS SQL Server – legacy, centralized, powerful; require a running server (e.g. MariaDB) and a client tool such as SQL Developer
  3. Access
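
To make the "database in a single file" point about SQLite concrete, the following minimal sketch uses Python's built-in `sqlite3` module. The table name `students`, its columns, the sample rows, and the helper `demo_sqlite` are all made up for illustration. No server process is involved: the whole database lives in the file passed to `sqlite3.connect`.

```python
import os
import sqlite3
import tempfile


def demo_sqlite(db_path):
    """Create a small table in the file at db_path and return its rows."""
    conn = sqlite3.connect(db_path)  # creates the file if it does not exist
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS students (name TEXT, score REAL)"
        )
        # Parameterized inserts (the ? placeholders) avoid SQL injection.
        conn.executemany(
            "INSERT INTO students (name, score) VALUES (?, ?)",
            [("Ada", 91.5), ("Grace", 88.0)],
        )
        conn.commit()
        rows = conn.execute(
            "SELECT name, score FROM students ORDER BY score DESC"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # hypothetical file name
        print(demo_sqlite(path))
        print("database file exists:", os.path.isfile(path))
```

By contrast, the server-based systems in item 2 need a running database server and connection credentials before any query can be issued.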

NoSQL databases

Simpler, less powerful, cloud options available

  • MongoDB - industry leader
  • Firebase (Google) - easy for mobile apps
  • Cassandra - free and open-source, distributed, wide-column store; a NoSQL DBMS for safely handling large amounts of data across many servers
  • CouchDB - Good for highly distributed systems
  • DynamoDB – a NoSQL database from AWS
  • TinyDB - a simple database with a clean API that just works without lots of configuration, works with Python 3.5+ and PyPy

Data exploration tools

  • Tableau, Power BI, Adobe Analytics, MicroStrategy, Google Analytics, Excel
  • R, Python, Matlab, Mathematica
  • SAP Analytics
  • Microsoft Paint
  • Notebooks: Jupyter, Observable

Algorithms for advanced data exploration

  • Sorting algorithms – merge sort, quicksort (both are divide-and-conquer algorithms)
  • Search algorithms: linear, binary search
  • Clustering: k-means, nearest neighbor
  • Classification: linear regression, supervised/unsupervised learning, Naïve Bayes, SVM, extremely randomized trees (extra trees), binary classification
  • Pre-processing algorithms: MapReduce, Principal Component Analysis (PCA)
  • Prediction algorithms: random forest, generalized linear model (GLM), regression algorithms - linear, logistic
  • Time series: ARIMA - Autoregressive Integrated Moving Average, and its variants (see the "11 Time Series Forecasting Methods in Python" cheat sheet)
  • Neural networks (NN): Deep NN, perceptron, ANN
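
Two of the classic algorithms from the list above, merge sort and binary search, can be sketched in a few lines of plain Python. This is a teaching sketch under simple assumptions (in-memory lists of comparable items), not a replacement for Python's built-in `sorted` or the `bisect` module. The function names are chosen only for this example.

```python
def merge_sort(items):
    """Return a new sorted list using divide and conquer."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # sort each half recursively
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    # Merge the two sorted halves by repeatedly taking the smaller head.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1


if __name__ == "__main__":
    data = merge_sort([5, 2, 9, 1])
    print(data, binary_search(data, 5))
```

Binary search only works on sorted input, which is why the two are often taught together. A linear search needs no sorting but inspects every element in the worst case.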

Tools for visualizations

  • Python libraries: matplotlib, Seaborn
  • R libraries:
  • Matlab: scatter, plot, plot3
  • Excel
  • other…

Tools for dynamic visualizations

  • D3.js, Chart.js, Chartist.js, Highcharts, FusionCharts, Timeline.js
  • Google Charts, dygraphs
  • Python – interactive methods – plotly
  • R – interactive methods – plotly
  • Tableau, Power BI, Datawrapper, RAWGraphs, Infogram - paid, but with some free options
  • for maps: Leaflet, OpenLayers, Kartograph, CARTO

JS web frameworks

Major players; Node.js is needed for the development and build tooling

  1. Angular (TypeScript) - a full framework, supported by Google
  2. React (JSX) - a library rather than a full framework; created by Facebook and open source
  3. Vue.js (JavaScript) - modeled on the original AngularJS; independent and open source

Non-JS web frameworks

  • Python
    • Django - Very extensive, open-source framework
    • Flask - Minimal, light, open-source framework
  • C#
    • .NET Core - controlled by Microsoft, but cross-platform
  • Ruby on Rails - open-source Ruby web framework; has lost some popularity in recent years
  • PHP (legacy) – but works well with modern frameworks
    • Laravel - easy, video tutorials
    • Yii - performant, but steep learning curve
  • Java
    • JavaServer Faces (JSF) - legacy
    • Spring - similar to .NET; open source, developed by VMware (formerly Pivotal)

Mobile hybrid frameworks

Hybrid: cross-platform (Android, iOS, …)

  • Flutter (Dart)
  • React Native (JSX, native components)
  • Ionic (Angular/React, HTML) – based on Cordova
  • Xamarin (C#, XML, allows native code)

Game Dev/Graphical

  • Unity (C#; the older JavaScript-like UnityScript is no longer supported)
  • Unreal Engine (C++/Blueprints)
  • Godot (GDScript, a custom Python-like language)
  • GameMaker (GML, a custom language)
  • Processing (Java)
  • JS frameworks (PhaserJS, …)

Online Servers

There are free or low-cost options

  • Heroku - low-cost servers; the completely free tier was discontinued in November 2022
  • Netlify/GitHub Pages - free hosting of static web content
  • Amazon Web Services (AWS) - 12-month free tier
  • Microsoft Azure - free tier
  • Altervista - Free PHP and MySQL server