WORKBENCHES & IDES:
JupyterLab, VS Code Server, RStudio Server (opt), Terminal (bash/zsh/fish), TensorBoard, MLflow UI, Gradio, Streamlit, Dash, FastAPI playground, OpenAPI/Swagger UIs, Postman, Bruno (opt)
PROGRAMMING LANGUAGES & RUNTIMES:
Python 3.x, Node.js, TypeScript, R (opt), Java (JDK), Go (opt), Julia (opt), C/C++, CUDA toolkits, ROCm (where available), GCC/Clang, CMake, Ninja
PACKAGE & BUILD TOOLING:
conda, mamba, pip, uv, Poetry, virtualenv, pip-tools, build, twine, pre-commit, black, isort, ruff, flake8, mypy, pytest, tox, coverage, git, git-lfs
FILES, STORAGE, WAREHOUSES, QUEUES:
Amazon S3 (API), Google Cloud Storage (API), Azure Blob (API), MinIO, HDFS (opt), WebDAV
PostgreSQL, MySQL/MariaDB, SQLite, MongoDB, Cassandra, Redis, Elasticsearch, OpenSearch, ClickHouse, Snowflake (API), BigQuery (API), Redshift (API), Databricks (API), Trino/Presto (API)
Kafka, Pulsar, RabbitMQ, NATS
Parquet, Arrow, ORC, Avro, JSON/NDJSON, CSV, Feather, Delta Lake (connector), Iceberg (connector), Hudi (connector)
DATA ENGINEERING / ETL / ORCHESTRATION:
Pandas, Polars, PyArrow, DuckDB, Dask, Apache Spark, Ray Data, dbt (core), Airflow, Prefect, Dagster, Flyte, Great Expectations, Soda Core, Airbyte (connectors), Singer taps, Apache NiFi (opt), Pandera
CORE ML/DL FRAMEWORKS:
PyTorch, TensorFlow, Keras, JAX/Flax, scikit-learn, RAPIDS (cuDF, cuML), XGBoost, LightGBM, CatBoost, ONNX, ONNX Runtime, OpenVINO, TensorRT, TVM, TorchMetrics, timm
HPO / AUTOTUNE / AUTOML:
Optuna, Ray Tune, Hyperopt, Ax, SMAC3/BOHB, KerasTuner, Nevergrad, AutoGluon, H2O AutoML, auto-sklearn, TPOT
COMPUTER VISION:
OpenCV, torchvision, Albumentations, imgaug, Ultralytics YOLO (v5/v8), YOLOX, Detectron2, MMDetection, MMSegmentation, DETR, Segment Anything (SAM), GroundingDINO, OpenMMLab stack, MiDaS (depth), ESRGAN (super-res), EasyOCR, PaddleOCR, Tesseract
NLP / LLM TOOLING:
Hugging Face Transformers, tokenizers, datasets, accelerate, PEFT/LoRA/QLoRA, SentenceTransformers, spaCy, NLTK, Gensim, Stanza, Flair (opt), fastText, OpenNMT, SentencePiece
LLM ORCHESTRATION & GUARDRAILS:
LangChain, LlamaIndex, Haystack, DSPy, Guidance, Instructor, semantic-kernel (opt), NeMo Guardrails, Guardrails-AI, Rebuff, Llama Guard, Langfuse (API), TruLens (API), Arize Phoenix (API), LangSmith (API)
LLM SERVING & OPTIMIZED BACKENDS:
vLLM, Text Generation Inference (TGI), llama.cpp, ggml/gguf, Ollama, FasterTransformer (opt)
RAG & VECTOR DATABASES:
FAISS, Annoy, HNSWlib, Chroma, Qdrant, Milvus, Weaviate, Pinecone (API), Redis (vector), Elasticsearch/OpenSearch (kNN), Vespa, pgvector, LanceDB
DOC INGEST, OCR & EVAL (RAG PIPELINE):
unstructured, pypdf, pdfplumber, Apache Tika, textract, Tesseract, PaddleOCR, Ragas, DeepEval (opt)
EMBEDDINGS (TEMPLATES/APIs):
all-MiniLM-L6-v2, bge-large/bge-small family, E5-Large, GTE-Large, MPNet variants, OpenAI text-embedding-3 (API), Cohere embed (API), Voyage (API)
GENERATIVE AI — IMAGE/TEXT/AUDIO/VIDEO:
Diffusers, Stable Diffusion 1.5/2/XL, ControlNet, IP-Adapter, LoRA/PEFT training, DreamBooth, Textual Inversion, ComfyUI (opt), Automatic1111 (connector)
Multimodal: CLIP, BLIP/BLIP-2, LLaVA (templates)
Audio/Music: torchaudio, librosa, Demucs, Bark, MusicGen, Riffusion
Speech: OpenAI Whisper & faster-whisper, wav2vec2, Vosk, SpeechBrain, Coqui TTS
TIME SERIES & FORECASTING:
Prophet, statsmodels, darts, GluonTS, Kats, Orbit, Nixtla (NeuralForecast, StatsForecast)
REINFORCEMENT LEARNING & SIMULATION:
Stable-Baselines3, RLlib, CleanRL, Gymnasium, PettingZoo, Brax (JAX), Isaac Gym (where licensed)
GRAPH ML:
NetworkX, PyTorch Geometric (PyG), Deep Graph Library (DGL), StellarGraph (opt), Neo4j driver (connector)
SERVING / API / DEPLOYMENT:
FastAPI, Flask, gRPC, Uvicorn/Gunicorn, TorchServe, Triton Inference Server, KServe, Seldon Core, BentoML, ONNX Runtime Server, Ray Serve, NVIDIA TensorRT-LLM, Celery, RQ
MLOPS / TRACKING / OBSERVABILITY:
MLflow, Weights & Biases, ClearML, Neptune, Aim, DVC + Git LFS, MLflow Model Registry, W&B Artifacts
Data/Model monitoring: Evidently AI, Great Expectations, Deepchecks, WhyLabs (API)
Ops/metrics/logs: Prometheus, Grafana, OpenTelemetry (opt), Loki/Elastic (opt)
CLOUD SERVICES & SDKs (CONNECTORS):
AWS CLI, boto3, Amazon SageMaker SDK, Amazon Bedrock SDK, S3/STS, Redshift (API), EMR connectors
Azure CLI, Azure ML/AI SDK, Azure OpenAI, Cognitive Services, Azure Storage
gcloud CLI, google-cloud-python, Vertex AI SDK, BigQuery, GCS
NVIDIA NGC CLI, NIM microservices (where licensed)
Hugging Face Hub, OpenAI, Anthropic, Cohere, Mistral, Together, Replicate, Fireworks (APIs), Databricks (API), Snowflake (API), IBM watsonx (API)
WAREHOUSES / DATABASES:
Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, MySQL/MariaDB, SQLite, MongoDB, Cassandra, ClickHouse, Elastic, OpenSearch, Redis
SECURITY, SUPPLY CHAIN & SECRETS:
Trivy, Grype, Syft (SBOM), Cosign (opt), Bandit, Safety, pip-audit, HashiCorp Vault, AWS Secrets Manager (API), Azure Key Vault (API), GCP Secret Manager (API)
CONTAINERS / DEVOPS / KUBERNETES:
Docker, Podman, Kubernetes, kubectl, Helm, Kustomize, Docker Compose, Kind (local K8s demos), Argo Workflows (opt), Kubeflow (opt), Make, GNU Parallel, GitHub Actions/GitLab CI templates
LABELING & DATA CURATION:
Label Studio, CVAT, doccano, Prodigy (API/licensed), Roboflow (API), FiftyOne
VISUALIZATION & BI:
Matplotlib, Seaborn, Plotly, Altair, Bokeh, HoloViews, Apache Superset, Metabase, Grafana, Kibana (opt)
MEDIA & GEOSPATIAL UTILITIES:
FFmpeg, ImageMagick, exiftool, GeoPandas, Shapely, Fiona, Rasterio, Folium, kepler.gl (via notebooks)
DATASETS — LOCAL SAMPLES + OFFICIAL CONNECTORS:
Vision: MNIST, Fashion-MNIST, CIFAR-10/100, STL-10, Caltech-101/256 (scripts), COCO, Pascal VOC, OpenImages, ADE20K, Cityscapes, KITTI, WIDER FACE, LFW, VGGFace2 (license-aware), ImageNet (connector)
NLP: WikiText-103, SQuAD, GLUE/SuperGLUE, IMDB, AG News, Yelp, Quora Duplicate, SNLI/MultiNLI, C4, The Pile (subsets), Wikipedia snapshots, Common Crawl pipelines (scripts)
Audio/Speech: LibriSpeech, Common Voice, VoxCeleb, ESC-50, UrbanSound8K
Video/Multimodal: Kinetics, UCF101, HMDB51, MSR-VTT, WebVid-2M (connector)
Embeddings/Eval: MTEB tasks (connector), LAMBADA, BIG-bench (connector)
Kaggle: kaggle CLI wired for datasets/competitions (TOS), Hugging Face Datasets (connectors)
EDGE / MOBILE / ON-DEVICE:
ONNX Runtime, TensorRT, OpenVINO, TFLite, Core ML Tools, Jetson toolchains, GGUF quantized LLMs, MediaPipe (opt)
COPILOTS & ASSISTANTS:
AINA Chatbot (built-in), OpenAI Assistants (API), Claude (API), Cohere (API), Mistral (API), LangGraph templates, Function-calling/Tools templates
CLASSROOM & COLLABORATION:
Shared notebooks/projects, environment pinning, read-only lab baselines, cohort workspaces, artifact storage, exportable MLflow runs, demo links
UTILITIES & SYSTEM TOOLS:
tmux, htop/nvtop, wget, curl, httpie, ripgrep, fd, jq, yq, rsync, rclone, cron, ssh, ssm (connectors), make, unzip/7z, tree
EXPLAINABILITY & RESPONSIBLE AI (XAI):
SHAP, LIME, Captum, ELI5, InterpretML, Alibi Detect, Fairlearn, AIF360, What-If Tool, DiCE, Responsible AI Toolbox
PRIVACY, FEDERATED & SECURE ML:
Opacus (PyTorch DP), TensorFlow Privacy, PySyft, Flower (FL), TensorFlow Federated, NVIDIA FLARE (API/connector), CrypTen, TenSEAL (FHE), Pyfhel (FHE), OpenDP SmartNoise, Presidio (PII), AnonymizeDF, ARX (connector)
FEATURE STORES:
Feast, Vertex Feature Store (API), SageMaker Feature Store (API), Databricks Feature Store (API), Hopsworks (API), Tecton (API)
AGENT FRAMEWORKS & AUTOMATION:
LangGraph, CrewAI, AutoGen (Microsoft), Semantic Kernel, Haystack Agents, SuperAGI (opt)
PROMPT EVAL, RED-TEAMING & GUARDRAILS:
promptfoo, garak, Microsoft PyRIT, NeMo Guardrails, Llama Guard, Rebuff, OpenAI Evals (API), lm-eval-harness (EleutherAI), Langfuse (API), TruLens (API), Phoenix (API), Lakera Guard (API)
MODEL COMPRESSION, PRUNING & DISTILLATION:
Intel Neural Compressor, OpenVINO NNCF, torch-pruning, nn-pruning, knowledge-distillation templates (DPO/ORPO/LoRA/QLoRA), DeepSpeed-Chat, TRL (HF)
DISTRIBUTED & HPC EXTRAS:
Horovod, FSDP (PyTorch), Megatron-LM (templates), Ray Train, Lightning Fabric/Strategy, NCCL tools, Nsight Systems/Compute (env dependent)
SCIENTIFIC & MATH STACK (CORE DS):
NumPy, SciPy, SymPy, Numba, CuPy, Modin (pandas at scale), Joblib, Multiprocessing, Polars (already listed)
SCRAPING, CRAWLING & DOC INGEST EXTRAS:
Scrapy, BeautifulSoup4, lxml, trafilatura, newspaper3k, goose3, Readability, Playwright, Selenium, pyppeteer/puppeteer
NOTEBOOK ECOSYSTEM & AUTOMATION:
IPyWidgets, Jupytext, nbconvert, papermill, Voilà, Panel, Markdown/Quarto (opt)
REPORTING & DOCS GENERATION:
Pandoc, ReportLab, WeasyPrint, wkhtmltopdf (connector), mkdocs, Sphinx, Mermaid (diagrams)
EDA & DATA PROFILING:
ydata-profiling (formerly pandas-profiling), Sweetviz, Lux (opt), Dataprep EDA (opt)
DATA QUALITY, LINEAGE & CATALOG:
Great Expectations (already), Soda Core (already), Deequ (Spark), OpenLineage (connector), Marquez (connector), DataHub (connector), Amundsen (connector)
GEOSPATIAL & MAPS (EXTRA):
GDAL/OGR, PROJ, PostGIS (connector), OSMnx, Folium (already), kepler.gl (notebooks), pydeck
RECOMMENDER SYSTEMS (EXTRA):
LightFM, Spotlight, TensorRec, NVIDIA Merlin (API/connector), Cornac, implicit
SEARCH & LIGHTWEIGHT INDEXES:
Meilisearch, Typesense, Vespa (already), Elasticsearch/OpenSearch (already)
SCHEDULING & EVENTING (APP LEVEL):
APScheduler, Celery + Flower, RQ, Temporal (connector), CRON templates
LOGGING, TRACING & ERROR MONITORING:
Sentry SDK (API), OpenTelemetry (already), structlog, loguru
Debugging & profiling: debugpy, cProfile/py-spy/line-profiler/memory-profiler, torch.profiler
BROWSER/UI TEST & LOAD:
Cypress (connector), k6 (connector), Locust (opt), Lighthouse CI (connector)
DATA ANON/SYNTHETIC DATA:
Faker, SDV (Synthetic Data Vault), ydata-synthetic, Gretel (API/connector), synthcity (opt)
MEDIA & FILE TOOLING:
pdfminer.six, pikepdf, PyMuPDF, ExifRead, Pillow/PIL, Wand (ImageMagick bindings), ffmpeg-python
BIG DATA & STREAMING:
Apache Flink (connector), Spark Structured Streaming, Kafka Streams clients, Delta Lake/Iceberg/Hudi connectors (already listed)
UI FRAMEWORKS (PYTHON APP FRONTS):
Shiny for Python (opt), Flet (opt), NiceGUI (opt), Textual/Rich (TUI)
GRAPH DATABASES (CONNECTORS):
Neo4j (py2neo/official driver), TigerGraph (connector), ArangoDB (connector)
DATA VALIDATION (SCHEMA & TYPES):
Pandera, Pydantic, Cerberus, Voluptuous
MOBILE & EDGE (EXTRA):
NCNN (opt), MNN (opt), Core ML Tools (already), MediaPipe (already), TFLite Micro (opt)
SEARCH/APIS FOR LLM CONTENT SAFETY:
OpenAI Moderation (API), AWS Comprehend (API), Google Perspective (API), Azure Content Safety (API)
BENCHMARKS & METRICS (LLM & CLASSICAL):
HF Evaluate, scikit-learn metrics suite, torchmetrics (already), MTEB (connector), GLUE/SuperGLUE (already)
KAGGLE & HF OPS (EXTRA):
kaggle CLI (already), HF Hub CLI (already), Datasets LFS cache & snapshot scripts, Spaces deployment templates (connector)
INFRA AS CODE (OPTIONAL FOR DEVOPS LABS):
Terraform (opt), Pulumi (opt), Ansible (opt)
TEAM/PRODUCTIVITY UTILITIES:
make, just (task runner) (opt), tmux (already), ripgrep/fd (already), jq/yq (already), direnv (opt)
RCAI MODULE ALIGNMENT (RUNS INSIDE AINA):
Modules 00–41: Python Programming (05), Math Foundations (06), Machine Learning (07), Train/Deploy (08), Kaggle Datasets (09), AI Frameworks (11), Object Detection (12), OpenAI & ChatGPT (13), AI in Cybersecurity (14), LLMs (15), Labs (16), Generative Art (17), Facial Recognition (18), Deep Learning Essentials (19), Data Science (20), Azure AI (21), Google Vertex AI (22), Amazon SageMaker (23), Applications (24–37), Deep Learning Tech Lectures (38), Big Data (39), Classroom Project (40), Microsoft ML Tools (41)
Modules 13 (OpenAI/ChatGPT), 15 (LLMs), 16 (AI Labs), 24–37 (Applications: personal, sales, coding, images, audio, video), 30 (Personal Chatbots), plus cloud modules 21–23 (Azure, Vertex, SageMaker) for hosted agents.
AGENTIC FRAMEWORKS (CORE)
LangGraph, CrewAI, AutoGen (Microsoft), Semantic Kernel, Haystack Agents, Transformers Agents, LlamaIndex Agents, LangChain Agents, SuperAGI (opt)
ORCHESTRATION & STATE MACHINES
Graph/DAG agents (LangGraph), router & tool-calling agents (LangChain/LlamaIndex), conversational planners, multi-agent “crews”, role & tool routing, function-schema tooling (JSON Schema/OpenAPI tool specs)
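To make "function-schema tooling" concrete, here is a minimal sketch of a JSON Schema-style tool spec and a router that dispatches a tool call to it. The tool name `get_weather`, the `TOOLS` registry layout, and the `dispatch` helper are hypothetical, chosen for illustration only. They are not part of any listed framework's API, and the argument check covers only required keys and basic types rather than full JSON Schema validation.

```python
import json


# Hypothetical tool implementation; a real tool would call an external service.
def get_weather(city: str) -> str:
    return f"Weather for {city}: (stub)"


# Hypothetical registry: each tool carries a JSON Schema-style parameter spec
# plus the Python callable that implements it.
TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "handler": get_weather,
    }
}

# Maps a few JSON Schema type names to Python types for a basic check.
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def dispatch(tool_call_json: str) -> str:
    """Route a model-emitted tool call of the form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")

    schema = TOOLS[name]["parameters"]
    missing = [key for key in schema.get("required", []) if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")

    for key, value in args.items():
        prop = schema["properties"].get(key)
        if prop is None:
            raise ValueError(f"unexpected argument: {key}")
        if not isinstance(value, _JSON_TYPES[prop["type"]]):
            raise ValueError(f"argument {key!r} should be of type {prop['type']}")

    return TOOLS[name]["handler"](**args)


if __name__ == "__main__":
    print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```

Keeping the schema next to the handler means the same registry can be serialized (minus the handler) into the tool list sent to a model and reused to validate the call that comes back.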
TOOL-USE / FUNCTION BRIDGES
OpenAI Assistants (API), Anthropic Tools (API), Mistral Tools (API), Cohere Tools (API), function-calling wrappers for FastAPI/HTTP, Python REPL tool, Shell (sandboxed), file I/O tools, vector-search tools, SQL tools
PLANNING & REASONING PATTERNS
ReAct, Reflexion, Tree-of-Thought (ToT), Graph-of-Thought, MRKL, Toolformer-style patterns, routing & fallback strategies, self-consistency, debate/consensus agents
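Of the patterns listed, self-consistency is the simplest to sketch: sample several independent answers to the same question and keep the most common one. The example below assumes a caller-supplied `sample` function that returns one final answer per call, for instance a wrapper around an LLM call with non-zero temperature. The function name `self_consistent_answer` and its signature are hypothetical.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(sample: Callable[[], str], n: int = 5) -> tuple[str, float]:
    """Draw n answers from `sample` and return (majority answer, agreement ratio)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [sample() for _ in range(n)]
    # most_common(1) returns the answer with the highest vote count;
    # ties resolve to the answer that was seen first.
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n


if __name__ == "__main__":
    # Stand-in for repeated model calls; a real sampler would query an LLM.
    canned = iter(["42", "41", "42", "42", "7"])
    answer, agreement = self_consistent_answer(lambda: next(canned), n=5)
    print(answer, agreement)
```

The agreement ratio can double as a cheap confidence signal, for example to trigger a fallback strategy when no answer reaches a chosen threshold.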
MEMORY & KNOWLEDGE
FAISS, Qdrant, Milvus, Weaviate, Chroma, Redis (vector), pgvector; document loaders (unstructured, pypdf, pdfplumber, Tika, OCR) and long-term memory stores via LlamaIndex/LangChain
WEB BROWSING / AUTOMATION
Playwright, Selenium, browser-use tools, requests/HTTP clients, search connectors (Tavily (API), SerpAPI (API), Wikipedia), scraping (Scrapy, BeautifulSoup, trafilatura, newspaper3k)
EXECUTION SANDBOXES
Python notebook kernel, Python REPL tool, code-execution cells, task runners, safe temp workspaces, FFmpeg/ImageMagick utilities for media tasks
EVAL, TELEMETRY & GUARDRAILS
promptfoo, garak, Microsoft PyRIT, NeMo Guardrails, Llama Guard, Rebuff, TruLens (API), Langfuse (API), Phoenix (API), OpenTelemetry, policy prompts, allow/deny lists, PII scrubbing (Presidio), jailbreak/TOFU checks
WORKFLOWS & SCHEDULING
Airflow, Prefect, Dagster, APScheduler; event triggers to run agents on schedules or data arrivals
PREBUILT AINA AGENT TEMPLATES
Research & Cite Agent (web browse + sources)
PDF/Data Analyst Agent (RAG over docs)
Dev/Ops Remediation Agent (log triage → fix suggestion)
Support Triage Agent (classification → reply draft)
ETL Orchestrator Agent (ingest → clean → vectorize → index)
Creative Studio Agent (prompt → image/music/video pipelines)
DATASETS — LOCAL SAMPLES + OFFICIAL CONNECTORS:
Hugging Face Datasets (datasets library) with streaming, map/filter, Parquet/Arrow caching; MTEB & GLUE/SuperGLUE via HF; Wikipedia/Common Crawl pipelines (scripts); Kaggle CLI (TOS)
LLM SERVING & OPTIMIZED BACKENDS:
vLLM, TGI (Hugging Face Text Generation Inference), llama.cpp/gguf, Ollama; Optimum Runtime integrations (ONNX Runtime, OpenVINO, TensorRT)
PHISHING (emails / URLs / webpages)
Phishing Email Dataset (Kaggle): cleaned corpus of phishing vs. legitimate emails; suited to NLP classification, intent/IOC extraction, and transformer-based models. Load via Kaggle CLI or Hugging Face mirrors.
RANSOMWARE / MALWARE (behavioral, static, telemetry)
Ransomware PE Feature Sets (Kaggle): static PE-file feature vectors (headers, imports, entropy, strings) for safe ML without executing binaries. Good for rapid prototyping of static-analysis classifiers.
Network & Host Telemetry Collections (CIC-IDS / custom corpora): benign vs. malicious network flows and host logs you can use to simulate ransomware lateral movement or exfil patterns.
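As an illustration of the static features mentioned above (entropy, strings), the sketch below computes a few of them directly from raw bytes, without parsing or executing the file. It uses only the Python standard library. The function names and the choice of features are assumptions made for this example; the listed Kaggle feature sets may define their columns differently.

```python
import math
import re
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (ranges from 0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Extract runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def static_features(data: bytes) -> dict:
    """Build a small feature dict from raw bytes; the file is never executed."""
    strings = printable_strings(data)
    return {
        "size": len(data),
        "entropy": shannon_entropy(data),
        "n_strings": len(strings),
        # PE files begin with the DOS header magic bytes "MZ".
        "has_mz_header": data[:2] == b"MZ",
    }


if __name__ == "__main__":
    sample = b"MZ" + bytes(16) + b"kernel32.dll\x00LoadLibraryA\x00"
    print(static_features(sample))
```

High byte entropy is commonly treated as a hint of packed or encrypted content, which is why it tends to appear alongside header and import features in static-analysis classifiers.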