Using numpy and pandas in Node.js

Momtchil Momtchev
Apr 2, 2023

If you are an exclusively Node.js shop and you suddenly need that Python library which has no Node.js equivalent, then I have the solution for you.


About 4 years ago, while working on a typical CPU-heavy geometry optimization problem in Python, I decided to give Node.js a try. Until that moment, I subscribed to the commonly held opinion that Node.js excelled at I/O, while serious number crunching was firmly Python’s territory. No matter how hard I tried, I was never able to write a Python implementation that beat the Node.js one. Both CPython and PyPy fell behind, and fell hard. Alas, the simplicity and elegance of Python’s interpreters could not compete with the JIT juggernaut that V8 has become.

Now, don’t get me wrong: Python is still the better language. In the short history of technology, JavaScript is but one of many examples of an inferior technology winning over a superior one. VHS, x86, MS-DOS: you don’t have to look hard to find dozens of others. Momentum and installed base matter a great deal when it comes to core technologies, and the Almighty Web has spoken: JavaScript will be the new general-purpose language for the masses. Like it or not.

However, despite the huge traction Node.js has gained over the last few years, there is still one field where it lags probably a whole decade behind Python, and that is precisely number crunching. Even if you have never used Python, you have probably heard of scipy, numpy and pandas, and you know at least one person who uses them regularly. They are that good and that widely used.

In the Node.js world we have scijs and, more recently, @stdlib. However, scijs covers about 5% of numpy’s features, and @stdlib is a gargantuan undertaking that is still a work in progress. Even though it is growing fast, the community behind it is still orders of magnitude smaller than the scipy community.

Now, if you want to do serious number crunching in the browser, there is still no real alternative: @stdlib is the only way to go. However, if you are targeting only Node.js, Python itself has a solution for you: the interpreter can be embedded as a library inside another program. Python can be integrated with practically any other language, and its integration with Node.js, another somewhat similar interpreted language, can be absolutely flawless.

Enough talk, let’s get to the point

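```javascript
// The proxified entry point lets Python attributes be used with plain JS syntax
import { pymport } from 'pymport/proxified';
const np = pymport('numpy');

// Python: a = np.arange(15).reshape(3, 5)
const a = np.arange(15).reshape(3, 5);
// Keyword arguments go in a trailing plain JS object
// Python: b = np.ones((2, 3), dtype=np.int16)
const b = np.ones([2, 3], { dtype: np.int16 });
```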

Eh? What is the catch? Too good to be true? What about performance? How does it work? Is this a transpiler? What about the binary code? Does it support all Python?

Yes, it supports all of Python, and there is no catch. In fact, the explanation is rather mundane: Node.js runs the JS code, and Python runs the Python code. Python can be loaded as a shared library and has a very good, mature embedding API, and Node.js has a very good addon API. As both languages are somewhat similar, the integration is nearly perfect. It is also remarkably simple: each Python object is proxied in JS, i.e. there is a corresponding JS Proxy object that holds a reference to it.
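To make the proxy idea more concrete, here is a toy model in plain JavaScript. It is not pymport’s code: the "foreign" object is just a JS record standing in for a Python object, and the helpers foreignGetAttr and wrap are hypothetical. It shows how a Proxy get trap can turn every property access into an attribute lookup on the other side, and how wrapping each result again lets calls be chained, with a toJS() escape hatch back to plain JS values.

```javascript
// Toy model only, not pymport's implementation: a "foreign" object is a plain
// record of attributes standing in for an object living in another interpreter.
function foreignGetAttr(obj, name) {
  if (obj === null || typeof obj !== 'object' || !obj.attrs.has(name)) {
    throw new Error(`AttributeError: object has no attribute '${name}'`);
  }
  return obj.attrs.get(name);
}

// Wrap a foreign value in a JS Proxy that holds on to it.
function wrap(foreign) {
  // A callable target is needed for the apply trap to fire on calls
  const target = typeof foreign === 'function' ? () => {} : {};
  return new Proxy(target, {
    get(_target, prop) {
      // Let JS internals (Symbol.toPrimitive, inspection...) fall through
      if (typeof prop === 'symbol') return undefined;
      // Escape hatch back to a plain JS value
      if (prop === 'toJS') return () => foreign;
      // Any other property access becomes an attribute lookup on the foreign side
      return wrap(foreignGetAttr(foreign, prop));
    },
    apply(_target, _thisArg, args) {
      // Calls are forwarded to the foreign function and the result is wrapped again
      return wrap(foreign(...args));
    }
  });
}

// A fake "module" with a constant and a function as attributes
const fakeMath = {
  attrs: new Map([
    ['pi', Math.PI],
    ['sqrt', (x) => Math.sqrt(x)]
  ])
};

const math = wrap(fakeMath);
console.log(math.pi.toJS());       // 3.141592653589793
console.log(math.sqrt(16).toJS()); // 4
```

The real thing additionally converts arguments between the two worlds on every call, which this sketch skips entirely.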

pymport simply implements Python references in Node.js. Nothing more, nothing less. JS objects are managed by the V8 GC, and Python objects are managed by the Python GC, but a Python object is kept alive for as long as JS holds a reference to it.
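The "kept alive while JS holds a reference" rule can be modelled in a few lines. This is a hypothetical sketch, not how pymport is written (it works in C++ on top of CPython’s own reference counting): the ForeignHeap and Handle classes are invented for illustration. Each JS handle increments the foreign object’s reference count, an explicit release decrements it, and a FinalizationRegistry acts as a backstop when V8 collects a handle that was never released.

```javascript
// Toy model only: a "foreign heap" that frees an object once its
// reference count drops to zero, the way CPython does.
class ForeignHeap {
  constructor() {
    this.objects = new Map(); // id -> { value, refcount }
    this.nextId = 1;
  }
  alloc(value) {
    const id = this.nextId++;
    this.objects.set(id, { value, refcount: 0 });
    return id;
  }
  incref(id) {
    this.objects.get(id).refcount++;
  }
  decref(id) {
    const entry = this.objects.get(id);
    entry.refcount--;
    if (entry.refcount === 0) this.objects.delete(id); // the object is "freed"
  }
}

// Backstop: if the JS GC collects a Handle that was never released,
// drop its foreign reference at that point.
const registry = new FinalizationRegistry(({ heap, id }) => heap.decref(id));

// JS-side handle: while it holds its reference, the foreign object stays alive
class Handle {
  constructor(heap, id) {
    this.heap = heap;
    this.id = id;
    this.released = false;
    heap.incref(id);
    // The handle itself doubles as the unregister token
    registry.register(this, { heap, id }, this);
  }
  get value() {
    return this.heap.objects.get(this.id).value;
  }
  release() {
    if (this.released) return; // releasing twice is a no-op
    this.released = true;
    registry.unregister(this);
    this.heap.decref(this.id);
  }
}
```

In this toy, an explicit release is deterministic, whereas the FinalizationRegistry callback runs only if and when V8 decides to collect the handle, which may be much later.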

In this example, a is a JS object of type PyObject that holds a numpy array, and np is a PyObject that wraps a Python module. Call toJS() on a Python reference and you will obtain a JS copy of the underlying object:

```javascript
import { pymport } from 'pymport/proxified';
const np = pymport('numpy');
const list = np.arange(15).reshape(3, 5).tolist();
// list is a reference to a Python list
// it can be converted to a JS array (a copy)
console.log(list.toJS());
// or you can use Python's own str() built-in,
// since .toString() is wired to it
console.log(list.toString());
```

Performance is as good as the slowest part: Node.js won’t run slower because you loaded Python, and Python code won’t run slower because it is embedded in Node.js.

There is a certain memory price to pay, though: you load and run both interpreters. As Python is orders of magnitude lighter than Node.js, this is not much of a problem. The one thing you must be careful with is the number of proxy objects:

```javascript
import { pymport } from 'pymport/proxified';
const np = pymport('numpy');

// a is a single JS proxy referencing a numpy array (overhead: 1 reference)
const a = np.arange(65536);
// list is a reference to a Python list object (overhead: 1 reference)
const list = a.tolist();
// refArray is a JavaScript array of references to Python numbers
// The memory overhead is huge: 65536 separate references
const refArray = a.map((x) => x);
// jsArray is a JavaScript array of plain numbers: we are now fully in JS land
const jsArray = list.toJS();
```

But numpy makes heavy use of operator overloading, which does not exist in JavaScript?

Python has the solution for you: every overloaded operator is backed by a special method that can be called explicitly:

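```javascript
import { pymport } from 'pymport/proxified';
const np = pymport('numpy');
const a = np.arange(6).reshape(2, 3);
const b = np.ones([2, 3]);
// living simply: Python's a + b starts by calling a.__add__(b)
const sum1 = a.__add__(b);
// or a slightly more snobbish approach
const op = pymport('operator');
const sum2 = op.add(a, b);
```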
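The two approaches are not strictly equivalent. In Python, a + b (and therefore operator.add) first tries a.__add__(b), and if that returns NotImplemented it falls back to the right operand’s reflected method b.__radd__(a); calling __add__ directly skips that fallback. Below is a simplified sketch of the rule in plain JavaScript. The NotImplemented symbol, the pyAdd helper and the Vec class are all hypothetical, and the real rule has one more twist this sketch ignores (a subclass’s reflected method is tried first).

```javascript
// Sentinel playing the role of Python's NotImplemented
const NotImplemented = Symbol('NotImplemented');

// Simplified version of how Python evaluates `a + b`
function pyAdd(a, b) {
  if (typeof a.__add__ === 'function') {
    const result = a.__add__(b);
    if (result !== NotImplemented) return result;
  }
  // Fall back to the reflected method on the right operand
  if (typeof b.__radd__ === 'function') {
    const result = b.__radd__(a);
    if (result !== NotImplemented) return result;
  }
  throw new TypeError('unsupported operand type(s) for +');
}

// A toy vector type that knows how to add other vectors or plain numbers
class Vec {
  constructor(...xs) {
    this.xs = xs;
  }
  __add__(other) {
    if (other instanceof Vec) return new Vec(...this.xs.map((x, i) => x + other.xs[i]));
    if (typeof other === 'number') return new Vec(...this.xs.map((x) => x + other));
    return NotImplemented;
  }
  __radd__(other) {
    // Addition is commutative here, so reuse __add__
    return this.__add__(other);
  }
}

console.log(pyAdd(new Vec(1, 2), new Vec(10, 20)).xs); // [ 11, 22 ]
console.log(pyAdd(1, new Vec(1, 2)).xs);               // [ 2, 3 ]
```

Numbers have no __add__ in this toy, so pyAdd(1, vec) only works thanks to Vec.__radd__; this is why op.add is the more faithful translation of Python’s + when the left operand may not know about the right one.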

But what about functions and exceptions?

Absolutely natural:

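```javascript
import { pymport } from 'pymport/proxified';
const np = pymport('numpy');

// A JS function can be passed wherever Python expects a callable;
// np.fromfunction calls it with arrays of row and column indices
const fn = (x, y) => {
  return x.__add__(y);
};
try {
  const a = np.fromfunction(fn, [2, 3]);
  console.log(a.toString());
} catch (e) {
  // a Python exception raised along the way surfaces here as a JS exception
  console.error(e);
}
```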

And if I already have an existing scijs Node.js application and I want just one feature of numpy that does not exist in scijs?

No problem: a scijs ndarray can be wrapped as a numpy array without even copying its backing store:

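```javascript
import { PyObject } from 'pymport';
// b is a scijs ndarray backed by an Int32Array (contiguous, row-major)
// numpy views the same memory, so dtype must match the typed array
const ref = np.frombuffer(PyObject.memoryview(b.data),
  { dtype: np.int32 }).reshape(b.shape);
```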
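The zero-copy trick works because both sides end up looking at the same bytes, laid out in the same order. The same effect can be seen in plain JavaScript with no Python involved: a second typed-array view over the same ArrayBuffer shares memory with the first, so a write through one is visible through the other, while slice() makes a real copy. The sketch also spells out the row-major (C order) offset computation that both a default scijs ndarray and numpy’s default reshape rely on; the offset helper is hypothetical.

```javascript
// A 2x3 matrix of int32 values stored row-major in one block of memory,
// the way a scijs ndarray with default strides keeps its .data
const shape = [2, 3];
const data = new Int32Array(shape[0] * shape[1]);
// Row-major (C order) strides: moving down one row skips a whole row of elements
const strides = [shape[1], 1];
const offset = (i, j) => i * strides[0] + j * strides[1];

for (let i = 0; i < shape[0]; i++) {
  for (let j = 0; j < shape[1]; j++) data[offset(i, j)] = 10 * i + j;
}

// A second view over the SAME ArrayBuffer: nothing is copied
const view = new Int32Array(data.buffer, data.byteOffset, data.length);
view[offset(1, 2)] = 99;
console.log(data[offset(1, 2)]);          // 99: the write shows through the original
console.log(view.buffer === data.buffer); // true

// By contrast, slice() allocates new memory: writes to the copy don't propagate
const copy = data.slice();
copy[0] = -1;
console.log(data[0]); // 0
```

The flip side of sharing memory is that mutations made from Python through the numpy array are visible to the scijs side as well, and the typed array must stay alive for as long as numpy uses it.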

And pip3? Where does numpy come from in Node.js?

It comes from pip3, of course. When installing pymport you have two choices. The first is to use the built-in Python interpreter that comes with it: if you simply type npm install pymport, it will quietly sneak itself into node_modules/pymport/lib/binding. You will have to invoke its pip as npx pympip3 (for example, npx pympip3 install numpy), but otherwise it is indistinguishable from a normal pip3. I have tried as hard as I could to make this work on all OSes and in all environments.

Alternatively, you can opt for the advanced installation: install Python yourself, then rebuild the pymport C++ code against your own Python installation. npm install pymport --build-from-source will take you down this route. In this case, it is up to you to manage that installation and add whatever modules you need to it.

For more detailed documentation and more examples, including how to express some extreme pandas statements, your starting point should be https://github.com/mmomtchev/pymport/wiki, the pymport wiki.

If you are interested in its internals, or, dare I ask, even want to contribute, you will find a brief introduction in the wiki.

I am the world’s first unemployed IT engineer — as I am currently embroiled in an absolutely huge judicial/sex scandal that is being kept under wraps because of its political implications in France. I use this time to work on various open-source projects, related to Node.js/V8 internals, geospatial software as well as everything else that comes my way.
