Explanation: Python is a programming language. NumPy is a library for Python that makes it possible to run large computations much faster than in native Python. To make that possible, it keeps its own set of data types that are different from Python's native data types, which means you now have two different `bool` types and two different sets of `True` and `False`. Lovely.
Mypy is a type checker for Python (Python supports static typing, but doesn't actually enforce it). Mypy treats numpy's `bool_` and Python's native `bool` as incompatible types, leading to the asinine error message above. Mypy is "technically" correct, since they are two completely different classes. But in practice, there is little functional difference between `bool` and `bool_`. So you have to do dumb workarounds like declaring every `bool` value as `bool | np.bool_` or casting `bool_` down to `bool`. Ugh. Both numpy and mypy declared this issue a WONTFIX. Lovely.
`bool_` via numpy is its own object, and it's fundamentally different from `bool` in Python (which is itself a subclass of `int`, whereas `bool_` is not). They are used similarly, but they're similar in the same way a fork and a spork can both be used to eat spaghetti.
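The fork/spork distinction is easy to verify with Python's own introspection:

```python
import numpy as np

# Python's bool is literally a subclass of int.
print(issubclass(bool, int))       # True

# numpy's bool_ is related to neither bool nor int.
print(issubclass(np.bool_, int))   # False
print(isinstance(np.True_, bool))  # False
```

So from the type system's point of view, they really are two unrelated classes that merely behave alike.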
And do you eat that spaghetti out of a bool?
No, I write some spaghetti with a lot of bools
Honestly, after having served on a Very Large Project with Mypy everywhere, I can categorically say that I hate it. Types are great, type checking is great, but applying it to a language designed without types in mind is a recipe for pain.
Adding types on an untyped project is hell. Greenfield stuff is usually pretty smooth sailing as far as I’m concerned…
What years of dynamic typing brainrot does to mf
I currently work on a NodeJS/React project and apparently I’m going to have to start pasting “‘any’ is not an acceptable return or parameter type” into every damned PR because half the crazy kids who started programming in JavaScript don’t seem to get it.
For fuck's sake, we have TypeScript for a reason. Use it!
If you have a pipeline running eslint on all your PRs (which you should have!), you can set `no-explicit-any` as an error in your eslint config so it's impossible to merge code with `any` in it.

+1 — if you can have automated checks do part of your reviews for you, it's a win. I never comment about code style anymore; if I care enough, I'll build it into the lint config.
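A minimal sketch of that eslint setup, assuming the `@typescript-eslint` plugin (which is where the `no-explicit-any` rule lives) and an `.eslintrc.json`-style config:

```json
{
  "parser": "@typescript-eslint/parser",
  "plugins": ["@typescript-eslint"],
  "rules": {
    "@typescript-eslint/no-explicit-any": "error"
  }
}
```

With the rule at `"error"` severity, a CI step that runs eslint will fail the pipeline on any explicit `any`.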
That’s actually a quite bad way of naming types, even if someone really insists on using 32 bit integers for bools for “performance” reasons.
I learned Python as my first programming language, but ever since I got into other languages, I don’t like going back to dynamic typing…
Type checker detecting different types?
Why is this meme still so fuckin funny 😅
Data typing is important. If two types do not have the same in-memory representation but you treat them like they do, you’re inviting a lot of potential bugs and security vulnerabilities to save a few characters.
ETA: The WONTFIX is absolutely the correct response here. This would allow devs to shoot themselves in the foot for no real gain, eliminating the benefit of things like mypy. Type safety is your friend and will keep you from making simple mistakes.
Well yeah just because they kinda mean the same thing it doesn’t mean that they are the same. I can wholly understand why they won’t “fix” your inconvenience.
Unless I’m missing something big here, saying they “kinda mean the same thing” is a hell of an understatement.
They are two different data types with potentially different in-memory representations.
Well, yeah, but they do mean the exact same thing, hopefully: true or false
Although thinking about it, someone above mentioned that the numpy `bool_` is an object, so I guess that is really: true or false or null/None.

In an abstract sense, they do mean the same things but, in a technical sense, the one most relevant to programming, they do not.
The standard Python `bool` type is a subclass of the integer type. In CPython that means every `bool` is a full heap-allocated object (around 28 bytes on a 64-bit build), not a bare machine integer.

The `numpy.bool_` type is something closer to a native C boolean and is stored in 1 byte inside numpy arrays.

So, memory-wise, one could store a `numpy.bool_` value in a Python `bool`, but that wastes most of the space the object takes up. Going the other way, packing Python `bool` objects into a `numpy.bool_` array requires unwrapping each object down to its 1-byte representation; in a lower-level language, blindly treating two different layouts as interchangeable is exactly the kind of mismatch that produces buffer overflows and illegal memory accesses.

What about converting on the fly? That can be done, but it comes at a performance cost, as every function that can accept a `numpy.bool_` now has to perform additional type checking, validation, and conversion on every single call. That adds up quickly when processing data on the scales where numpy is called for.
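The size gap is easy to observe directly; note that in CPython a `bool` is a full heap object with headers and a refcount, so exact sizes vary by platform and Python version:

```python
import sys
import numpy as np

# A Python bool is a complete Python object, not a raw machine byte.
print(sys.getsizeof(True))                   # 28 on a typical 64-bit CPython

# Inside a numpy array, each boolean occupies exactly one byte.
print(np.dtype(np.bool_).itemsize)           # 1
print(np.array([True, False, True]).nbytes)  # 3
```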
> So you have to do dumb workarounds like declaring every `bool` value as `bool | np.bool_` or casting `bool_` down to `bool`.

These dumb workarounds prevent you from shooting yourself in the foot and from allowing JS-level shit like `"1" + 2 === "12"`.
Well, C has implicit casts, and it's not that weird (although it results in some interesting bugs in certain circumstances). Python is also funny from time to time, albeit due to different reasons (e.g. `-5**2` is apparently -25 because of the order of operations).

`"1" + 2 === "12"` is not unique to JS (sans the requirement for the third equals sign); it's a common feature of multiple statically typed languages. imho it's fine.

EDIT: I did some testing:
What it works in:
- JS
- TS
- Java
- C#
- C++
- Kotlin
- Groovy
- Scala
- PowerShell
What produces a number, instead of a string:
- PHP
- SQL
- Perl
- VB
- Lua
What it doesn’t work in:
- R
- C
- Go
- Swift
- Rust
- Python
- Pascal
- Ruby
- Objective C
- Julia
- Fortran
- Ada
- Dart
- D
- Elixir
And MATLAB appears to produce 51, wtf idk
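As an aside, the `-5**2` result mentioned a few comments up is pure operator precedence: `**` binds tighter than unary minus, so the expression parses as `-(5**2)`:

```python
print(-5**2)    # -25, parsed as -(5**2)
print((-5)**2)  # 25, exponentiation of the negative number
```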
> And MATLAB appears to produce 51, wtf idk
The numeric value of the '1' character (the ASCII code / Unicode code point representing the digit) is 49. Add 2 to it and you get 51. C (and several related languages) will do the same if you evaluate `'1' + 2`.

Oh, that makes sense. I didn't consider it might be treated as a char.
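The same arithmetic is easy to reproduce from Python, since `ord` exposes the code point that C treats the `char` as:

```python
print(ord('1'))           # 49, the code point of the digit '1'
print(ord('1') + 2)       # 51, which is what C computes for '1' + 2
print(chr(ord('1') + 2))  # 3, i.e. the character at code point 51
```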
The JS thing makes perfect sense though.

"1" is a string. You declared its type by using quotes. `myString = "1"` in a dynamically typed language is identical to writing `string myString = "1"` in a statically typed language. You declare it in the symbols used to write it instead of having to manually write out `string` every single time.

2 is an integer. You know this because you used neither quotes nor a decimal place surrounding it. This is also explicit.

`"1" + 2`, if your interpreter is working correctly, should do the following:

- identify the operands from left to right, including their types.
- note that the very first operand in the list is a `string` type, as you explicitly declared it as such by putting it in quotes.
- cast the following operands to `string` if they are not already.
- use the string addition method to add the operands together (in this case, this means concatenation).

In the example you provided, `"1" + 2` is equivalent to `"1" + "2"`, but you're making the interpreter do more work.

QED: `"1" + 2` should, in fact, `=== "12"`, and your lack of ability to handle a language where you declare types by symbols rather than spending extra effort writing the type out as a full English word is your own shortcoming. Learn to declare and handle types in dynamic languages better; don't blame your own misgivings on the language.

Signed, a software engineer.
TypeError is also a correct response, though, and I think many folks would say it makes more sense. It's an unnecessary footgun.
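Python is an example of that choice: mixing `str` and `int` with `+` raises a `TypeError` instead of silently coercing:

```python
# Python refuses the implicit coercion that JS performs.
try:
    "1" + 2
except TypeError as exc:
    print("refused:", exc)

# The conversion has to be requested explicitly.
print("1" + str(2))  # 12 (string concatenation)
print(int("1") + 2)  # 3  (integer addition)
```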
Good meme, bad reasoning. Things like that are why JavaScript is hated. While it looks the same, it should never, in ANY case, be IMPLICITLY turned into another type.
Typing and function call syntax limitations are exactly why I hate JS.
> reasoning

What reasoning? I'm not trying to make any logical deductions here, I'm just expressing annoyance at an inevitable, but nevertheless cumbersome, outcome of the interaction between numpy and mypy. I like Python and I think mypy is a great tool; I wouldn't be using it otherwise.
This explanation is pretty clear cut
What exactly is your use case for treating `np.bool_` and `bool` as interchangeable? If `np.bool_` isn't a subclass of `bool` according to Python itself, then allowing one to be used where the other is expected just seems like it would prevent mypy from noticing bugs that might arise from code that expects a `bool` but gets an `np.bool_` (or vice versa), and can only handle one of those correctly.

Mypy and numpy are open source. You could always implement the fix you need yourself?
They’ve declared it as WONTFIX, so unless you’re suggesting that OP creates a fork of numpy, that’s not going to work.
Well, yes, exactly:

- Create fixes
- Request a merge; assume it's denied
- Fork numpy and add your changes there
- Afterwards, just continue to pull new changes over from the fork's upstream and deal with any merge conflicts with the fix
> Fork numpy
I have a feeling that you’re grossly underestimating the magnitude of this endeavour
I'm making no estimation one way or the other.
That’s incredibly inconvenient.
That's what adding strong typing does for you.
So many people here explaining why Python works that way, but what’s the reason for numpy to introduce its own boolean? Is the Python boolean somehow insufficient?
Here's a good question/answer on this topic:
https://stackoverflow.com/questions/18922407/boolean-and-type-checking-in-python-vs-numpy
Plus, this is kinda the tools doing their jobs. `bool_` exists for whatever reason. It's not a `bool`, but functionally equivalent. The static type checker mypy, correctly, states `bool_` and `bool` aren't compatible, in the same way other different types aren't compatible.

Technically the Python bool is fine, but this is part of what makes numpy special. Under the hood, numpy uses C-type data structures (you can look into Cython if you want to learn more).
It's part of where the speed comes from for numpy, these more optimized C structures. This means if you want to compare things (say, scan an array of booleans to find whether any are false), you either need to slow back down and mix Python's frameworks back in, or, as numpy did, keep everything in Cython, make your own data type, and keep on trucking knowing everything is compatible.

There are probably more reasons, but that's the main one I see. If they depend on any specific logic (say, treating it as an actual boolean and not letting you add two True values together and get an int, like you do in base Python), then having their own type also guarantees that logic.
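A quick sketch of that behavioral difference, with base Python on one side and numpy boolean arrays (where `+` acts as logical or, not integer addition) on the other:

```python
import numpy as np

# In base Python, bool is an int subclass, so + does arithmetic.
print(True + True)    # 2

# On numpy boolean arrays, + is logical or and the dtype stays bool.
a = np.array([True, False])
b = np.array([True, True])
print(a + b)          # [ True  True]
print((a + b).dtype)  # bool
```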
You know, at some point in my career I thought it was kind of silly that so many programming languages optimize speed so much.
But I guess, that’s what you get for not doing it. People having to leave your ecosystem behind and spreading across Numpy/Polars, Cython, plain C/Rust and probably others. 🫠
This is the only actual explanation I’ve found for why numpy leverages its own implementation of what is in most languages a primitive data type, or a derivative of an integer.
Someone else points out that Python's native `bool` is a subtype of `int`, so adding a `bool` to an `int` (or performing other mixed operations) is not an error, which might then go on to cause a hard-to-catch semantic/mathematical error.

I am assuming that trying to add a NumPy `bool_` to an `int` causes a compilation error at best and a run-time warning, or traceable program crash, at worst.
Why use bool when you can use int?
just never #define true 0
I/O issues are problems that come with the territory for scripting languages like Python. It's why I prefer to use bash for scripting instead, because in bash, all I/O is strings. And if there are ever any conflicts, well, that's what awk/sed/Perl are for.
Regex is Turing complete, after all.