When I started angel investing in the late 1990s, a tech investment carried significant technology risk, with the potential upside being groundbreaking innovation. Being an investor at that time meant betting on actual tech, such as nanotech, semiconductors or biotech.
E-commerce, albeit hyped and interesting, was not considered tech. It was “Business 2.0”, plain and straightforward, hype included.
I disagree. Scaling might seem trivial now, but the state-of-the-art architectures for NLP a decade ago (LSTMs) could not scale to the degree that our current methods can. Designing new architectures that perform better on GPUs (such as attention and Mamba) is a legitimate advancement. Furthermore, the viability of this level of scaling wasn't really understood until phenomena like double descent (in which test error surprisingly goes down, rather than up, as model complexity increases past a certain point) were discovered.
Furthermore, many advancements were necessary to train deep networks at all. Better optimizers like Adam instead of plain SGD, along with tricks like residual connections and batch normalization, were needed to scale up even small ConvNets, working around issues such as vanishing gradients and covariate shift that appear when naively training deep networks.
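To make the two tricks above concrete, here is a minimal NumPy sketch of one residual block's forward pass (names and shapes are my own illustration, not any framework's API): the skip connection adds the input back to the transformed signal, which is what gives gradients a direct path through deep stacks, and batch normalization rescales activations across the batch.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Inference-style batch norm: normalize each feature over the batch
    # (learned scale/shift parameters omitted for brevity).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def residual_block(x, w1, w2):
    # y = x + F(x): the identity shortcut means the block only has to
    # learn a residual correction, and gradients can flow through the
    # addition unchanged, mitigating vanishing gradients.
    h = np.maximum(0.0, batch_norm(x @ w1))  # linear -> BN -> ReLU
    return x + batch_norm(h @ w2)            # add the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 64))            # batch of 32, width 64
w1 = 0.1 * rng.normal(size=(64, 64))
w2 = 0.1 * rng.normal(size=(64, 64))
out = residual_block(x, w1, w2)
print(out.shape)  # (32, 64): same shape as the input, so blocks stack
```

Stacking many such blocks is exactly the "going deeper" that was impractical before these tricks: with plain stacked layers, the repeated multiplications shrink or blow up gradients, while the additive shortcut keeps them well-scaled.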