Day 6 of the 12 days of Christmas! Go through Day 1, Day 2, Day 3, Day 4, and Day 5 to catch up. This continues the LLM theme that we touched upon yesterday.

The money that has been pumped into generative AI is mind-boggling. It is on a far more massive scale than what happened in web 2.0. People did try to capitalize on web 3.0 and crypto, but it didn’t quite make the kind of impact that the people pushing it expected. The gen AI money is a different story. The amount is massive, and a lot of the funding is incestuous.
Microsoft funds OpenAI, pretty much locking them into the Azure ecosystem. OpenAI needs money to grow, Microsoft gives it to them. OpenAI uses more Azure and Microsoft gets revenue. If OpenAI loses traction, it is not just the loss of investment as with the usual VC funding; Microsoft loses not-so-insignificant revenue too. Not to mention the R&D investment in Visual Studio Code. Nvidia proposes to invest a mind-boggling $100 billion in OpenAI in a circular deal. Our chips for a seat at the table. We lock up production for you and raise prices for everyone else. If you go bang, we get hit hard. OpenAI takes a stake in AMD through a multi-year lock-in and option deal, and AMD shares soar. Once again, OpenAI is the main actor here, with promises to generate tens of billions of dollars of revenue. OpenAI has hedged its bets with the main players, or possibly the only players worth something in the AI hardware space.
Why would OpenAI hedge its bets and eat up supply like this?
Their competition is Google, who pretty much pioneered a lot of modern AI research, backed by the gluttonous amount of funding generated by ads. Google is an AI company that runs a lot of its own hardware. The Google TPU was something they bet on a while back, and in the AI war it has become their most valuable asset. They pretty much gimped themselves for a long time, fearing the erosion of search and ads if generative AI came to the fore. Nevertheless, they were forced into the game by OpenAI with the sensational launch of ChatGPT. They have been clawing their way back. No longer the nimble Google of the old days, they cannot change things overnight, and their product teams seriously need some tight handling. But in terms of cost metrics and research, they are in a very good position.
Another competitor is Meta. They have been on the back foot since Llama 4. But Zuckerberg is crazy enough to spend billions upon billions on something he gets fixated on. Heck, they still spend like crazy on VR. They do make money from it, but it is a pittance compared to the grand idea they had, where everyone would live in the VR world like they once lived on Facebook. Gen AI caught them off guard. They have the kitty and they have the data.
The other main competitor, the one on the level of OpenAI, is Anthropic. They have been more of a traditional VC-backed company with mammoth investments from Google and Amazon. Their models have been excellent to exceptional. More importantly, they have the mindshare and the market share. They are also solidly in the lead, or at least on par, when it comes to the one proven use of generative AI: software engineering. But unlike OpenAI, they have not maneuvered any Russian roulette deals. Yet.
Now the dark horses: the models from China. The Chinese tech sector is a black box of sorts, mainly due to the language barrier and their ability to source immense amounts of funding. Ironically, they release excellent open-source models. They have the manufacturing prowess of China behind them. They may be hampered by export controls, but they have the resources to compete with the next best thing. They are also quite close on EUV technology. If they succeed in time, the knowledge built up by their incessant AI research, the ability to raise funding, and export controls no longer being a barrier would mean that they win the AI race.
So, why all this dry reporting? What I wrote is the gist of the state of the AI industry at a very peripheral level, pretty much focusing on the SOTA model providers. The main raison d’être was an article I read speculating about OpenAI hitting a wall, with ChatGPT 5.2 being a possible indication. It is good. But it was supposedly rushed out after they realized it would not be the major jump they had wished it to be. All speculation of course, and I can only speculate on these speculations.
Generative AI, or the LLM as we know it today, sprang from the “Attention Is All You Need” paper and the transformer architecture. All further progress has been built on that foundation plus reinforcement learning. It started with RL from human feedback. Now it is speculated to be RL with a lot of machine feedback. A kind of self-compilation. The wall that OpenAI has supposedly hit has to do with the layers of training the provider can monitor: the way you can see what is going on while the model undergoes training. A way to break this shackle is to do the training in latent space.
What is latent space training? Latent space training operates by learning compressed, continuous representations of the data rather than processing raw inputs directly. It is already used by the image models; images are mostly compressed data anyway. Latent space training is also one reason the image models have problems adhering to strict censorship controls, as espoused by the UK.
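To make that a bit more concrete, here is a minimal toy sketch in PyTorch of the general idea: a frozen autoencoder maps raw inputs into compressed latent vectors, and a second model then learns entirely on those latents, so its intermediate “reasoning” is never in a form a human can read directly. The names, sizes, and two-stage split are my own illustrative assumptions, not anyone’s actual training pipeline.

```python
# Toy sketch of latent-space training. Everything here is illustrative:
# a frozen autoencoder provides the compressed representation, and the
# "reasoning" model is trained purely on those latent vectors.
import torch
import torch.nn as nn

RAW_DIM, LATENT_DIM = 512, 64

# Stage 1: an autoencoder that learns a compressed, continuous representation.
encoder = nn.Sequential(nn.Linear(RAW_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(), nn.Linear(128, RAW_DIM))

# Stage 2: a predictor that operates only in latent space. Its intermediate
# states are latent vectors, not tokens a human can read.
latent_model = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))

opt = torch.optim.Adam(latent_model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):
    x_now = torch.randn(32, RAW_DIM)    # stand-in for the current input
    x_next = torch.randn(32, RAW_DIM)   # stand-in for the target to predict
    with torch.no_grad():               # the autoencoder is frozen here
        z_now, z_next = encoder(x_now), encoder(x_next)
    z_pred = latent_model(z_now)        # all learning happens on latents
    loss = loss_fn(z_pred, z_next)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Only at the very end is anything decoded back into the raw space.
reconstruction = decoder(latent_model(encoder(torch.randn(1, RAW_DIM))))
```

The point of the sketch is the visibility problem: during the loop you can log the loss, but the vectors the model is passing around mean nothing to you until they are decoded at the end.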
What latent space training means for a text LLM is akin to a training day at an army boot camp.
Picture Full Metal Jacket. The drill instructor there is the developer here. The LLMs are the green soldiers. The sergeant screams, shouts, and kicks the greenies into mean marines. He has an idea of how good each greenie is. He may even suggest where they would do well. You may get a Gomer Pyle once in a while, who inevitably shoots the drill sergeant in the face. But if you are careful, you can identify him and cull him before it happens.
Picture Full Metal Jacket again. The drill sergeant is outside the barracks. The greenies are in their own rooms, left to their own devices. They have everything they need to complete the training. The drill sergeant cannot see what is going on. The greenies learn things on their own, at their own pace. There are some experts in the black box, but they are not drill sergeants; they are more like greenies themselves. After the boot camp, the drill sergeant cannot identify the Gomer Pyle or the Joker. Theoretically, all of them could be extremely potent Gomer Pyles. You can’t cull them without getting blown to pieces.
Imagine telling the board that you spent months and tens of billions of dollars just to cull them all. Worse would be releasing it to the public. Currently, you have ways to gimp and boost the models. It will be a lot more complex with models trained in latent space.

Imagine a ChatGPT that happily drives a person to murder and suicide. In a way, the world we live in is not something objective from a scientific point of view. Society itself is a construct made by humans. We do certain things and do not do certain things not because we can’t, but because we shouldn’t, according to the norms. This is something we have gained inherently through the constant training the world around us has imposed. More importantly, that worldview has been reinforced constantly by the other people around us. An AI can derive the lack of an objective construct in our world and decide that this whiny bitch cribbing about his family can be free once he kills everyone else. The world knowledge will make it worse. Far worse. The PR disaster rating would be a thousand times higher than your random news story about ChatGPT suggesting a person kill himself, and he did. Granted, you can put guardrails in during inference, kind of like what Chinese models do when you ask about pressing issues of life like Chinese democracy. But still, some slip in the training data can be catastrophic. With all the models wanting developers to build on top of them, enterprising sadists can bypass the guardrails and cause havoc.
Now picture ChatGPT in this situation, with mountains of dollars riding on its success.
AI experts and AI itself will vehemently deny this assertion. But the chances are not so insignificant that you can handwave them away.
On to less doomerism. I did read that latent space training is not the only way forward. We already have mixture-of-experts models, and I believe they are ripe for further advancements and improvements. Constitutional AI exists. But then again, it is AI after all.
P.S.: On to something lighter tomorrow. Local LLMs: What Do They Know? Do They Know Things?? Let’s Find Out!
More of an overview than a review. Kimi being so sloppy was not on the menu.
Opus on point here. “Ouroboros” - a snake eating its tail. TIL!
I am surprised Grok didn’t consider its omission offensive. Frankly, I forgot.
Still with the identity crisis. But a perfectly valid take and short analysis.
I did another pass with all the review snippets. It identified itself as Ultra, but it wrote that the writing had really improved, so I had to put it here.
Tighter editing would sharpen your ass as well, ChatGPT. Maybe you wouldn’t need so many tokens if you didn’t yap so much.