LBRY spec has implicit max stream length?

While reading through the LBRY data encoding spec, I noticed that there seems to be an implicit max stream length resulting from the fact that the “manifest” blob must adhere to the blob spec, which limits its size to 2MiB. Since the manifest must include a reference to each blob/chunk in the stream, it must grow as the stream grows. These properties conflict in a way that makes it impossible for the spec to encode streams longer than a certain (albeit large) threshold without violating itself.
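To put a rough number on that threshold, here is a back-of-the-envelope sketch. The per-blob entry layout is my assumption for illustration (a 96-character hex hash, a 32-character hex IV, a blob number and a length, serialized as compact JSON), not something copied from the spec. It also ignores the manifest's header fields and any padding overhead, so treat the result as an order-of-magnitude estimate only.

```python
import json

MAX_BLOB_SIZE = 2 * 1024 * 1024  # 2 MiB limit from the blob spec


def sample_entry(blob_num: int) -> dict:
    """Build a placeholder manifest entry (assumed field layout, not from the spec)."""
    return {
        "blob_hash": "a" * 96,    # stand-in for a 48-byte hash in hex
        "blob_num": blob_num,
        "iv": "b" * 32,           # stand-in for a 16-byte IV in hex
        "length": MAX_BLOB_SIZE,  # assume every blob is full
    }


def estimate_max_stream(max_manifest_bytes: int = MAX_BLOB_SIZE,
                        blob_payload_bytes: int = MAX_BLOB_SIZE):
    """Return (bytes per entry, max entries, approximate max stream bytes)."""
    # Compact JSON for one entry, plus one byte for the separating comma.
    entry_bytes = len(json.dumps(sample_entry(99999), separators=(",", ":"))) + 1
    max_entries = max_manifest_bytes // entry_bytes
    return entry_bytes, max_entries, max_entries * blob_payload_bytes


if __name__ == "__main__":
    entry_bytes, max_entries, max_stream = estimate_max_stream()
    print(f"~{entry_bytes} bytes per entry")
    print(f"~{max_entries} blobs per manifest")
    print(f"~{max_stream / 2**30:.1f} GiB max stream")
```

Under these assumptions the limit comes out at roughly ten thousand blobs, or a few tens of GiB per stream. The exact figure depends on the real entry format, but the shape of the problem is the same: a fixed-size manifest divided by a fixed-size entry.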

Since I’m new here and I stumbled onto this fairly quickly, I assume that it is well known and has been talked about extensively (or was quickly disregarded as a self-evident non-issue). However, my searches for those conversations have come up empty. If anyone on this forum can point me in the right direction I’d be grateful. I’m curious to see how this is viewed by other devs with a broader knowledge of LBRY.

Welcome to the forum! This issue is known (at least to me and the other devs I’ve talked to). It hasn’t been a serious issue yet, so we haven’t spent much time on it, but there are a few paths forward.

The most straightforward one is to make the structure of the manifest blob more efficient. Right now it’s JSON and carries a lot of redundant data. Instead, we should use Protobuf (we’re moving to Protobuf for most other parts of the protocol) and remove the redundancies (e.g. per-blob IVs can be generated algorithmically).
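As a minimal sketch of what "generated algorithmically" could look like: derive each blob's IV from a single per-stream seed plus the blob number, so the manifest only has to store the seed once. The `derive_iv` function, the seed, and the choice of SHA-256 are all hypothetical here. This is not the scheme LBRY uses or has decided on, and it hasn't been reviewed for security.

```python
import hashlib

IV_SIZE = 16  # bytes; matches a 128-bit cipher block size


def derive_iv(stream_seed: bytes, blob_num: int) -> bytes:
    """Derive a per-blob IV from a per-stream seed (hypothetical scheme)."""
    # Hash the seed together with the blob's index, then truncate to IV size.
    digest = hashlib.sha256(stream_seed + blob_num.to_bytes(8, "big")).digest()
    return digest[:IV_SIZE]


if __name__ == "__main__":
    seed = bytes(range(16))  # placeholder; a real seed would be random per stream
    for n in range(3):
        print(n, derive_iv(seed, n).hex())
```

With something like this, each manifest entry drops its IV field entirely, and the reader recomputes the IV from the blob's position in the stream.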

Two more involved changes would be raising the max blob size above 2 MiB or allowing manifests to span multiple blobs. I’d recommend against doing either of these anytime soon, though increasing the max size would be much simpler than implementing multi-blob manifests.