Today I discovered an interesting inconsistency in Activity Streams specs while investigating a Fedify issue.
The question: How should we interpret URLs like `"icon": "https://example.com/avatar.png"`?
- JSON-LD context (https://www.w3.org/ns/activitystreams): `@type: "@id"` → “This is an IRI reference, dereference it to fetch an ActivityStreams object.”
- Activity Streams Primer: “assume that a bare string is the `href` of a `Link` object, not an `id`” (no dereferencing)
Result: JSON-LD processor-based implementations try to parse PNG files as JSON and fail.
Turns out w3c/activitystreams#595 already discusses the same issue for `href` properties. I added a note that `icon`, `image`, etc. have the same problem.
Once again reminded of how tricky spec work can be…
#ActivityPub #Fedify #ActivityStreams #fedidev #specifications
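A minimal sketch of the two readings described in the post, written in Python for illustration only. The function names are hypothetical and this is not how Fedify or any JSON-LD processor is implemented; it only shows how the same bare string ends up as two different things.

```python
import json


def read_as_jsonld_id(value):
    """JSON-LD context reading: a bare string is an IRI naming an object.

    A consumer following this reading would go on to dereference the id
    and expect an ActivityStreams document back.
    """
    if isinstance(value, str):
        return {"id": value}
    return value


def read_as_primer_link(value):
    """Primer reading: a bare string is the href of a Link, not an id.

    A consumer following this reading treats the URL as the location of
    the media itself and does not try to parse the response as JSON.
    """
    if isinstance(value, str):
        return {"type": "Link", "href": value}
    return value


actor = json.loads('{"type": "Person", "icon": "https://example.com/avatar.png"}')
print(read_as_jsonld_id(actor["icon"]))
print(read_as_primer_link(actor["icon"]))
```

Values that are already objects pass through unchanged in both helpers, so the ambiguity only exists for bare strings.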
@hongminhee It's a place where our loosey goosey style goes into nondeterminism. We should tighten it up in the next version. My main answer would be: publishers, don't do that.
-
@hongminhee I would assume the same URL can represent both a PNG image and a JSON-LD document.
Here's how I do it in ONI.
The URL https://releases.bruta.link/icon represents the icon for the application actor found at https://releases.bruta.link.
If you fetch it with an Accept header asking for a JSON-LD document, that's what you'll get; if you ask for an image/* document, you'll get the raw image.
So, from a client point of view, the server returns the raw image, unless asked specifically for a JSON-LD document.
-
@mariusor Indeed, the server should not return a PNG file if asked for a JSON-LD doc, and should instead return a 406 HTTP code.
-
@oranadoz you say "indeed" but you end up contradicting me.
Why do you think the server should not return a json-ld document if asked for one?
-
@mariusor I meant: I agree that:
- content negotiation must be performed
- if asked for JSON-LD, the server returns JSON-LD if available, else returns 406
- if asked for image/*, it returns the PNG.
I thought this was what you meant: it is up to the client to ask for what it can handle.
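The three rules above can be sketched as a small decision function. This is a simplified illustration in Python, not code from ONI or GoActivityPub: the `negotiate` name and the `has_jsonld` flag are made up, the Accept parsing ignores q-values, and the only image representation is assumed to be a PNG.

```python
from typing import Optional, Tuple

# Media types treated as "asking for JSON-LD" in this sketch (an assumption).
JSONLD_TYPES = {"application/ld+json", "application/activity+json"}
# Media types that the hypothetical PNG representation can satisfy.
IMAGE_TYPES = {"image/png", "image/*", "*/*"}


def negotiate(accept: str, has_jsonld: bool = True) -> Tuple[int, Optional[str]]:
    """Return (status, content type) for a request with the given Accept header."""
    for part in accept.split(","):
        # Drop parameters such as ";q=0.8" or ";profile=..." and normalize case.
        media_type = part.split(";")[0].strip().lower()
        if media_type in JSONLD_TYPES and has_jsonld:
            return 200, media_type
        if media_type in IMAGE_TYPES:
            return 200, "image/png"
    # Nothing the client listed can be served.
    return 406, None


print(negotiate("application/ld+json"))
print(negotiate("application/ld+json", has_jsonld=False))
print(negotiate("image/*"))
```

A client that lists both kinds of type gets whichever acceptable one appears first, which is one possible tie-break among several.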
-
@mariusor @oranadoz @hongminhee the document describing a resource and the resource itself are not necessarily the same thing. So the response for json-ld for the icon isn't necessarily equivalent to the icon itself.
This has been a long-standing thing in json-ld for ages: is the document describing the resource or is the document the same as the resource.
This is perhaps best described by a document about a person, that's not the same as the person themselves, though that document may be used by that person to describe themselves.
-
@oranadoz cool, cool. That's indeed what I meant.
-
@thisismissem I don't subscribe to the semiotic theory of the web where the map is not the territory.
I like to keep things simple and therefore a json-ld document is a valid representation of an object that can exist as a binary.
People keep forgetting that ActivityPub is meant to be used on top of other web standards like content negotiation.
-
@mariusor @oranadoz @hongminhee right, but here a description of the icon isn't the same as the binary of the icon itself.
The binary gives you very different data to the description of it, e.g., fetching the binary doesn't indicate where to send replies to or how to interact with it; whereas html <-> json-ld generally gives you similar enough representations.
Generally con-neg suggests the same data just in different formats; what you're giving here is different data in different formats.
-
@evan @hongminhee more and more i am thinking that Link was a bad idea from a data modeling perspective. "assume bare href instead of bare id" is something that can never make sense. if we really want to maintain validity of Link then it should *always* be embedded as an anonymous object:
icon: {
  type: Image
  url: {
    type: Link
    href: foo
    height: 400
    width: 400
    mediaType: image/png
  }
}
here, Image.url means "representation of the Image"
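As a rough illustration of the always-embedded shape above, here is a Python sketch that builds it and reads the representation URL back out. The helper names `embed_icon` and `representation_href` are hypothetical and not part of any ActivityPub library.

```python
def embed_icon(href, media_type, width, height):
    """Build an Image whose url is an anonymous, embedded Link."""
    return {
        "type": "Image",
        "url": {
            "type": "Link",
            "href": href,
            "mediaType": media_type,
            "width": width,
            "height": height,
        },
    }


def representation_href(icon):
    """Return the location of the binary representation of the Image."""
    url = icon["url"]
    # Tolerate the bare-string shorthand, reading it as an href.
    return url["href"] if isinstance(url, dict) else url


icon = embed_icon("https://example.com/avatar.png", "image/png", 400, 400)
print(representation_href(icon))
```

Width, height and media type sit on the Link, so they describe the representation rather than the Image.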
-
@mariusor @hongminhee > the same URL can represent both
bad idea. an identifier should unambiguously refer to exactly 1 thing
-
@thisismissem @mariusor @oranadoz @hongminhee +1, an image and a descriptor are different things and should be treated as different things. content negotiation is not a solution here -- the same information should be returned for the same resource (modulo whichever representation you ask for or receive).
-
@trwnh well, I'll agree to disagree with you.
GoActivityPub has as a first order type representation the json-ld document, which for this specific type (Image, well, others too) can be represented *also* as a binary. So we just do that.
This is simpler, it is bidirectional in ensuring that both the info about a thing and the thing itself can be reached knowing only *one* piece of information (its ID/URL), and it is supported by long-existing HTTP mechanisms like content negotiation.
For me pragmatism trumps whatever philosophical reasons people can come up with for it being incorrect. So that's where I'm at.
-
@trwnh sorry to be snarky, but you'll probably have a fit when I tell you that the on-disk representation of these json-ld documents representing binary stuff actually holds the binary data as base64-encoded data URLs inside their content properties. (This is *one* direction I went in that I kinda regret, and I hope to find a better method for storing binaries.)
https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Schemes/data
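For readers unfamiliar with the scheme, this is roughly what storing a binary as a base64 data URL in a `content` property could look like. It is a Python sketch under assumptions: the exact on-disk layout is not shown in the thread, and the helper names and the `stored` document are made up for illustration.

```python
import base64
from typing import Tuple


def to_data_url(raw: bytes, media_type: str) -> str:
    """Encode raw bytes as a base64 data: URL."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def from_data_url(data_url: str) -> Tuple[str, bytes]:
    """Split a base64 data: URL back into (media type, raw bytes)."""
    header, encoded = data_url.split(",", 1)
    media_type = header[len("data:"):].split(";")[0]
    return media_type, base64.b64decode(encoded)


# The 8-byte PNG signature stands in for a real image file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

stored = {
    "type": "Image",
    "mediaType": "image/png",
    "content": to_data_url(PNG_SIGNATURE, "image/png"),
}
print(stored["content"])
```

Base64 inflates the payload by about a third, which is one reason to look for another way to store binaries.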
-
@trwnh and one final thing.
This insistence on thinking of the underlying data for ActivityPub as separate from its document representation is what makes the fediverse as fractured as it is.
ActivityPub deals only with these documents, and yet every service maps whatever data it stores to these imperfect representations, which are sometimes very far from the spec, because contorting existing data paradigms into RDF triples and JSON-LD is cumbersome.
Storing json-ld metadata, or the full document itself, like I do, allows you to think in clearer terms about addressability, access, location, etc.
-
@mariusor @thisismissem @oranadoz @hongminhee i don't think it's cumbersome at all. if people used the as2 data model directly and operated on activities instead of transforming statuses, they wouldn't have that issue (and it is a different issue).
the issue i'm talking about is ambiguity. when you use the same identifier for two different things, you can no longer distinguish between them. this is known as equivocation.
example: does an Image have a width of 800 pixels? no. the repr does.
-
@mariusor @thisismissem @oranadoz @hongminhee using content negotiation as an example: i can ask for the same Image as either image/png or image/jpg, right?
```
GET /image
Accept: image/png

303 See Other
Location: /image.png
```
or...
```
GET /image

200 OK
Content-Type: image/png
```
the Image is the same Image even if i resize it, or convert it to a different format. we are generally uninterested in reasoning about representations instead of reasoning about the thing itself.
-
@trwnh you seem to be speaking of "a platonic ideal" of the internet.
Tell me which ActivityPub service is capable of giving you png or jpeg versions of an image just because you ask for it. The same for the sizes. Nobody serves you different sized images from the same "resource", because computing that at access time is expensive to do, there's no standard way to ask for a specific size, etc.
While in my case, there is a standard way: content negotiation.
Please understand that you won't convince me. Like I keep saying: pragmatism should trump the philosophy of identity when we program applications.
-
@mariusor it's perfectly practical to serve what the requester asked for. it's not very practical to serve something they *didn't* ask for, instead of the thing they asked for.
any http server is capable of this. maybe they use query strings, maybe they don't. there are defaults in any case.
i mean, you probably encounter a cdn serving images like this multiple times every day, without even realizing it.
-
@trwnh I'm starting to feel you just like being contrarian.
I just said I serve what requesters ask for because my service employs content-negotiation. So if they ask for an image they get an image and if they ask for a document they get a document.