"I hate being pedantic[1] but I keep seeing people make this mistake: JSTOR is not just public domain works, and there is no indication that the downloading was limited to public domain works."
I'd like to tweak the presentation here. Many of the comments I keep seeing (including here on HN) seem to assume the following:
1. That it positively did include works that weren't in the public domain. There doesn't seem to be any indication that this is the case.
2. That Swartz intended to distribute content downloaded from JSTOR not in the public domain. Given the credentials and history of the person in question, it just doesn't seem to be a rational conclusion that Swartz's visions of the future included consciously distributing infringing material (nor the consequences that would go along with it).
> 1. That it positively did include works that weren't in the public domain. There doesn't seem to be any indication that this is the case.
It's stated in the Wired article that "much of what Swartz is accused of downloading from JSTOR is copyrighted".
It's stated in JSTOR's statement that "The downloaded content included more than 4 million articles, book reviews, and other content from our publisher partners' academic journals and other publications". (Elsewhere the size of the total JSTOR library is said to be over 6 million.)
The indictment states that downloads included "approximately 4.8 million articles, a major portion of the total archive in which JSTOR had invested. Of these, approximately 1.7 million were made available by independent publishers for purchase through JSTOR's Publisher Sales Service."
> 2. That Swartz intended to distribute content downloaded from JSTOR not in the public domain.
I'll grant that the absence of detail on this claim in the indictment makes it very debatable. However, it's hard to fit any explanation about his intentions to the facts as we know them without making major assumptions about information we don't have.
For instance, many people assume the downloading was for legitimate research. Not unreasonable on the face of it (given, as you say, credentials and history) but then, why does a Harvard research fellow need to covertly access JSTOR from MIT? And not just access, but download two thirds of it? There may be an entirely legitimate answer to that, but if so, it can't be gleaned from the information we currently have available.
> It's stated in the Wired article that "much of what Swartz is accused of downloading from JSTOR is copyrighted".
I'll ignore this for the moment (or at least regard it as "dubious"), and I hope it's obvious why. If we do ignore it, let's see if it's possible for the other two citations you provide here to take on a different character in its absence:
> It's stated in JSTOR's statement that "The downloaded content included more than 4 million articles, book reviews, and other content from our publisher partners' academic journals and other publications".
> The indictment states that downloads included "approximately 4.8 million articles, a major portion of the total archive in which JSTOR had invested. Of these, approximately 1.7 million were made available by independent publishers for purchase through JSTOR's Publisher Sales Service."
The claims about partners only mean that publishers and other organizations were supplying content to JSTOR (and were probably selling that content through their own, unrelated channels). This does not preclude the content in question being entirely public domain content. Given the other characterizations in the indictment, I suspected it of deliberately giving an impression that doesn't match what really happened, while carefully skating on the edge of truthfulness. (I.e., "Yeah, partners were making it available for purchase, but, oh, did we fail to mention that it's public domain anyway?")
A closer reading, though, lends credence to still-in-copyright works being in the mix, if you treat "major portion" as being synonymous with "majority" (used correctly), and even more starkly and convincingly, in light of the numbers you provide.
[In case you can't tell, what I'm really saying is you've got me convinced.]
> However, it's hard to fit any explanation about his intentions to the facts as we know them without making major assumptions about information we don't have.
One such assumption you could make is that he was operating under the modus of grabbing what was possible and sorting it all out later. It doesn't seem to even qualify as a stretch. Given a similar line of thinking of mine in the past for a similar operation (involving databases operated by Gale Group), I certainly don't have any trouble maintaining this assumption for myself (and to myself). There is a detail that gives me pause, though...
> but then, why does a Harvard research fellow need to covertly access JSTOR from MIT
Damned good question.
> And not just access, but download two thirds of it?
Well, is there really any question here? I think it's safe to say that, were it not for a thorny interruption, Swartz was probably aiming for something closer to three thirds of it.