
Wow, nice, since my app is focused on privacy, this could become a huge marketing bonus.

However, the questionnaire seems quite complicated. I ask for the email address and Apple wants to know if I connect the email address with the identity of a user. However, what does that even mean? I have no identity of a user except the email address.



If you only have an email address and you make zero effort to cross-reference that with other data (using, for example, any datasets you purchased, or a marketing data enhancement system) then you are not connecting their email address to their identity.

This is discussed here (search for “identity”): https://developer.apple.com/app-store/app-privacy-details/

In contrast, when you provide Facebook an email address, they will explicitly pay a lot of databases to cross-reference your email address and tell Facebook your identity, your salary, and so on.


>If you only have an email address and you make zero effort to cross-reference that with other data (using, for example, any datasets you purchased, or a marketing data enhancement system) then you are not connecting their email address to their identity.

That is not my interpretation. As I'm reading it, all data that is routinely collected has to be disclosed, even if it is never cross referenced with any third party datasets.

I think if you create a record on your server for each user (identified by some user ID) and you store the user's email address in that record, then you must disclose that fact.


You have to disclose this, but the problem is the question following this. If you collect the email address, Apple wants to know if you use this to link to the user’s identity. And this is where it’s confusing. Without a definition of "identity", I don’t know if I answer this question properly.


If the same person registers two accounts with two email addresses, but provides the same information for both, would you know that they're the same person?

If the same person registers two accounts with two email addresses, but provides the same mailing address for both, and you send a postal catalog to each of them, would your systems detect the duplication and only send one catalog?

If either Yes, and for some companies it's both Yes, then you are linking their email address to their identity — their personhood, their struct {} of data fields.

If either No, and for many companies it's both No, then you are not linking their email address to their identity.

(Obviously having a postal address creates other problems for you; I'm just doing my best to draw an analogy here. For definite answers you have presumably already contacted Apple, as Apple is clearly reserving the right to make judgement calls when asked questions about this.)
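
The catalog question above can be sketched in code. This is a minimal illustration of the commenter's analogy, not Apple's definition of linking; the record fields, the `normalize` helper, and the sample accounts are all hypothetical.

```python
from collections import defaultdict


def normalize(address: str) -> str:
    # Hypothetical normalization: lowercase and collapse runs of whitespace.
    return " ".join(address.lower().split())


def group_by_mailing_address(accounts):
    """Map each normalized mailing address to the emails registered with it."""
    groups = defaultdict(list)
    for account in accounts:
        groups[normalize(account["mailing_address"])].append(account["email"])
    return dict(groups)


# Made-up sample data: two accounts share one mailing address.
accounts = [
    {"email": "a@example.com", "mailing_address": "1 Main St,  Springfield"},
    {"email": "b@example.com", "mailing_address": "1 main st, springfield"},
    {"email": "c@example.com", "mailing_address": "9 Elm Rd, Shelbyville"},
]

groups = group_by_mailing_address(accounts)
# Addresses with more than one email are the "same person" candidates.
duplicates = {addr: emails for addr, emails in groups.items() if len(emails) > 1}
```

If a system runs something like this and acts on `duplicates` (for example, to send only one catalog), then under the analogy it is treating the two email addresses as belonging to one person. A system that never performs this kind of grouping would answer "No".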


> If the same person registers two accounts with two email addresses, but provides the same information for both, would you know that they're the same person?

Probably. I find it odd that Apple didn't choose words that are already clear with respect to privacy laws such as GDPR. The GDPR doesn't talk about identity. It defines personal data or personally identifiable information (PII). If you collect this data, you're subject to GDPR compliance.

Apple has a weird phrasing of this. You apparently can collect an email address, but not link it to an identity, which is different from collecting an email address and linking it to an identity. It's unclear to me what they mean by this and what "identity" is supposed to mean.

It's way easier to say: an email address is a piece of data that could identify a person, hence you must treat it carefully and comply with GDPR laws (collect it with consent only, make sure to delete it when you're done, user's right to change PII and user's right to get info about everything they have on you).


I agree with you that "identity" is not well defined in Apple's document.

The way I'm reading it is that "identity" is anything that uniquely identifies each user of your app, i.e. something like a GUID or any generated user ID. It does not necessarily mean that you are able to identify the real-world person behind the user record.

So for instance, if you collect the number of steps each user has taken each day and you store that information on your server associated with a user ID, then you have collected that data and you have linked it to the user's identity, even if you know absolutely nothing else about that user.

What would it mean to collect data without linking it to a user's identity? I think it means collecting aggregate or statistical data. If you transmit the number of steps taken by each user to your server, but you only ever store the average number of steps taken across all your users, then you have collected data without linking it to a user's identity.
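
A minimal sketch of the two storage styles described above, assuming hypothetical class names and made-up step counts. It illustrates this commenter's reading only, not Apple's official interpretation.

```python
class LinkedStepStore:
    """Stores each step count under a user ID: data linked to an identity."""

    def __init__(self):
        self.rows = {}  # user_id -> list of daily step counts

    def record(self, user_id: str, steps: int) -> None:
        self.rows.setdefault(user_id, []).append(steps)


class AggregateStepStore:
    """Keeps only a running total and a count; no per-user record is stored."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def record(self, steps: int) -> None:
        self.total += steps
        self.count += 1

    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


linked = LinkedStepStore()
aggregate = AggregateStepStore()
for user_id, steps in [("u1", 4000), ("u2", 6000), ("u1", 8000)]:
    linked.record(user_id, steps)   # the user ID is stored with the data
    aggregate.record(steps)         # the user ID is dropped before storage

print(aggregate.average())
```

After the loop, `linked.rows` can still answer "how many steps did u1 take?", while `aggregate` can only report the overall average.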

For email addresses the distinction between collection and linking to users makes no sense. It's always going to be both or neither.

So that's what I believe. What's important though is what Apple actually means. And I fully agree with you that this document needs clarification.


> when you provide Facebook an email address, they will explicitly pay a lot of databases to cross-reference your email address and tell Facebook your identity, your salary, and so on.

Can you elaborate on this?


Just do a quick search for "DMPs" or "data management platforms."

Multi-billion dollar industry that's focused on collecting data from many different sources, consolidating, and aligning towards real individuals with a combination of deterministic data and probabilistic assumptions.

Then, they can sell access to that database to various companies, mostly in the ad-tech space.

Source: did consulting work for an ad platform in the RTB space on the DSP side, competitor to Google.

EDIT (more context / sidebar thought): this is also why Apple deserves some credit here for their moves, as they are one of the few companies with enough of a war chest to fight against these multi-BILLION dollar interests. It's the type of advantage I worry about losing if Apple has to open up different app stores on the iPhone: if a developer doesn't want to submit documentation and/or get bad publicity for lack of a privacy label, they'll just go to a different, less strict app store.


A search for prior HN discussions matching keywords 'facebook' and 'data' provides many interesting discussions and links to review, and I encourage you to take a look if you're interested to learn more. (If you're already familiar, then with apologies, I won't be engaging in discussion about that sentence in this thread. It's possible my summary is imprecise or wrong in some manner; it's presented here only to support answering the question asked, and for that purpose it's enough as-is.)


I'm well-aware of how to use basic search functionality, thanks.

I'm also aware of Facebook's practices, but your comment seemed to be talking about a specific incident or situation -- that's why I asked.


I'm not sure if you have filled out the form, but whether you cross-reference the email address with other data is asked in a separate follow-up question: "Do you or your third-party partners use email addresses for tracking purposes?"

Therefore if you collect email addresses for user account purposes you would probably answer yes to the first question about whether they are linked to the user's identity, and then no to the second question of whether you use them for tracking purposes.


I would say Apple have made it perfectly clear "You’ll need to identify whether each data type is linked to the user’s identity (via their account, device, or other details) by you and/or your third-party partners.".

In other words:

On side A you have the user's email address OR hashed email address.

On side B, you have other items of identifiable information ("their account", "their device", "other details").

The question is simple. Do you link side A to side B ? Yes or no.


What is "their account"? What is "their device"? How do I get this information from the user? I don't find this simple. If I set up an account with an email address, it's not their identity. It's just an account. They could provide a fake email for all I know.


"What is "their account"? What is "their device"?"

Erm ? Exactly what it says ?

"Their account" is their account on your platform. "Their device" is a device that they own and that you are collecting information about.

As I said, it's simple.

Email.

Are you linking it to _ANYTHING_ else ? Yes or no ?

Are you collecting an email address (or hash of an email), ON ITS OWN and not doing any further processing.... e.g. for a simple mailing list ?

Or are you collecting an email address (or hash of an email) as part of broader set of data you are collecting from the user ? (e.g. email + name + address etc.)

Or are you collecting an email address (or hash of an email) and then sending it off to Facebook or other API in an attempt to build a picture ?


> "Their account" is their account on your platform.

You mean, the Apple account on my platform? How does it get there? I don't have the Apple account of a user. I just have an email address.

> "Their device" is a device that they own and that you are collecting information about.

I don't collect information about their device. But you still didn't answer the question. What is "their device"? Is it a unique device ID? A fingerprint? What is it?

> Are you linking it to _ANYTHING_ else ? Yes or no ?

I use the email to create an account on my backend. Users can backup data to their account. But nothing ties to "identity". They could fake all of it. I don't care. I send transactional emails, such as a reset password email.

> Or are you collecting an email address (or hash of an email) as part of broader set of data you are collecting from the user ?

Not PII data, just stuff they enter in the app.

> Or are you collecting an email address (or hash of an email) and then sending it off to Facebook or other API in an attempt to build a picture ?

No.

But the details of your questions show, imho, that the concept of identity is non-trivial.

I can summarize it very simply: users sign up for an account, identified by an email address (I don't care if it's real), as a service for the user to have an online backup and an easy way to sync data between devices or move them to Android. The goal is NOT to personally identify a user. But could a user be identified by the email address alone BY SOME ENTITY? Yes, probably. Do I do it? No. Do I share the data or offload it to a third party for data processing? No. Or does my rented self-managed VPS, where the backend runs, count as third party? I don't think so. But of course, I use a transactional email service to send emails to users. What about that? I do have a GDPR-compliant data processing agreement for that. But not sure what Apple wants from me in this case.

Thing is: It's not trivial and Apple's guide is insufficient. That's all I'm saying.

Edit: And to clarify – if it's complicated, data collecting entities such as Facebook could say "well, we understood this differently" and simply not tell the truth about the data they're collecting. I guess that is another point in this whole discussion: What if someone lies about their data collection practices? Any consequences? Are there downsides to lying about it?


> I use the email to create an account on my backend. Users can backup data to their account. But nothing ties to "identity". They could fake all of it. I don't care. I send transactional emails, such as a reset password email.

Then you are collecting emails to identify users. It doesn't matter if the email is fake or not.

In other words, you are explicitly connecting a specific device to a specific account on your backend via that email.


> Then you are collecting emails to identify users.

I use it to authenticate users. I don't use it to "identify" users. I give zero fucks about the identity of users.

> In other words, you are explicitly connecting a specific device to a specific account on your backend via that email.

No, because I don't collect any more data about the device, I don't link anything. Users use the app. They can use multiple devices. I know nothing about the devices and don't care. Users themselves link their account to their device (on their device), but I don't get any information about this.

I posted this in another comment, but it makes sense here too:

I find it odd that Apple didn't choose words that are already clear with respect to privacy laws such as GDPR. The GDPR doesn't talk about identity. It defines personal data or personally identifiable information (PII). If you collect this data, you're subject to GDPR compliance.

Apple has a weird phrasing of this. You apparently can collect an email address, but not link it to an identity, which is different from collecting an email address and linking it to an identity. It's unclear to me what they mean by this and what "identity" is supposed to mean.


My understanding is as follows:

If you are able to answer "Is email address [or other collected data] ________ associated with the user whose database row has primary key _______?" then that is considered being tied to the account, and thus tied to the identity. If you are using email address as a user id, then it is very much tied to the user's account on your service.

For tying to device, this could be based on reading the device's serial number somehow, using the ID for Advertisers, or just generating a unique random identifier at install time and using that to distinguish records. So if you can answer "What is the race of the user with random install id _______" then for Apple's purposes you have tied race to device and thus to identity.

Basically, unless anonymized, nearly all data that you collect will be considered by Apple to be identity tied, unless you don't have user accounts and don't include some form of device identifier with the data. It is literally impossible to have any form of user account without at least having one thing "tied to user identity". Even if you use "Sign in with Apple" and do not collect the anonymized email address they provide, you will have at least "user id" (the "sub" token from Apple) and probably also "Other User Content".

For example, an otherwise offline game might collect the time taken to beat each level without any device identifying information to allow the developer to understand if levels were harder than expected. In that case you have collection of "Product Interaction" data that is not identity tied.
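
The three cases in this comment (tied to an account, tied to a device, tied to neither) can be sketched as analytics payloads. This is only an illustration of the commenter's understanding; the `level_event` and `is_identity_tied` helpers, the field names, and the sample values are hypothetical and are not part of any Apple API.

```python
import uuid


def level_event(level, seconds, *, user_id=None, install_id=None):
    """Build a hypothetical analytics payload for a completed game level."""
    event = {"level": level, "seconds": seconds}
    if user_id is not None:
        event["user_id"] = user_id        # ties the data to an account
    if install_id is not None:
        event["install_id"] = install_id  # ties the data to a device/install
    return event


def is_identity_tied(event):
    # Under this reading, any account or install identifier links the data.
    return "user_id" in event or "install_id" in event


# Random identifier generated once at install time (no hardware ID involved).
install_id = str(uuid.uuid4())

anonymous = level_event(3, 42.5)
per_install = level_event(3, 42.5, install_id=install_id)
per_account = level_event(3, 42.5, user_id="user-123")
```

Only `anonymous` carries no identifier, which matches the offline-game example: the developer learns how long level 3 takes without being able to say who played it.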


It sounds like you're being intentionally obtuse here. The guidelines say:

    Note: “Personal Information” and “Personal Data”, as defined under relevant privacy laws, are considered linked to the user.

This implies that email addresses are, by definition, considered linked to the user's identity. The fact that someone could submit a fake email address is irrelevant, just as if you collected mailing addresses and someone put in "123 Fake Street", or if you collected phone numbers and someone put in "123-456-7890".


I re-read this section. The "Note" makes sense, but the whole description on Apple's website is weird, especially if you consider that Apple allows you to collect an email address that apparently isn't linked to the user. How is this possible? Apple does have categories of data, such as contact info. If contact info is considered "linked to the user", why offer this granularity in the first place? Read this section, fill in "email address" where it talks about data in general and try to make sense of it:

https://developer.apple.com/app-store/app-privacy-details/#l...

Anyways, I wasn't intentionally obtuse here, because I mostly think in terms of GDPR compliance, but you're right. I apparently thought of this in a more complicated way than necessary.


If you store the email in a database and associate it with any other data (password, settings, profile information, etc) I would think it qualifies as an account.


Your app is focused on privacy, yet you can’t answer these simple questions??? What?


My app respects the user's privacy. There is no tracking, no analytics, no data sharing with third parties. It’s not a privacy product, if this was unclear.

And yes, part of answering these questions is an honest assessment whether I and Apple agree on what it means to respect a user’s privacy.


Maybe they're looking to get a lot of granularity. Think of https://haveibeenpwned.com/: you could give an email address and get value from the site, but there's no need for an account. Assuming you store the email, that's certainly more of a privacy/security risk than using it once and throwing it away.


Possibly. The wording is odd. It assumes that I have information about the identity of a user, but the only thing I have is the email address.

If my app asks for more data, which are backed up as a service to the user (no tracking), that is of course connected to the email address of the account. However, there is no effort from my side to find the actual identity of the user.

This whole identity thing is very confusing. Might need to contact Apple developer support.


The email address is considered information about the identity of a user. It is a very broad concept.

If the user has state or permissions on your site based on that email address, that email address is part of the user's account on your site.


Yeah, they even ask that about User ID. I just answered yes.


This is a huge bonus for app/game developers that have always been privacy focused; now it becomes something the business/marketing teams also want on their app.

What a great day for good developers and consumers to have this as backup to help preserve people's privacy and stop the blatant selling and abuse of private data. Thank you Apple!


> Apple wants to know if I connect the email address with the identity of a user. However, what does that even mean? I have no identity of a user except the email address.

Identity basically means anything that allows you to uniquely identify a user, so the e-mail address would apply.


Another strange one is "Browsing History: Information about content the user has viewed that is not part of the app, such as websites"

Apple requires apps to have a web site for customer support. Almost every web server logs anonymous data about visited pages. So, almost every app needs to have "Browsing History" checked??


Only if you are ingesting those logs, correlating them with users and storing that data somewhere.


Apple seems to want to show privacy labels even for anonymized data which isn't linked to the user. Here's their App Store app showing category "Data Not Linked to You": https://support.apple.com/en-us/HT211971


That page appears to not show what the data is used for. I wonder if it's shown elsewhere, or why Apple asks for that if they don't show it anywhere. As a user, that's certainly information I'd be interested in.

(Documentation lists the following purposes: Third-Party Advertising, Developer’s Advertising or Marketing, Analytics, Product Personalization, App Functionality, Other Purposes)



