Ask HN: How do you handle user-generated content in your apps? | Better HN

87 comments


TekMol4y ago· 12 in thread

My "pain point" is a bit earlier in the process:

User authentication.

What is everyone using for this? How do you turn a static website where a user can set some configurations (say the color scheme) into a site where the user can log in and save their settings?

In the past I rolled my own solutions. But for new projects I am considering using a library or framework.

I guess Django, Flask, Laravel, Symfony and Express all come with some default auth mechanism. What is HN's experience? Are you using these? Are you happy with them?

PaulHoule4y ago

It was funny that I built a "user management" framework that was open source in 2001 and nobody cared except for crackers.

danpalmer4y ago

We use Django's authentication, authorisation, and session management and it's rock solid. I can highly recommend it.

TekMol4y ago

Did you have to create all the usual routes and views for the typical auth flow like signup, login, logout, change-password etc yourself?

I did a quick test like this:

    apt install -y python3-django
    django-admin startproject mysite
    cd mysite
    python3 manage.py migrate
    python3 manage.py runserver 0.0.0.0:80

And it serves a Django site but there seem to be no routes for users to sign up, log in etc.

The urls.py file contains only this one route it seems:

    urlpatterns = [
        url(r'^admin/', admin.site.urls),
    ]

viewOP4y ago

I love using Firebase Auth: https://firebase.google.com/docs/auth

TekMol4y ago

I looked through the docs and have some issues with it.

First of all, it is a service. That opens a can of worms on its own. The main issue being that you are 100% at the mercy of the service provider. Every time they decide to change their API, it adds maintenance work to your project. Sometimes it happens on short notice which can be very annoying.

Second, it is a strange mix of service and code. I don't see an easy way to use it without their JavaScript SDK. And installing that SDK looks complicated from what I see.

steve_adams_864y ago

My team tried Firebase Auth and while I think it's stellar for some use cases, I'd warn people about some potential issues:

- Firebase is a juggernaut of a library on the frontend: https://bundlephobia.com/package/firebase@8.10.0 Our product's main feature is being fast - it has the word "instant" in the name. For a product with users around the world, many of them will feel that network request and/or increased execution time. This is fine at the prototype stage or for teams which don't have the resources to implement a case-specific auth implementation that'll be lightweight and efficient. In our case we felt kind of stupid; this should have been clear to us from the beginning, but we wanted to move quickly. Ultimately this cost us time, and to be frank, that was mostly on me!

- Integration with other systems wasn't always smooth or simple. The Firebase Admin documentation left a lot to be desired. There are a lot of quirks all over. Some fields during some authentications might be empty for example, but it wasn't clear why or what this meant - this meant a lot of deep-diving and experimentation. We were using the official Go library. Sometimes we could use the library, other times we'd need to write a request out to Google's APIs. We made a lot of passes to improve on this thinking we must be missing something, but after hitting various existing issues online where developers dismissed the problem, it became clear that this is just life for a Go server supporting Firebase Admin.

I do recall that Android and Node.js support appeared much better, so if you suspect you're using a better-supported ecosystem, maybe this won't get in your way.

- Something that a better developer might understand and navigate better than I did was the lack of assurance of data being present or its structure being consistent. Fields coming back for users seemed slightly inconsistent (not a problem for most requests). I wrote a parser for each provider to normalize data because occasionally we'd be missing the user's email or something. For example, as I recall, getting the email from a Twitter-authed user could be different from getting the email for a Google-authed user. I'll admit we had other issues to face so I spent the least time on this, then we dropped Firebase before I could revisit.

- Like any off-the-shelf solution, it ends up having significant limitations. This can be a great thing too, but for us it was a deal breaker. You can assign metadata to auth profiles but this felt too flimsy to us. I think this is a general Firebase issue, not specific only to auth: data integrity is poor. It was an ever-present problem that we couldn't attach users to our other data with rock solid guarantees. It felt like auth was almost ephemeral in our stack, and without fully owning it, it was as though our users weren't the cornerstone of our application but a floating member out in space.

Despite all of that it's a great product and I highly recommend it if it fits your needs. I'm not slamming the developers behind it. I think they're well aware that it's not for everyone, and they've done a great job making it work for as many people as it does.

Natfan4y ago

I'd go with Supabase.

TekMol4y ago

Thanks. I have been reading through the docs for a while now but I am not sure what to make of it.

What would be the minimum number of commands I have to type into a command line (Say in a fresh Debian install) to get a simple site up and running that lets users do the basic stuff like sign up, log in, log out, change password, delete profile?

You should look into Keycloak. It is an open source identity and access manager.

TekMol4y ago

Thanks. For how long have you been using it?

You shouldn't be rolling your own authentication. That's just asking for trouble.

I don't have experience with those frameworks, but the rails equivalent, devise, is very good. I assume it's the same for any mature framework.

ipaddr4y ago

Rolling his own doesn't mean custom algos. It means he will use the built-in language functions and/or external packages for that functionality while manually piping all of the pieces together.

He is not asking for trouble.

al2o3cr4y ago· 9 in thread

    an MVP for a web service called View

Nitpick: if someone types in "user-generated content view" to their favorite search engine, they're not likely to find you.

Causality14y ago

One has to admit the modern trope of naming everything single generic words is growing increasingly tiresome. The only thing keeping every developer conversation from sounding like complete lunacy is careful capitalization.

viewOP4y ago

Yeah, I guess you're right. Do you have experience with SEO? What would be the best way to improve search ranking?

I suppose running a blog with good content might be helpful so that someone might find the service using a search term like "video streaming api" or similar.

finiteseries4y ago

How the hell was “view” available as an HN username til 47 days ago!?

On topic, you could try attaching a domain specific word to the end until you’re big enough to take over the generic one, ala Flock Freight, Cured Health, Glow Credit etc. Lot of those lately.

jordanwallwork4y ago

An alternate spelling could help with this; how about 'Vue'? \s

Causality14y ago

Not 21st century enough, it needs a deleted E followed by an R. I propose Viewr.

stevekemp4y ago

Let's go retro, with an "i-prefix". So "iVue".

Oh you mean the React "killer" Vue?

dredmorbius4y ago

Numerous namespace conflicts.

antsar4y ago

Vuew.

codingclaws4y ago· 7 in thread

FYI - the hardest part of user generated media is probably the moderation aspect.

doh4y ago

That's why we built our Attribution Engine [0]. We help platforms to deal with CSAM, IBSA and any toxic content and copyright.

The way it works is that the platform uses our SDK, through which it "sends" all uploaded content. The SDK generates a fingerprint that is sent to our service, and a license is issued. The license is either permissive or non-permissive, so it instructs the platform and the uploader (creator) what to do. We also provide payment distribution, usage reporting and ADR (alternative dispute resolution).

If a platform uses our service, we indemnify it from any liabilities under DMCA, EUCD and others up to $50M in damages (legal costs and/or court orders).

We charge a percentage of the revenue generated by the platform for our services. If the platform generates no revenue, the service is free. We don't charge per lookup, nor is there any scale limit (well, there are throughput limits, but they are quite far off for most platforms [currently we can search around 1.1k hours of content every second]).

We cover video, sound recordings and compositions. Images are coming late this year and text sometime next year.

Just a quick note on the moderation. The benefit of our structure is that the moderation is "outsourced" back to law enforcement and government agencies (like NCMEC and the FBI in the US, the Bundespolizei in Germany, etc.). This means platforms don't have to hire people to moderate the covered content, because the liability is transferred back to the organizations that create it in the first place.

[0] https://pex.com

codingclaws4y ago

That sounds interesting, but complicated.

We built our adminless Internet forum [1] for the same reason. It's splendid on the dev side because initially we built traditional moderation tools (they still exist in the Git history), but then we deleted it all and it felt much simpler.

Still waiting to see if it will work from a moderation standpoint. The site's been live for almost a year with no issue yet.

Edit: It's a text only forum, which makes this a lot easier. I was thinking about allowing images but with a high per image fee.

[1] https://www.peachesnstink.com

viewOP4y ago

Thanks! What do you think is the most common moderation problem? NSFW media?

PaulHoule4y ago

First the pornography comes, then the child pornography. That can even be minors sexting which is not a criminal network but it can still get you in trouble.

Politics has a way of turning into violent threats, pictures of nooses, etc.

Then there is the spam, actually that comes before the pornography.

codingclaws4y ago

It's really just moderation in general. It's a long term, never ending issue. I think the big platforms employ thousands of moderators.

steve_adams_864y ago

We've been studying this problem for a product we're working on, and it seems like the depths of this hole get dark and very deep. We expect to need to hire someone to moderate early on.

As mentioned, there is child porn, various other forms of illegal porn, illegal violence, hatred, propaganda to incite violence, etc... Then of course any kind of content you might not approve of due to any personal or company policies, if relevant.

All of our material will be forced to be public so the user should have no expectation of privacy, but for many people this won't matter. Just look at Facebook; that's a public-facing service, and people upload atrocities to it constantly.

dredmorbius4y ago

The current discussion seems to fall under "Trust and Safety". I'm not aware of any specific guidance or organised thought, though I'd be really surprised if there weren't academic coursework or professional training beginning to appear.

You'd do well to look at established services. Craigslist's list of prohibited content, the T&C of Facebook, Twitter, Reddit, etc., are going to be useful.

Just off the top of my head:

Cyberstalking, pornography, child porn, piracy, malware, fraud, illicit goods (guns, drugs, black/grey market, stolen goods), intimidation, gangs, bullying, alcohol, tobacco, prescription medications, hoaxes, various ineffective / "alternative" products and remedies (which themselves run the gamut of legality, even defining this is at best difficult), advertising, advertising for protected or regulated sectors / goods (housing, employment, personal and professional services, beauty care, escorts, security services, licensed professionals, ... As with the goods section, this rapidly gets complex), legal services / aid, political activities, fomenting revolution / freedom fighters.

User-generated content is a massive concern.

One concept I'm seeing getting increased traction is a focus not on the quantity of posted content but the prevalence or level of access or views. Facebook and YouTube especially are increasingly discussing problematic content not in terms of posts or videos, but of views or presentations of those.

This ... starts making trade-offs in moderation much more viable, principally because there is an inverse logarithmic relationship between the number of items and the views each one gets: if n items get m views each, then 10*n items get m/10 views each. Very roughly.

This means that you can set a goal in terms of the number of items viewed (and see what the maximum unmoderated prevalence will be), or target a specific prevalence and determine how many reviewers will be required.

For human moderators, the number of items reviewed per day seems to be in the 500--800 range. Note that 800 items/day in 8 hours is 100/hour, or about 1.7 per minute, or 36 seconds per item. That's inclusive of breaks, overhead, and non-moderation tasks.
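The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, the 500 items/day default is simply the low end of the range quoted above, and any view counts you feed in would come from your own analytics.

```python
import math


def seconds_per_item(items_per_day: int, hours_per_day: float = 8.0) -> float:
    """Average time budget per reviewed item, breaks and overhead included."""
    return hours_per_day * 3600 / items_per_day


def reviewers_needed(items_per_day: int, items_per_reviewer: int = 500) -> int:
    """Headcount needed to review a daily queue (assumed 500 items/reviewer/day)."""
    return math.ceil(items_per_day / items_per_reviewer)


def items_to_cover(view_counts: list[int], target_share: float) -> int:
    """How many of the most-viewed items must be reviewed so that the
    reviewed items account for at least `target_share` of all views."""
    total = sum(view_counts)
    covered = 0
    for count, views in enumerate(sorted(view_counts, reverse=True), start=1):
        covered += views
        if covered >= target_share * total:
            return count
    return len(view_counts)
```

With a skewed distribution, reviewing a small number of top items covers most of the views, which is what makes a prevalence target cheaper to hit than a "review everything" target. For example, `items_to_cover([700, 200, 50, 30, 20], 0.8)` says two items are enough to cover 80% of views in that invented sample.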

Moderation itself is a very psychologically loaded task. You'll either want to rotate people through it from other functions, or see a heck of a lot of staff turnover.

If anyone has greater insights from one of the current large UGC services (FB, Twitter, Instagram, Whatsapp, TikTok, Imgur, Reddit, etc.), I'd really like to know what current internal practices are.

Some of my previous work had some incidental exposure to this area (I was tasked with removing identified content, working on both our internal and external CDN provider to do so). After a couple of spot checks to see I was unlikely to be deleting content which wouldn't meet removal criteria, as in literally two, I decided I simply didn't want to take the risks of performing additional checks. My removal process turned out to be quite effective --- what the CDN provider's specs suggested might be a weeks-long process removed some millions of items over a weekend. That was on what is by current standards a very modest-sized social network.

I've written on this previously citing YouTube and Facebook sources here: https://joindiaspora.com/posts/f3617c90793101396840002590d8e...

JusticeJuice4y ago· 5 in thread

I think you just described https://cloudinary.com/

enlyth4y ago

We use Cloudinary at work and I would recommend it to anyone, I think it's an awesome service and the API is really easy to use.

They offer a pretty generous free tier for personal stuff as well, although I wish they had some plan that was between the free tier and the $99/month cheapest one, which is quite a steep increase from paying nothing.

This came to mind for me as well, as it's something I landed on in the past for managing UGC. The primary benefits being a decent api and administrative interface for moderation. It's certainly not perfect, but there's not a lot of competition in the space that I could find. I think most devs lean towards DIY.

michaelbuckbee4y ago

Or https://www.simplefileupload.com/ (good if you're on Heroku) or Uploadcare or any of the many other services that help do this kind of thing.

savrajsingh4y ago

Imagekit.io is good too

nathancahill4y ago

And filestack.com

new_guy4y ago· 5 in thread

> and noticed first-hand how needlessly complex and expensive existing services are

Can you give some examples?

Out of everything I would consider 'complex', handling media wouldn't even make the top 1,000 and services like AWS mean you can store petabytes for pennies.

eurasiantiger4y ago

Media is very difficult on the internet. It’s not enough to upload a video and share it, it needs to be transcoded into several different formats (H.264/MP4, VP9/WebM, H.265/HEVC), multiple resolutions (from 2160p to 280p) and multiple bitrates (to support low/high bandwidth) as well as different framerates (60/30fps), all while retaining sync with an audio track that gets completely separate processing and compression.

None of this is a simple task, and you also need to serve the correct files to the correct clients - fun!
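To make the "several formats, resolutions and bitrates" point concrete, here is a rough sketch that builds one ffmpeg command per rendition for H.264/MP4 only. The ladder values (heights and bitrates) and the names `Rendition`, `LADDER` and `ffmpeg_command` are assumptions for illustration, not recommendations; a real pipeline would also cover the other codecs, framerates and packaging for adaptive streaming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    height: int          # output height in pixels
    video_kbps: int      # target video bitrate
    audio_kbps: int = 128


# Illustrative ladder; the numbers are placeholders, not tuned values.
LADDER = [
    Rendition(1080, 5000),
    Rendition(720, 3000),
    Rendition(480, 1200),
    Rendition(240, 400, audio_kbps=64),
]


def ffmpeg_command(source: str, rendition: Rendition) -> list[str]:
    """Build (but do not run) an ffmpeg invocation for one rendition."""
    stem = source.rsplit(".", 1)[0]
    output = f"{stem}_{rendition.height}p.mp4"
    return [
        "ffmpeg", "-i", source,
        "-c:v", "libx264",
        # -2 keeps the aspect ratio and rounds the width to an even number
        "-vf", f"scale=-2:{rendition.height}",
        "-b:v", f"{rendition.video_kbps}k",
        "-c:a", "aac", "-b:a", f"{rendition.audio_kbps}k",
        output,
    ]


commands = [ffmpeg_command("clip.mov", r) for r in LADDER]
```

Each entry in `commands` could be handed to `subprocess.run`. Even this stripped-down version yields four encodes per upload, which hints at why the full matrix gets expensive.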

JamesSwift4y ago

> you also need to serve the correct files to the correct clients

That's been the hardest part for me as a developer. I've had several extended debugging sessions that ended up being "the media is encoded incorrectly for this android device".

viewOP4y ago

> Can you give some examples?

Someone woke up one day with an $8,000 bill while encoding videos:

https://github.com/awslabs/video-on-demand-on-aws/issues/48

There's also this pretty architecture overview in the repo:

https://github.com/awslabs/video-on-demand-on-aws/blob/maste...

In my opinion, it's too complex and I'd prefer a more simple solution to get to market faster.

Imagine adding just a few lines of code to get started instead of setting up all these things on AWS, where you might end up paying up to $0.12/GB for outgoing bandwidth as well.

> In my opinion, it's too complex and I'd prefer a more simple solution to get to market faster.

Some folks absolutely need to build their own video encoding pipeline for one reason or another--but the happy path's pretty well-established for folks who just Need Some Video, IMO. At Mux (full disclosure: I'm on the DevEx team there), getting up and running with video is one API call to CreateAsset, followed by an HTTP PUT if your video file isn't already accessible via HTTP somewhere. IMO, hard to be simpler than that.

AWS is expensive, but you're right in that it doesn't have to be. Measuring like-for-like is difficult, and per-minute rather than per-GB pricing tends to make more sense for most developers, but for 1080p content Mux is usually around half the price of AWS IVS. Which, having done this before at a previous job--if you want to stay in all-AWS land, is a way better call for your sanity than trying to hack it out yourself.

laurent1234564y ago

If the user can upload anything they want, you usually need to have various measures in place to prevent abuse. First, ideally each user should have their own sub-domain (i.e. user_id.somedomain.com) so that if they upload malware your main domain doesn't end up on blacklists.

Secondly, you might need to scan the content somehow, again for malware and possibly allow other users to report it.

Of course there's the issue with copyrighted and illegal material - there should also be a way to report this, or detect it. I guess it means you need to be aware of and comply with the server country regulations which can be tricky in some countries.
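The per-user sub-domain idea from the first point could look roughly like this. It is a minimal sketch: `usercontent.example` is a placeholder domain, the function names are invented, and it assumes user IDs are (or can be mapped to) valid DNS labels, which rules out underscores.

```python
import re
from urllib.parse import quote

# One DNS label: letters, digits and hyphens, 1-63 chars, no leading/trailing hyphen.
_DNS_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def user_content_host(user_id: str, content_domain: str = "usercontent.example") -> str:
    """Return the isolated hostname that serves one user's uploads."""
    label = user_id.lower()
    if not _DNS_LABEL.fullmatch(label):
        raise ValueError(f"user id {user_id!r} is not usable as a DNS label")
    return f"{label}.{content_domain}"


def user_content_url(user_id: str, path: str,
                     content_domain: str = "usercontent.example") -> str:
    """Build the public URL for an uploaded file on the user's own sub-domain."""
    host = user_content_host(user_id, content_domain)
    return f"https://{host}/{quote(path.lstrip('/'))}"
```

Serving uploads from a separate registrable domain rather than a sub-domain of the main site is a common variation, so that a blacklisting of user content cannot touch the main domain at all.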

Zealotux4y ago· 3 in thread

I'm using AWS' presigned URLs to let my users upload directly to an S3 bucket, with a lambda function to generate thumbnails, I keep track of the user's uploads through my app's database. This is as close as I imagine an "affordable API for audio, video, and images".

viewOP4y ago

Thanks for the feedback! How long did it take you to implement this and are you also encoding videos for adaptive streaming?

I had never worked with such an API before so it took me roughly two days. The AWS documentation is abysmal, and I could've done that in just a few hours if only they cared about documenting their APIs better. My biggest mistake was going for a kind of presigned URL that was not giving me much control over the size of files sent (I can't find it anymore for some reason, looks like the documentation changed). Go with PresignedPost, it allows quite a lot of control and it's an amazing way to avoid load on your servers.
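For anyone wondering what that control looks like, here is a hedged sketch of the arguments one might pass to boto3's `generate_presigned_post`. The helper names, the 5-minute expiry and the `image/` prefix are assumptions for illustration; the client is passed in so the sketch itself does not need AWS credentials.

```python
def presigned_post_args(bucket: str, key: str, max_bytes: int,
                        content_type_prefix: str = "image/",
                        expires_in: int = 300) -> dict:
    """Keyword arguments for S3's generate_presigned_post with upload limits."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Conditions": [
            # S3 rejects the upload if the body is outside this size range.
            ["content-length-range", 1, max_bytes],
            # The browser form must send a Content-Type field with this prefix.
            ["starts-with", "$Content-Type", content_type_prefix],
        ],
        "ExpiresIn": expires_in,  # seconds the signed form stays valid
    }


def create_upload_form(s3_client, **kwargs) -> dict:
    """Ask the given S3 client for a signed form (a dict with 'url' and 'fields')."""
    return s3_client.generate_presigned_post(**presigned_post_args(**kwargs))
```

In an app this would be called with something like `create_upload_form(boto3.client("s3"), bucket="uploads", key="u/1/a.png", max_bytes=5_000_000)`, and the returned URL and fields handed to the browser, which then posts the file straight to S3.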

>are you also encoding videos for adaptive streaming?

Can't help you with that, I only worked with images, best of luck!

I have played around with AWS Elastic Transcoder: https://aws.amazon.com/elastictranscoder/

If video is of interest, I'd definitely take a look at that and see if it meets your needs. It's one of those specialized serverless offerings from AWS that you pay as you use.

ddorian434y ago· 3 in thread

A pain point that I'm considering is building a video hosting service but at 0.3x - 0.1x price point of existing ones. Hosting, encoding & streaming video is expensive. The "pennies" that another user mentions add up really fast.

I've worked in that space, the main problem is that your natural audience (small-time producers) tend to have small budgets and short-term needs. So you spend time finding them but they don't pay much and not for long.

CaptArmchair4y ago

This is also what prompted YouTube to derive revenue from advertising rather than charging content creators who want to upload video. In the case of YouTube, power laws have inverted the relationship with content creators: it's in YT's interest to not charge at all. A free to use / consume platform has turned YT into a commons which captures large audiences which drive advertising revenue.

Vimeo went in the opposite direction. They have a tiered pricing model charging for the use of the features and tools they offer. However, they pivoted away from catering to a large and diverse audience. Instead they focus on B2B communication. Vimeo is excellent for professional videographers or marketing agencies publishing video content.

The take away here is that the expense of processing and publishing video isn't going to lower dramatically in the short run. As hardware became ever more powerful over the past 2 decades, the demand for high-quality video has paced along in lockstep: 1080p, 4K,... So, the costs associated with hosting high quality content haven't dropped significantly in that regard. There's not much you can change about that.

What you do control is the business model you develop to cover the costs of hosting content. And that means finding a profitable market and asserting a good product/service fit up front, if you can.

Instead of building an app "to make it easier for developers to upload, process, and deliver media in their apps", OP ought to think beyond developers, and rather direct themselves to those who are either producing or consuming content.

Another way of thinking about this might be: Which problem really is getting solved? And whose problem really is it? Is it hosting audiovisual bitstreams? Is it processing? Is it just providing an at-the-ready API which allows users to just easily upload and embed audiovisual material everywhere without even requiring any technical knowledge?

The latter, by the way, already exists in several forms on the Web - e.g. https://oembed.com/ - which is already implemented in e.g. WordPress and supported by large social media platforms such as Instagram and YouTube.
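As a rough illustration of how little a consumer has to do with oEmbed: the request is a plain GET with the resource URL as a query parameter. The sketch below only builds that request URL. `provider.example` is a placeholder endpoint and `oembed_request_url` is an invented helper; `url`, `format`, `maxwidth` and `maxheight` are the parameter names from the oEmbed spec.

```python
from typing import Optional
from urllib.parse import urlencode


def oembed_request_url(endpoint: str, resource_url: str,
                       max_width: Optional[int] = None,
                       max_height: Optional[int] = None) -> str:
    """Build the GET URL a consumer sends to an oEmbed provider endpoint."""
    params = {"url": resource_url, "format": "json"}
    if max_width is not None:
        params["maxwidth"] = max_width
    if max_height is not None:
        params["maxheight"] = max_height
    return f"{endpoint}?{urlencode(params)}"
```

The provider answers with a small JSON document describing how to embed the resource, so the consuming site never has to host or process the media itself.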

viewOP4y ago

Maybe we could join forces? I'd be happy to chat with you or anyone interested. My email is admin at view.dev

butz4y ago· 2 in thread

You could sell moderation as an additional service, either AI based (cheaper) or employ actual people. Otherwise, developers probably should be the ones responsible for moderating media uploaded through their apps.

qwerty4561274y ago

> Otherwise, developers probably should be the ones responsible...

How do you explain to Google that somebody else is responsible when they decide to nuke your app?

steve_adams_864y ago

I think it's important to recognize here that human moderation in many cases means exposing people to potentially traumatizing imagery. Gore, child porn, you name it. I think providing this service ethically could be very challenging.

antonyh4y ago· 1 in thread

The moderation of UGC is something that's killed off a number of my own ideas, at best 'parked' them. CSAM only goes so far, and frankly it's just too reactive - proactive defence is too costly for self-funding early stage startups such as myself.

Building the upload/delivery stream was easy, it's all the 'needless complexity' that adds value. You know, like privacy controls, access control, moderation, image formats and optimised delivery, indexing, search, tagging, etc.

I'll probably reimagine it and find better ways of hosting content but with my vision and UX. Maybe Cloudinary as folk have suggested, maybe some other SaaS DAM product with a solid track record - I doubt I'd trust an early stage MVP if my customers need to rely upon it. Too risky for my tastes.

Now if there was a moderation-as-a-service (MaaS?) that would have uses at the right pricepoint.

doh4y ago

Exactly why we built our Attribution Engine [0]. We focus not only on the technology, but the liabilities platforms gain from their operations.

[0] https://news.ycombinator.com/item?id=28279105

bethecloud4y ago· 1 in thread

Storj DCS is a good option for storing and delivering user-generated content. It is globally available - so you don't need to worry about AZ replication, and it's 1/10th the price of Amazon S3.

We were paying $100,000 a month storing user-generated content, and now are paying about $10,000 a month after migrating

lamnk4y ago

Can you elaborate more on specifics, like what type of objects do you store on Storj, how's the performance?

tothrowaway4y ago

I would love a "Backblaze of user generated content". The existing players in the market are way too expensive (likely because they run off the cloud cartel and pass along their bandwidth tax). Basic image handling isn't too hard to deal with on your own, but video and audio are a huge pain. The uploading (which needs to be fault tolerant and resumable), encoding (which takes a long time), storage (which is large) and playback (which requires a half-dozen different formats) are all very annoying to deal with. So much so, for my SaaS products, I only allow my users to upload images!

I had the exact same idea as you, and shelled out a few hundred dollars for a domain from a squatter. My prototype basically reinvents fault tolerant resumable uploads (like tus.io). On the backend, it streams the file to Wasabi and Backblaze. That's as far as I got. Video/audio scares me, but I'll get to it eventually.
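The core of a fault-tolerant resumable upload is small: the server remembers how many bytes it has, and the client always asks for that offset before sending the next chunk. The sketch below shows that idea in memory only. It is not the tus protocol itself, and `UploadSession`, `upload` and the `fail_after` knob are invented for illustration.

```python
from typing import Optional


class UploadSession:
    """Server-side state for one upload (kept in memory for this sketch)."""

    def __init__(self, total_size: int) -> None:
        self.total_size = total_size
        self.data = bytearray()

    @property
    def offset(self) -> int:
        """Number of bytes the server has safely received so far."""
        return len(self.data)

    @property
    def complete(self) -> bool:
        return self.offset == self.total_size

    def append(self, offset: int, chunk: bytes) -> int:
        """Accept a chunk only if it starts exactly where the server left off."""
        if offset != self.offset:
            raise ValueError(f"expected offset {self.offset}, got {offset}")
        if self.offset + len(chunk) > self.total_size:
            raise ValueError("chunk exceeds declared upload size")
        self.data.extend(chunk)
        return self.offset


def upload(session: UploadSession, payload: bytes, chunk_size: int,
           fail_after: Optional[int] = None) -> None:
    """Client loop: resume from the server's offset, one chunk at a time.
    `fail_after` simulates a dropped connection after that many chunks."""
    sent = 0
    while not session.complete:
        if fail_after is not None and sent >= fail_after:
            raise ConnectionError("simulated network drop")
        start = session.offset  # in a real client this is a request to the server
        session.append(start, payload[start:start + chunk_size])
        sent += 1
```

After a `ConnectionError`, calling `upload` again with the same session picks up at the stored offset instead of starting over, which is the property that matters for large video files on flaky connections.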

I really like the content moderation as a service (via AI, or humans) idea that others have mentioned.

How do you handle user-generated content in your apps?

Very similar to how I'd handle toxic waste. I'd touch it as little as possible, and ideally I'd like it to be someone else's problem.

The app store has requirements for dealing with user-generated content. The biggest pain point for me isn't with enabling users to upload content but instead with the moderation around it. One user might want to block another user and filter out any content they produce. Or we may need to manually review and delete content that a user has reported. That's the biggest pain point.
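The blocking and removal rules described above boil down to a filter applied whenever content is listed. A minimal sketch, assuming an in-memory block list and a set of moderator-removed IDs (the `Post` type and `visible_posts` are hypothetical names; a real app would do this in the database query):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int
    author: str
    body: str


def visible_posts(posts: list[Post], viewer: str,
                  blocks: dict[str, set[str]],
                  removed_ids: frozenset[int] = frozenset()) -> list[Post]:
    """Hide posts by authors the viewer has blocked and posts removed after review."""
    blocked_authors = blocks.get(viewer, set())
    return [
        post for post in posts
        if post.author not in blocked_authors and post.post_id not in removed_ids
    ]
```

The filter is the easy part. The review queue, the reporting flow and the people who work through it are where the effort goes.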

robbedpeter4y ago

Maybe ipfs and p2p is a good tool for this? Escalate through increasing degrees of content sharing and validation, from anonymous on up, and build templates off of existing ipfs hosts that already do moderation?

auspex4y ago

What are you all doing to secure the user content? To ensure the content isn’t exfiltrated?

mux.com (YC backed) is doing Stripe for video already. Their API is really easy to use.
