question

2JK asked JingmiaoXuMSFT-6985 edited

Content moderation screen text input

Hi.

Looking through the documentation and sample code of the Content Moderator API, I wanted to use the text_moderation.screen_text function to analyze some text and classify it accordingly. However, it seems the text_content input for the function must be a file object, whereas I need to pass it a Python string. I tried using io.StringIO(string), but it gave me a "memoryview: a bytes-like object is required, not 'str'" exception. I also tried encoding the string into a bytes object, but that gave me an error saying it can't call .read() on the object.

Any idea if there's a workaround?

Thanks.

azure-cognitive-services azure-content-moderator

The reason for this error is that in Python 3 strings are Unicode, but data transmitted over the network needs to be bytes instead. You can convert bytes to a string using the decode() instance method of the bytes class, so you need to decode the bytes object to produce a string. In Python 3 the default encoding is "utf-8", so you can use it directly:

b"python byte to string".decode("utf-8")

Python makes a clear distinction between bytes and strings. Bytes objects contain raw data (a sequence of octets), whereas strings are Unicode sequences. Conversion between these two types is explicit: you encode a string to get bytes, specifying an encoding (which defaults to UTF-8), and you decode bytes to get a string. Clients of these functions should be aware that such conversions may fail, and should consider how failures are handled.
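As a minimal sketch of the conversions described above, using only built-in str.encode() and bytes.decode(): the helper name decode_or_none is hypothetical and only illustrates one way a caller might handle a failed conversion.

```python
# Encoding turns a str into bytes; decoding turns bytes back into a str.
text = "naïve café"
raw = text.encode("utf-8")          # str -> bytes
round_trip = raw.decode("utf-8")    # bytes -> str


def decode_or_none(data: bytes, encoding: str = "utf-8"):
    """Hypothetical helper: return the decoded text, or None if the bytes
    are not valid in the given encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None
```

Whether to return None, raise, or decode with errors="replace" depends on how the caller wants to treat malformed input.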



2JK answered romungi-MSFT commented

Hello. Thanks for the answer. I did try that, but it gave me a "memoryview: a bytes-like object is required, not 'str'" error. I managed to make it work by using io.BytesIO(text.encode()). The .read() method can be called on that object, so (for now) this looks resolved.
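For anyone landing here, a minimal sketch of that workaround using only the standard library. The helper name text_to_stream is hypothetical. The commented-out call at the end assumes a ContentModeratorClient named client has already been created and reuses the screen_text parameters that appear elsewhere in this thread.

```python
import io


def text_to_stream(text: str, encoding: str = "utf-8") -> io.BytesIO:
    """Hypothetical helper: wrap a Python string in a binary, file-like object.

    io.BytesIO exposes .read() like an open file, and .read() returns bytes,
    which is what the "bytes-like object is required" error was asking for.
    """
    return io.BytesIO(text.encode(encoding))


sample = "Some text to screen"

# io.StringIO also has .read(), but it yields str rather than bytes.
str_chunk = io.StringIO(sample).read()
bytes_chunk = text_to_stream(sample).read()

# Assumed usage (not run here; requires an existing client and valid credentials):
# result = client.text_moderation.screen_text(
#     text_content_type="text/plain",
#     text_content=text_to_stream(sample),
#     language="eng",
# )
```

A fresh stream is created for each call because a BytesIO object that has already been read to the end returns empty bytes until it is rewound with seek(0).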


@2JK Thanks for posting. Please feel free to accept your response as answer.

romungi-MSFT answered romungi-MSFT converted comment to answer

@2JK I have tried this scenario and the snippet below seems to work. Could you please check whether it works for you?

     from pprint import pprint
     import io
     from azure.cognitiveservices.vision.contentmoderator import ContentModeratorClient
     from msrest.authentication import CognitiveServicesCredentials

     # Replace these placeholders with your own resource endpoint and key.
     CONTENT_MODERATOR_ENDPOINT = "<your_endpoint>"
     subscription_key = "your_key"

     # Sample text containing a misspelling, PII and profanity for the service to flag.
     text = "Is this a grabage email abcdef@abcd.com, phone: 4255550111, IP: 255.255.255.255, 1234 Main Boulevard, Panapolis WA 96555. Crap is the profanity here. Is this information PII? phone 2065550111"

     client = ContentModeratorClient(
         endpoint=CONTENT_MODERATOR_ENDPOINT,
         credentials=CognitiveServicesCredentials(subscription_key)
     )

     # text_content expects a file-like object. If io.StringIO(text) raises
     # "memoryview: a bytes-like object is required, not 'str'", the asker
     # reported that io.BytesIO(text.encode()) worked instead.
     screen1 = client.text_moderation.screen_text(
         text_content=io.StringIO(text),
         language="eng",
         text_content_type="text/plain",
         autocorrect=True,
         pii=True,
         classify=True
     )

     pprint(screen1.as_dict())



